Receive events using Webhooks

Introduction

This page describes how customers and partners can receive structured information about events happening to their data in Scalepoint systems.

Objectives

  • Reliable messaging
  • Freshness of data
  • Efficiency of communication
  • Resource utilization

Solution

Scalepoint supports two receive modes: pull (using long-polling) and push (using webhooks). This article focuses on webhooks.

Webhooks are a form of user-defined custom HTTP callbacks. With the Webhook model, an integrating party registers an endpoint to which the event producer can then post events. When an event is posted to the endpoint, the client application that is interested in such events can take appropriate actions.

A Webhook is a way to deliver real-time data to applications. You can think about Webhooks like push notifications on your mobile phone. Rather than burning up the battery on your phone fetching information (polling) from applications to get updates, push notifications (Webhooks) automatically send data based on event triggers. And just like push notifications, Webhooks are less resource-intensive. They are more efficient than long-polling in terms of resource utilization, but they can be slightly more difficult to deploy securely on the consumer side, since they require TLS v1.2 or later and some form of authentication support.

Endpoint registration

Scalepoint has developed a Subscription API for managing webhook subscriptions. Once a client is successfully subscribed to Scalepoint's Webhooks, we start publishing events to the client's subscription endpoint. Scalepoint does not yet provide an Admin UI to register endpoints, and the Subscription API is not enabled for third-party clients. We plan to allow self-service subscription management at a later point, but at the moment subscriptions are configured by Scalepoint.

Endpoint registration details can be shared in any format. If secrets need to be exchanged for a specific authentication scheme, use a secure communication channel for them.

To subscribe to the available events, integrators need to provide us with the following information:

  • URL(s) to publish - Subscription endpoint
  • Event filter(s) - Selection of events to receive at the endpoint
  • Authentication settings - OAuth client_credentials flow parameters or self-issued JWT token configuration
  • Email address(es) to report problems (repeating errors) to

The diagram below provides a high-level overview of the Event API with Webhooks implementation.

Event API with Webhooks mechanism

Endpoint implementation

An endpoint must accept HTTP POST requests with a UTF-8 encoded application/json request body. HTTPS with TLS v1.2 or later is required. We do not support cleartext HTTP.

The endpoint must return 202 Accepted upon successful acceptance. If the endpoint decides to simply drop a message without processing it, it must still return 202 Accepted, so that the message is not redelivered by the subscription.

Any other response status code is considered an error, whether it is generated by the endpoint itself or by a reverse proxy in front of the endpoint.
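The acceptance rules above can be sketched as a small receiver. This is a minimal illustration using only the Python standard library, not a reference implementation: the class name WebhookHandler and the in-memory list are hypothetical, and the sketch deliberately leaves out TLS and authentication, both of which a real endpoint must have.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class WebhookHandler(BaseHTTPRequestHandler):
    """Accepts POSTed JSON events and answers 202 Accepted."""

    received = []  # hypothetical in-memory store, for illustration only

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            event = json.loads(body.decode("utf-8"))
        except (UnicodeDecodeError, json.JSONDecodeError):
            # Body is not UTF-8 JSON. Any status other than 202 is
            # treated as an error by the sender.
            self.send_response(400)
            self.end_headers()
            return
        WebhookHandler.received.append(event)
        # 202 Accepted is the only status that counts as success.
        self.send_response(202)
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


if __name__ == "__main__":
    # Plain HTTP on localhost for local experiments only; production
    # endpoints need HTTPS (TLS v1.2 or later) in front of this.
    HTTPServer(("127.0.0.1", 8080), WebhookHandler).serve_forever()
```

Note that a handler which returns 200 OK instead of 202 Accepted would be treated as failing, so the status code is worth checking explicitly when adapting a web framework's defaults.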

Webhook subscriptions currently attempt to forward messages in the same order in which they were accepted by the Events API, so a subscriber must accept one message before getting the next one. This behavior is subject to change, since it severely limits the throughput of a subscription in the presence of failures, while still not providing the end-to-end sequential guarantees in a distributed setting that API consumers may intuitively expect. The reason is that upstream services may deliver events to the Events API out of order. Until the change is made, consumers that want to be able to recover from message reordering need to forward messages to an internal queue and implement retries there.
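One way to arrange the consumer side is to acknowledge each message quickly and do the real work from an internal queue. The sketch below assumes an in-process queue.Queue and hypothetical names (process_with_retries, worker, failed); a production setup would more likely use a durable queue, which is outside the scope of this page.

```python
import queue
import time


def process_with_retries(event, handler, max_attempts=3, delay_seconds=0.0):
    """Call handler(event), retrying on exceptions; return True on success."""
    for attempt in range(1, max_attempts + 1):
        try:
            handler(event)
            return True
        except Exception:
            if attempt < max_attempts:
                time.sleep(delay_seconds)
    return False


def worker(events, handler, failed):
    """Drain the internal queue; events that keep failing go to `failed`."""
    while True:
        event = events.get()
        if event is None:  # sentinel value: stop the worker
            events.task_done()
            break
        if not process_with_retries(event, handler):
            failed.append(event)
        events.task_done()
```

With this arrangement the webhook endpoint only calls events.put(event) and returns 202 Accepted, so slow or failing business logic does not block delivery of the next message.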

Authentication

Currently, Bearer token authentication is supported. The token can be either an OAuth access_token obtained through the client_credentials flow from any token endpoint, or a pre-shared key JWT token. The token is passed in the Authorization header like this:

Authorization: Bearer <token-value>
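On the receiving side, the header has to be parsed and the token verified. The following sketch assumes the pre-shared key JWT variant is signed with HS256 and carries an optional exp claim; the source does not specify the algorithm or claims, so both are assumptions, and the function names are hypothetical. In practice a maintained JWT library is usually a better choice than hand-written verification.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url_decode(segment):
    """Decode base64url text that may lack '=' padding."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def extract_bearer_token(authorization_header):
    """Return the token from 'Bearer <token>', or None if malformed."""
    if not authorization_header:
        return None
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return token.strip()


def verify_hs256_jwt(token, shared_key, now=None):
    """Return the claims if signature and expiry check out, else None."""
    parts = token.split(".")
    if len(parts) != 3:
        return None
    header_b64, payload_b64, signature_b64 = parts
    try:
        header = json.loads(b64url_decode(header_b64))
        claims = json.loads(b64url_decode(payload_b64))
        signature = b64url_decode(signature_b64)
    except ValueError:  # covers bad base64, bad UTF-8 and bad JSON
        return None
    if not isinstance(header, dict) or not isinstance(claims, dict):
        return None
    if header.get("alg") != "HS256":  # assumed algorithm
        return None
    signed_part = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(shared_key, signed_part, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        return None
    exp = claims.get("exp")
    current = time.time() if now is None else now
    if isinstance(exp, (int, float)) and current >= exp:
        return None  # token has expired
    return claims
```

An endpoint would call extract_bearer_token on the incoming Authorization header and reject the request when either function returns None.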

Another option is SSL certificate authentication.

We might add support for the Basic authentication scheme if required.

Endpoints without authentication are not supported, even if they are accessed over site-to-site VPN tunnels and/or are IP screened. Scalepoint cannot verify the correctness of the authentication logic implemented by the remote endpoint, but we'd like to stress the importance of this aspect.

Error handling

Webhook subscriptions implement several features in order to guarantee reliable delivery and ensure autonomous operation of subscriptions: retries with backoff, message poisoning and status alerting.

Here's how we interpret the endpoint's response status code:

  • Success (202 Accepted)
  • Transient error (500 Internal Server Error)
  • Permanent error (anything except 202 Accepted or 500 Internal Server Error)
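The mapping in the list above is small enough to write down directly. This is only an illustration of the documented rule with hypothetical names (Outcome, classify_response); it can be handy in endpoint tests to confirm which category a given response falls into.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    TRANSIENT_ERROR = "transient error"
    PERMANENT_ERROR = "permanent error"


def classify_response(status_code):
    """Map an endpoint's HTTP status code to the documented outcome."""
    if status_code == 202:
        return Outcome.SUCCESS
    if status_code == 500:
        return Outcome.TRANSIENT_ERROR
    # Everything else, including other 2xx codes, is a permanent error.
    return Outcome.PERMANENT_ERROR
```

For example, classify_response(200) yields Outcome.PERMANENT_ERROR, since only 202 Accepted counts as success.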

For both kinds of errors, delivery is retried, but the handling logic is slightly different, as described below.

Retry behavior

Permanent errors are considered to be caused by problems with the specific payload (although they can of course be actual bugs on the receiver side, too). They are handled differently depending on whether message poisoning is enabled. If it is enabled, the failing messages are moved into a poison message queue after a few retries. At this point we trade the message order guarantee for availability.

If message poisoning is disabled, they are retried indefinitely until a successful response is received.

A transient error is an error that is expected to be fixed on the client side, and it does not make sense to send the next event until it is fixed. Thus, the message is retried indefinitely until a successful response is received (regardless of whether message poisoning is enabled).
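The retry rules above can be summarized as a small decision function. This is a model of the described behavior for readers, not Scalepoint's implementation; the function name and the poison_after threshold are hypothetical, since the text only says "a few retries".

```python
def next_action(error_kind, failed_attempts, poisoning_enabled, poison_after=3):
    """Decide what happens to a message after a failed delivery.

    error_kind: "transient" or "permanent"
    failed_attempts: failed deliveries so far for this message
    poison_after: hypothetical threshold standing in for "a few retries"
    """
    if error_kind not in ("transient", "permanent"):
        raise ValueError(f"unknown error kind: {error_kind!r}")
    if (
        error_kind == "permanent"
        and poisoning_enabled
        and failed_attempts >= poison_after
    ):
        # The message leaves the main flow, so ordering is no longer kept.
        return "poison"
    # Transient errors, and permanent errors without poisoning,
    # are retried until the endpoint answers 202 Accepted.
    return "retry"
```

The practical consequence for receivers is that a persistent 500 response blocks the subscription, while other error codes only block it when poisoning is disabled.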

Get poisoned messages

Scalepoint provides an API to access messages in the poisoned message queue. The poisoned events can be accessed through a GET REST API call to:

https://www.scalepoint.com/api/events/v1/dk/external/{tenant}/poison-messages?offset={offset}&limit={limit}

Both the offset and limit parameters are optional and provide pagination over the available poisoned events. The default limit value is 100, which is also the maximum.

For authentication, the service requires a token with the events scope, obtained from the token endpoint described on the Authentication page.
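A call to this API could look roughly like the sketch below. The URL template and the limit of 100 come from this page; sending the token as a Bearer header and receiving a JSON body are assumptions, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://www.scalepoint.com/api/events/v1/dk/external"
MAX_LIMIT = 100  # documented default and maximum


def poison_messages_url(tenant, offset=None, limit=None):
    """Build the poison-messages URL; offset and limit are optional."""
    if offset is not None and offset < 0:
        raise ValueError("offset must not be negative")
    if limit is not None and not 0 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 0 and {MAX_LIMIT}")
    url = f"{BASE_URL}/{urllib.parse.quote(tenant, safe='')}/poison-messages"
    params = {
        name: value
        for name, value in (("offset", offset), ("limit", limit))
        if value is not None
    }
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def fetch_poison_messages(tenant, access_token, offset=None, limit=None):
    """GET one page of poisoned messages (assumes a JSON response)."""
    request = urllib.request.Request(
        poison_messages_url(tenant, offset, limit),
        headers={
            # Assumption: the events-scoped token is sent as a Bearer token.
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

To page through the queue, a caller would increase offset by the page size on each request until a page comes back empty; the exact shape of the response is not described here, so that stopping condition is also an assumption.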

Backoff algorithm

In order to avoid storming the endpoint with requests during failures, we have implemented a retry backoff policy. The delay increases when the endpoint repeatedly returns a non-successful status and resets on a successful push.

Our default backoff policy timings are shown below.

Retry count    Delay
1              5s
2              10s
3              30s
4              1m
5+             10
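The table can be read as a lookup with a cap and a reset on success. The sketch below models that reading with hypothetical names (backoff_delay, Backoff). The table gives "10" without a unit for retry 5 and later, so the sketch takes that delay as an argument rather than guessing it.

```python
# Delays in seconds for retries 1 to 4, taken from the table above.
KNOWN_DELAYS_SECONDS = {1: 5, 2: 10, 3: 30, 4: 60}


def backoff_delay(retry_count, later_delay_seconds):
    """Return the delay before the given retry (1-based).

    later_delay_seconds is used for retry 5 and beyond; the caller
    supplies it because the source table omits the unit.
    """
    if retry_count < 1:
        raise ValueError("retry_count starts at 1")
    return KNOWN_DELAYS_SECONDS.get(retry_count, later_delay_seconds)


class Backoff:
    """Tracks consecutive failures and resets on a successful push."""

    def __init__(self, later_delay_seconds):
        self.later_delay_seconds = later_delay_seconds
        self.retry_count = 0

    def on_failure(self):
        """Record a failure and return the delay before the next attempt."""
        self.retry_count += 1
        return backoff_delay(self.retry_count, self.later_delay_seconds)

    def on_success(self):
        """A successful push resets the delay sequence."""
        self.retry_count = 0
```

Receivers can use a model like this to estimate how long a recovering endpoint may wait before the next delivery attempt arrives.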

We are considering making this policy configurable per subscription at some point.

Alerts

In case of repeated errors, the webhook subscription will issue an email alert notification to the configured email address(es).

We are considering adding SMS, secondary webhook and other alert notification channels later.