Protocols

Introduction

This document describes the special protocols, formats, and extensions used by Fanout. In general, it is not necessary to understand these protocols in order to use the Fanout service. However, the information may be useful when developing frameworks or services intended to be Fanout-compatible.

Extensible Pubsub Control Protocol (EPCP)

This protocol defines a generalized and extensible way of publishing data through a publish-subscribe service using HTTP. It is intended to provide a common way to publish items, each carrying one or more formats, and to control the publish-subscribe service. Items are published to channels, and relayed by the pubsub service to any subscribers of those channels as necessary. EPCP only covers the publisher side of the pubsub pattern.

A pubsub service offering an EPCP interface must make a REST API available at a known base URI, which can be anything. Method endpoints are appended to that base URI. For example, if the base URI of an EPCP interface is http://localhost:5561/epcp, then the publish endpoint would be available at http://localhost:5561/epcp/publish/.

Publish

The publish method allows publishing items through the pubsub service. The content provided is JSON, consisting of an object with a single field, items, containing a list of items.

Example publish:

POST /publish/ HTTP/1.1
Content-Type: application/json

{
  "items": [
    {
      "channel": "somechannel",
      "id": "a-nice-id",
      "prev-id": "an-earlier-id",
      "coolformat": { ... },
      "anotherformat": { ... },
      ...
    },
    ...
  ]
}

Items have only two mandatory fields: channel and at least one format field. The id and prev-id fields are optional; use them when making concurrent publish calls to the pubsub service and the items must maintain a certain ordering. In the above example, coolformat and anotherformat represent possible format types for the item. They are not real format types and are shown only for the sake of example. Formats may have any name and are defined in separate specifications. EPCP alone does not define any item formats.

Multiple items may be provided in a single call, and the items may target different channels. As HTTP is a request/response protocol, the ability to publish multiple items per call is meant to enable efficient batching. If there is a problem processing an individual item in a call, then the EPCP service may fail the entire publish.
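
As a rough sketch of the publish call described above, the following Python builds (but does not send) a batched publish request. The function name build_publish_request and the contents of the coolformat payloads are hypothetical, chosen only for illustration; the base URI is the one from the earlier example.

```python
import json
import urllib.request


def build_publish_request(base_uri, items):
    """Build (but do not send) an EPCP publish request for a list of items."""
    for item in items:
        # Each item needs a channel plus at least one format field.
        formats = [k for k in item if k not in ("channel", "id", "prev-id")]
        if "channel" not in item or not formats:
            raise ValueError("item needs a channel and at least one format field")
    body = json.dumps({"items": items}).encode("utf-8")
    return urllib.request.Request(
        base_uri.rstrip("/") + "/publish/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Two items for different channels, batched into a single call.
# "coolformat" is the placeholder format from the example above, not a real one.
req = build_publish_request(
    "http://localhost:5561/epcp",
    [
        {"channel": "somechannel", "id": "a-nice-id",
         "prev-id": "an-earlier-id", "coolformat": {"text": "hello"}},
        {"channel": "anotherchannel", "coolformat": {"text": "world"}},
    ],
)
```

Passing req to urllib.request.urlopen would send it. Since the service may fail the entire publish when one item is bad, a caller would probably want to treat any error response as applying to the whole batch.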

Generic Realtime Intermediary Protocol (GRIP)

This protocol makes it possible for a proxy service to perform realtime-specific HTTP connection handling on behalf of a more traditional HTTP service. Since nearly all HTTP long-polling or HTTP streaming web services must implement very similar flows (process incoming request, then decide to respond immediately or hold open, or both), the rationale behind GRIP is to allow the common parts of these flows to be moved into a general-purpose edge service.

Instructions may be sent in response to requests from a GRIP proxy:

HTTP/1.1 200 OK
Content-Type: application/grip-instruct

{
  "hold": {
    "mode": "response" or "stream",
    "channels": [
      {
        "name": string,
        "prev-id": string
      },
      ...
    ]
  },
  "response": {
    "code": int,
    "status": string,
    "headers": {
      name: string,
      ...
    },
    "body": string,
    "body-bin": base64 string
  }
}

The hold field describes how the proxy should hold the client request open. The mode is either response or stream. For response holds, the proxy will be expecting a future HTTP response item to be published to it (see below). The response field defines the HTTP response data to use when timing out the request if enough time passes without any data being published. For stream holds, the response field defines the initial response to be sent to the client. The proxy will then be expecting one or more future HTTP stream items to be published to it.

A request may be bound to one or more channels. Each channel object must contain a name (the name of the channel). Optionally, prev-id may be provided to specify the previous id published to the GRIP proxy on this channel. This is used to work around the race condition of clients missing published data in between long polls.
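
To make the instruction format concrete, here is a minimal Python sketch of how a web service behind a GRIP proxy might assemble an instruct body. The helper name grip_hold and the sample timeout response are hypothetical and not part of GRIP itself.

```python
import json


def grip_hold(mode, channels, response=None):
    """Return a GRIP instruct document as a JSON string.

    channels is a list of (name, prev_id) pairs; prev_id may be None.
    """
    if mode not in ("response", "stream"):
        raise ValueError("mode must be 'response' or 'stream'")
    chan_objs = []
    for name, prev_id in channels:
        obj = {"name": name}
        if prev_id is not None:
            obj["prev-id"] = prev_id  # optional: last id published on this channel
        chan_objs.append(obj)
    instruct = {"hold": {"mode": mode, "channels": chan_objs}}
    if response is not None:
        instruct["response"] = response
    return json.dumps(instruct)


# Long-poll hold: the response field is what the proxy sends on timeout.
body = grip_hold(
    "response",
    [("somechannel", "an-earlier-id")],
    response={"code": 200, "status": "OK",
              "headers": {"Content-Type": "application/json"},
              "body": "{}"},
)
```

The service would return body with a Content-Type of application/grip-instruct, as in the example above.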

GRIP does not mandate a specific way to publish HTTP response or HTTP stream data to the proxy. However, using the Extensible Pubsub Control Protocol (EPCP) is recommended. In that case, two EPCP item formats, http-response and http-stream, are defined:

"http-response": {
  "code": int,
  "status": string,
  "headers": {
    name: string,
    ...
  },
  "body": string,
  "body-bin": base64 string
}

"http-stream": {
  "content": string,
  "content-bin": base64 string
}

These can be used with the publish call to send items over HTTP connections that have been bound to channels.
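
The sketch below shows one way these item formats could be built in Python before being passed to an EPCP publish call. The helper names are hypothetical, and the code assumes that body/body-bin (and content/content-bin) are alternatives for text and binary payloads, which the format listing suggests but does not state.

```python
import base64


def http_stream_item(channel, content):
    """Build an EPCP item carrying an http-stream payload."""
    if isinstance(content, bytes):
        # Assumption: binary data goes in content-bin as base64.
        fmt = {"content-bin": base64.b64encode(content).decode("ascii")}
    else:
        fmt = {"content": content}
    return {"channel": channel, "http-stream": fmt}


def http_response_item(channel, body, code=200, status="OK", headers=None):
    """Build an EPCP item carrying an http-response payload."""
    fmt = {"code": code, "status": status, "headers": headers or {}}
    if isinstance(body, bytes):
        # Assumption: binary data goes in body-bin as base64.
        fmt["body-bin"] = base64.b64encode(body).decode("ascii")
    else:
        fmt["body"] = body
    return {"channel": channel, "http-response": fmt}


stream_item = http_stream_item("somechannel", "hello\n")
response_item = http_response_item("somechannel", b"\x00\x01")
```

Either item can be placed in the items list of a publish request; the proxy would then deliver it over connections bound to somechannel.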

Fanout Pubsub Protocol (FPP)

This protocol is a simple, HTTP-based protocol that enables clients to subscribe to channels and receive published items as JSON objects.

An FPP service is accessed via a known base URI. For example, if the base URI of an FPP service is http://example.com/pubsub and the client wishes to access the /subscribe/ endpoint, then a request would be made to http://example.com/pubsub/subscribe/.

Subscribing

To subscribe to channels, POST to the /subscribe/ endpoint with one or more desired channels. Each channel is provided using the channel info parameter ci. To subscribe to multiple channels, provide ci more than once. The value of each channel info parameter is itself a form-encoded key/value list, in which name specifies the channel name.

For example, here is how to subscribe to a channel named somechannel:

POST /pubsub/subscribe/ HTTP/1.1
Content-Type: application/x-www-form-urlencoded

ci=name%3Dsomechannel

The server will respond with a session id and cursor information for each channel:

HTTP/1.1 200 OK
Content-Type: application/json

{
  "sid": "5b75b607-7292-4f82-be36-128f30c3123c",
  "channels": {
    "somechannel": {
      "last_cursor": "b28825fe-de7c-4d3d-937e-a3eddd365581"
    }
  }
}

The last_cursor value indicates the client's position in the item queue for this channel. This is used to ensure data is not missed. The client will need to keep track of the last known cursor for each channel it is subscribed to, and provide it in requests for data.

To subscribe to more channels, issue the same request again with the new channels added:

POST /pubsub/subscribe/ HTTP/1.1
Content-Type: application/x-www-form-urlencoded

sid=5b75b607-7292-4f82-be36-128f30c3123c&ci=name%3Dsomechannel&ci=name%3Danotherchannel

Note that the session id has also been provided. Always provide this value for every subscription request beyond the first. If the session id is not provided, then a new session will be created.

To unsubscribe from channels, issue the same request again with those channels omitted. If you want to unsubscribe from all channels, simply do nothing at all. The server will eventually time out the session and any associated subscriptions.
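
Because each ci value is itself form-encoded and then encoded again as part of the request body, the nesting is easy to get wrong. The following Python sketch, with a hypothetical helper name subscribe_body, reproduces the two request bodies shown above.

```python
from urllib.parse import urlencode


def subscribe_body(channel_names, sid=None):
    """Build the form-encoded body for a /subscribe/ request."""
    params = []
    if sid is not None:
        # Provide the session id on every request after the first.
        params.append(("sid", sid))
    for name in channel_names:
        # Inner encoding: the ci value is its own key/value list.
        params.append(("ci", urlencode({"name": name})))
    # Outer encoding: escapes the "=" inside each ci value.
    return urlencode(params)


first = subscribe_body(["somechannel"])
second = subscribe_body(
    ["somechannel", "anotherchannel"],
    sid="5b75b607-7292-4f82-be36-128f30c3123c",
)
```

Since unsubscribing means re-sending the request with channels omitted, the caller would pass the full list of channels it still wants each time, not just the changes.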

Receiving data

Once a session is established, the client repeatedly polls the /feed/ endpoint for new data using GET requests. Similar channel info parameters are provided, but this time in the query string of the request, and each one contains a since value with cursor information.

For example, a request using session id 5b75b607-7292-4f82-be36-128f30c3123c, for channel somechannel whose last known cursor is b28825fe-de7c-4d3d-937e-a3eddd365581 might use a request URI that looks like this: /pubsub/feed/?sid=5b75b607-7292-4f82-be36-128f30c3123c&ci=name%3Dsomechannel%26since%3Dcursor%253Ab28825fe-de7c-4d3d-937e-a3eddd365581.
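
The request URI above involves two layers of encoding: the since value is escaped inside the ci list, and the ci list is escaped again in the query string. A minimal Python sketch follows; the helper name feed_uri is hypothetical, and the cursor: prefix on the since value is inferred from the example URI rather than stated elsewhere in this document.

```python
from urllib.parse import urlencode


def feed_uri(base_path, sid, cursors):
    """Build a /feed/ request URI from a dict of channel name -> last cursor."""
    params = [("sid", sid)]
    for name, cursor in cursors.items():
        # Inner encoding turns "cursor:<id>" into "cursor%3A<id>".
        inner = urlencode({"name": name, "since": "cursor:" + cursor})
        params.append(("ci", inner))
    # Outer encoding escapes "=", "&" and "%" inside each ci value.
    return base_path.rstrip("/") + "/feed/?" + urlencode(params)


uri = feed_uri(
    "/pubsub",
    "5b75b607-7292-4f82-be36-128f30c3123c",
    {"somechannel": "b28825fe-de7c-4d3d-937e-a3eddd365581"},
)
```

With these inputs, uri matches the example request URI given above.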

The server must either respond immediately with data or hold the request open until there is data to deliver. If a request remains open for a reasonable period of time with no data to deliver, then the server may time out the request by responding with no data.

If the server has data immediately or eventually, then it responds:

HTTP/1.1 200 OK
Content-Type: application/json

{
  "channels": {
    "somechannel": {
      "items": [
        {"foo": "bar"}
      ],
      "last_cursor": "db7fe00d-67dd-4467-b03d-e4b1dc5f4812"
    }
  }
}

The server may provide data for more than one channel. It may also provide more than one payload item per channel. Each item is a JSON object. Each channel provided will also contain a new last_cursor value, which the client must keep track of. It replaces any previously-known cursor for that channel and is to be used in follow-up requests.

In the event of a timeout, the server returns empty data instead:

HTTP/1.1 200 OK
Content-Type: application/json

{
  "channels": {}
}

A client can process both types of responses the same way. It should make no difference whether a response contains data or not: even when a response does contain data, the client must still handle the possibility of certain channels not having data.

After a response has been processed, the client should then poll the feed endpoint again, using the latest known cursors (and in the case of a timeout, this means cursors unchanged). To be nice to the server, it is recommended that the client wait a random time (100ms-1s) between polls. This is to reduce the bottleneck that may occur if many clients poll simultaneously, which can happen easily if many clients are listening to the same channels.
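
The response handling described above can be sketched in Python as a single function that treats data and timeout responses identically, plus a helper for the randomized delay. Both function names are hypothetical.

```python
import random


def apply_feed_response(cursors, response):
    """Update cursors in place and return a list of (channel, item) pairs.

    A timeout response ({"channels": {}}) leaves cursors unchanged.
    """
    received = []
    for name, data in response.get("channels", {}).items():
        for item in data.get("items", []):
            received.append((name, item))
        if "last_cursor" in data:
            # The new cursor replaces any previously known one.
            cursors[name] = data["last_cursor"]
    return received


def poll_delay():
    """Pick a random wait between polls, in seconds (100 ms to 1 s)."""
    return random.uniform(0.1, 1.0)


cursors = {"somechannel": "b28825fe-de7c-4d3d-937e-a3eddd365581"}
data_response = {"channels": {"somechannel": {
    "items": [{"foo": "bar"}],
    "last_cursor": "db7fe00d-67dd-4467-b03d-e4b1dc5f4812"}}}
items = apply_feed_response(cursors, data_response)
```

A polling loop would call apply_feed_response on each parsed response, sleep for poll_delay() seconds, and then request the feed again with the current cursors.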

Publishing

Items published over FPP are JSON objects. It is recommended that an FPP service allow publishing via Extensible Pubsub Control Protocol (EPCP). In that case, an item format called fpp is defined, whose value is the JSON object to be published.

An example of a publish request:

POST /publish/ HTTP/1.1
Content-Type: application/json

{
  "items": [
    {
      "channel": "somechannel",
      "fpp": {"foo": "bar"}
    }
  ]
}