
Reading fast enough

In document Streaming Data.pdf (Page 158-161)

Consumer device capabilities and limitations

8.1 The core concepts

8.1.1 Reading fast enough

Speed of reading may not be the first thing you think of when thinking about building a streaming client, but it’s often a major consideration. There are two important sides to ensuring a client is reading fast enough: the streaming API side and the streaming client side. This chapter is focused on the streaming client, but let’s look at the streaming API side of the problem for a moment.

Why is reading fast enough important to the API? It boils down to data loss (first and perhaps foremost) and server resource utilization. This will depend on the technology used to deliver the stream from the API. In our case, let’s zoom in on two popular techniques discussed in chapter 7: server-sent events and WebSockets, which behave similarly on both the server side and the client side. Figure 8.3 shows the client-not-reading-fast-enough situation we can run into with both approaches.

In figure 8.3, the client is able to process the first two messages before another message is ready to be sent. But processing message 3 took too long, resulting in the streaming API having to decide what to do with messages 4–6. Should it hold them in memory or blindly send them? Holding them in memory is certainly an option in this example, because we’re only talking about three messages. Blindly sending may also work, although we would have to deeply understand the underlying technology the streaming API is using. What if that technology drops the messages when the network buffers are full? In that case, there is the potential for data loss. Let’s take the approach that the streaming API will hold the messages in memory.

This may seem like an easy problem when we’re talking about a couple of messages. What about when the velocity of the stream is 1000 messages/sec and the client is falling behind? The API developer is then left with two unattractive options: discard data so that server resources aren’t exhausted, or apply backpressure to upstream systems. To help ensure that data isn’t lost and server resources aren’t exhausted, an API developer can notify a client that it’s falling behind. Unfortunately, not all third-party APIs offer this feature, but if you’re developing the streaming API, I highly recommend that you provide this to your clients.
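To make the trade-off concrete, here is a minimal sketch of how a streaming API might hold messages in memory for a slow client without exhausting server resources. The class name `ClientBuffer`, the capacity, and the warning threshold are all assumptions made for illustration, and dropping the oldest message is only one possible policy; a real API might disconnect the client or apply backpressure upstream instead.

```python
from collections import deque


class ClientBuffer:
    """Bounded per-client buffer (hypothetical; sizes are illustrative)."""

    def __init__(self, capacity=1000, warn_ratio=0.8):
        self.capacity = capacity
        self.warn_at = int(capacity * warn_ratio)
        self.pending = deque()
        self.dropped = 0

    def offer(self, message):
        """Queue a message the client has not read yet.

        Returns "ok", "warn" (the client should be told it is falling
        behind), or "dropped" (the buffer was full, so the oldest
        pending message was discarded to make room).
        """
        if len(self.pending) >= self.capacity:
            self.pending.popleft()  # discard oldest to cap memory use
            self.dropped += 1
            self.pending.append(message)
            return "dropped"
        self.pending.append(message)
        if len(self.pending) >= self.warn_at:
            return "warn"
        return "ok"

    def poll(self):
        """Return the next message for the client, or None if caught up."""
        return self.pending.popleft() if self.pending else None
```

The "warn" result is the point at which the API could notify the client that it is falling behind, before any data has actually been lost.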

Turning our attention back to the client side of this problem, three important questions pop into my mind:

How do I know if I’m reading fast enough?

What happens if I’m not?

How do I scale my client so I can keep up with the pace of the stream?

[Figure 8.3: timeline between the streaming data API and the streaming client. After connecting, the client reads and processes messages 1 and 2 in time, but processing message 3 takes too long while events 4–6 become ready on the API side: “What do we do with these messages?”]

Figure 8.3 Generalized server-sent events and WebSockets data flow showing slow client

The way you address these questions will vary slightly depending on whether you’re consuming a third-party API or an internally developed one.

THIRD-PARTY STREAMING API

Depending on the streaming API you’re connecting to, it may provide guidance on how you’ll be notified if you fall behind and what the ramifications are of doing so. For example, as of this writing, the Twitter API will disconnect any client that falls too far behind.

Twitter doesn’t provide an explanation for what “too far behind” means, but it does send stall warnings that your client is falling behind. The Twitter API sends a message approximately every five minutes if a client is falling behind. Again, there’s no guidance on how many stall warnings will be sent before your client is disconnected.

What can you do if the API you’re consuming data from doesn’t offer a warning message or other mechanism to let you know you’re falling behind? One strategy is to read the timestamps on the messages you’re receiving and compare them to the current time. Then as you process messages, if the gap between the current time and the message time starts to grow, you can reason that your client software may be falling behind. Remember that this strategy depends on the service you’re consuming from providing a timestamp on each of the messages being sent. If the stream you’re consuming doesn’t provide a timestamp on each message for when it was generated, your next best bet would be to try to ascertain what the expected flow rate is for the stream. Some streams may be episodic in nature; in these cases you may be able to ascertain the pattern and from there reason about how well your client is keeping up with the stream.
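The timestamp strategy can be sketched in a few lines. The following is only an illustration: the class name `LagMonitor`, the window size, and the lag limit are assumptions, and it presumes each message carries a generation timestamp in epoch seconds and that the producer’s clock and the client’s clock are roughly in sync.

```python
import time
from collections import deque


class LagMonitor:
    """Tracks the gap between message time and current time (hypothetical)."""

    def __init__(self, window=100, max_lag_seconds=5.0):
        self.lags = deque(maxlen=window)  # most recent lag samples
        self.max_lag_seconds = max_lag_seconds

    def record(self, message_ts, now=None):
        """Record the lag for one message; timestamps are epoch seconds."""
        now = time.time() if now is None else now
        lag = now - message_ts
        self.lags.append(lag)
        return lag

    def average_lag(self):
        return sum(self.lags) / len(self.lags) if self.lags else 0.0

    def is_falling_behind(self):
        """True when recent lag exceeds the limit and is still growing."""
        if len(self.lags) < 2:
            return False
        lags = list(self.lags)
        half = len(lags) // 2
        older = sum(lags[:half]) / half
        newer = sum(lags[half:]) / (len(lags) - half)
        return newer > older and newer > self.max_lag_seconds
```

A client would call `record` with each message’s timestamp as it finishes processing that message and check `is_falling_behind` periodically. Comparing the newer half of the window with the older half separates a lag that is growing from one that is merely constant, such as a fixed network delay.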

This brings us to the next question: What happens if we are not keeping up? This is an interesting question, and unfortunately the answer depends on the streaming API you’re consuming data from. As of this writing the Twitter API clearly indicates that it will close the connection if your consumer can’t keep up with the stream. Other streaming APIs may not drop the connection; in fact, some will instead drop data that can’t be consumed fast enough. If you’re building a consumer that consumes data from a third-party streaming API, make sure you ask about this scenario.

YOUR STREAMING API

If you’re building a solution whereby you control the entire stack, then you want to ensure your streaming API clearly states what happens to the data and/or the connection if the consumer can’t keep up. You can do that via documentation, by including status messages in the stream, and by logging the sending of those status messages. The status messages should provide information on how far behind the consumer is and a warning about the connection being closed if the consumer doesn’t keep up. We all know that sometimes documentation doesn’t get read, and having a data-driven solution will allow your customers to react to changes and allow your streaming API to provide a better experience. Be aware that this will put a further burden on a client that’s already having trouble keeping up, by asking it to process another message type and act on it. You should also log the sending of these messages so that they can be analyzed to help in troubleshooting slow consumers and/or problems in your streaming API.
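As a rough illustration of such a status message, the sketch below builds one as JSON and logs the fact that it was sent. The field names, the `FALLING_BEHIND` code, the logger name, and the assumption that the connection is closed at 100 percent are all hypothetical; they are not taken from any particular streaming API.

```python
import json
import logging
import time

logger = logging.getLogger("streaming_api.status")


def build_status_message(client_id, pending, capacity, now=None):
    """Build a hypothetical 'falling behind' status message as JSON."""
    percent_full = round(100 * pending / capacity)
    status = {
        "type": "status",
        "code": "FALLING_BEHIND",
        "client_id": client_id,
        "pending_messages": pending,      # how far behind the consumer is
        "percent_full": percent_full,
        "will_disconnect_at_percent": 100,  # assumed disconnect policy
        "sent_at": time.time() if now is None else now,
    }
    # Log every status message sent so slow consumers can be analyzed later.
    logger.warning("status sent to %s: %d%% full (%d pending)",
                   client_id, percent_full, pending)
    return json.dumps(status)
```

Keeping the message small and giving it a distinct `type` field lets a struggling client recognize it cheaply, which matters because the client is already short on processing time.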
