2.4 A dose of reality
Here’s a funny little story to put some of this scaling and fault tolerance into perspective. One time I was working on a streaming system that was populating fancy dashboards for marketers. It had all the bells and whistles (scaling, fault tolerance, monitoring, alerting), the whole nine yards. We had to have all of this and could not lose any data, because our customers wouldn’t accept a solution that didn’t have complete data. Once this system was running in production, I was curious as to how well our web-based dashboards that consumed our stream via WebSockets were keeping up. Well, come to find out, many of our customers were only able to keep up with about 60% of the stream that was being sent to them; the other 40% of the data was being dropped because they couldn’t read it fast enough. When I mentioned this to coworkers, they were shocked and somewhat in disbelief because our customers and business folks loved what they were seeing. It really put things in perspective: the dashboards we produced were showing a picture of our customers’ business that was not distorted by the missing data. To me this was like the difference between the high-end HDTV and the mid-level HDTV: sure, the quality of the picture may be slightly better, but the picture doesn’t change. Now, I’m not implying that you don’t need to worry about scaling or fault tolerance, but it’s good to keep things in perspective and then reflect on the difference between “we must have xyz features” and reality.
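To make the 60/40 split concrete, here is a minimal sketch of how a slow consumer ends up seeing only part of a stream. It is a toy model, not the system from the story: the function name `simulate_slow_consumer`, the bounded buffer, the drop-when-full policy, and all of the numbers are assumptions chosen so that the client reads 6 messages for every 10 produced.

```python
from collections import deque


def simulate_slow_consumer(ticks, produced_per_tick, consumed_per_tick, buffer_size):
    """Toy model (hypothetical): a bounded send buffer in front of a slow client.

    Returns (delivered, dropped, still_buffered) message counts.
    """
    buffer = deque()
    delivered = 0
    dropped = 0
    next_id = 0
    for _ in range(ticks):
        # Producer side: enqueue new messages; anything that does not fit is dropped.
        for _ in range(produced_per_tick):
            if len(buffer) < buffer_size:
                buffer.append(next_id)
            else:
                dropped += 1
            next_id += 1
        # Consumer side: the client reads at most consumed_per_tick messages per tick.
        for _ in range(min(consumed_per_tick, len(buffer))):
            buffer.popleft()
            delivered += 1
    return delivered, dropped, len(buffer)


if __name__ == "__main__":
    delivered, dropped, buffered = simulate_slow_consumer(
        ticks=1000, produced_per_tick=10, consumed_per_tick=6, buffer_size=50
    )
    total = delivered + dropped + buffered
    print(f"delivered {delivered / total:.0%} of {total} messages, dropped {dropped}")
```

In this model, once the buffer fills, the delivered share settles at the ratio of the read rate to the send rate regardless of the buffer size; a larger buffer only delays the point at which messages start being dropped.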
2.5 Summary
We’ve covered a lot of ground in this chapter exploring the various aspects of collecting data for a streaming system, from the interaction patterns through scaling and the fault-tolerance techniques.
Along the way you
Learned about the collection tier
Developed an understanding of the various collection patterns
Had a chance to interact with a live stream
Learned how to think about scaling your collection tier
There's a big difference between sipping a glass of water and drinking directly from the hydrant. In the same way, applications built to deal with streaming data present fundamentally different challenges than those that work with stored data. For example, live location data paired with a social media profile might allow a vendor to recommend a product or service to a user at just the right instant, and the split-nanosecond reaction of a pacemaker or anti-lock brakes can save lives. Emerging techniques and technologies that enable you to take immediate action on streaming data make it possible to design and build in-the-moment decision systems, dynamic reporting dashboards, live recommendation systems, and other real-time applications.
Streaming Data introduces the concepts and requirements of streaming and real-time data systems. Through this book you will develop a foundation to understand the challenges and solutions of building in-the-moment data systems before committing to specific technologies. Using copious diagrams, this book systematically builds up the blueprint for an in-the-moment system concept by concept. Although code may occasionally appear in examples, this book focuses on the big ideas of streaming and real-time data systems rather than the implementation details.
Many of the technologies discussed in the book (Spark, Storm, Kafka, Impala, RabbitMQ, etc.) are covered individually in other books. As you read, you'll get a clear picture of how these technologies work individually and together, gain insight into how to choose the correct technologies, and discover how to fuse them together to architect a robust system.
What's inside
Architect a complete system for collecting and analyzing data in real time
Harness the Internet of Things by handling live data from billions of devices
Combine emerging technologies like Spark, Storm, Kafka, RabbitMQ, and WebSockets
Integrate and extend the Lambda architecture into a complete system
No experience with streaming or real-time data systems required. Perfect for developers or architects, this book is also written to be accessible to technical managers and business decision makers.
With the Access layer we ensure Things are accessible on the web. However, making Things accessible via a web API doesn’t mean a client can “understand” what the Thing is, what data or services it offers, and so on. The Find layer deals with this problem. In Building the Web of Things, we propose a web-based protocol with a set of resources, data models, a payload syntax, and semantic extensions that web Things and applications should follow. This ensures that your Things and the services they provide can be easily understood and used by other web clients. However, that isn’t the end of the story. A web page offers nothing if users can’t find it, and the same goes for Things. The Find layer looks into making Things findable. One interesting technique is to make them searchable. Just like a lonely web page starts to attract traffic once it is indexed by Google, Things can benefit from being indexed by search engines. Imagine a not-too-distant future where you can Google your running shoes to locate them instead of desperately rooting through closets in your chaotic physical world!
In the next chapter, “Enhancing results from search engines” from Linked Data: Structured Data on the Web, you’ll learn how to make any page efficiently searchable using the Semantic Web. The same approach can be applied to the pages of Things!