Large Scale Server Publishing
for Dynamic Content
Karl Andersson Thunberg
June 7, 2013
Master’s Thesis in Computing Science, 30 credits
Supervisor at CS-UmU: Jan-Erik Mostr¨
om
Supervisor at Dohi Sweden: Linus M¨
ahler Lundgren
Examiner: Fredrik Georgsson
Ume˚
aUniversity
Department of Computing Science
SE-901 87 UME˚
A
Abstract
The number of interactive and dynamic web services on the Internet is growing more and more and to accommodate as much functionality as possible, many techniques for asyn-chronous web communication are being developed. This thesis report describes the eval-uation of an existing web service that uses bidirectional communication over the web to provide voting functionality in real-time on web pages. The thesis consists of an assess-ment of problem domains, an evaluation of the system and an impleassess-mentation of some of the identified problems. It focuses on a few core issues of the current solution, namely the communication techniques between the client and the server, the setup of the overarching structure of the system and the separation of messaging channels for different use cases.
The evaluation of the reference system was motivated by addressing the issue of being able to packet the service better as a product and create a distinction between the use case and the underlying system. It was done so that the stakeholders of the product may more easily define the way the service can be used and so that a better course of action can be taken for continuing the development of the service. The implemented solution shows an example of how the messaging channels could be separated and what kind of trade-offs exist between the current and implemented solution.
Contents
1 Introduction 1 1.1 Background . . . 1 1.2 Goals . . . 2 1.3 Purpose . . . 3 1.4 Related Work . . . 3 1.5 Thesis Overview . . . 4 2 Method 7 2.1 Outline . . . 7 2.2 Development Method . . . 8 2.2.1 Planning Sessions . . . 82.2.2 Ongoing Sprint Activities . . . 9
2.2.3 Sprint Transitions . . . 9
2.3 Evaluation of Existing Software . . . 11
2.4 Results Analysis . . . 11
2.5 In-depth Study . . . 12
2.6 Documenting and Tracking Results . . . 12
2.7 System Orientation . . . 12
3 Assessment of Problem Domains 15 3.1 Web Communication . . . 15
3.1.1 Communication Techniques . . . 16
3.1.2 Web Servers . . . 18
3.1.3 Web Application Libraries . . . 21
3.2 Internal Messaging . . . 22
3.2.1 Messaging Paradigms . . . 22
3.2.2 Qualities of Service (QoS) . . . 24
3.2.3 Messaging Systems . . . 24
3.3 Data Processing . . . 26
3.3.1 Batch Data Processing . . . 26
3.3.2 Stream Data Processing . . . 27
iv CONTENTS
3.4 Distributed Coordination with Zookeeper . . . 28
4 Evaluation of Reference System 31 4.1 Use Case . . . 31
4.2 System Overview . . . 32
4.3 Discussion . . . 35
4.4 Assessment of Issues . . . 37
4.4.1 Inseparable Use Case . . . 37
4.4.2 Functionality Not Defined . . . 37
4.4.3 Required Maintenance . . . 37
4.4.4 Single Messaging Channel . . . 37
4.4.5 Complete Replica of Global State . . . 38
4.4.6 Intra-dependencies . . . 38 4.4.7 Simple Messaging . . . 38 4.4.8 State Synchronisation . . . 38 4.5 System Definition . . . 38 4.6 Proposal of Improvements . . . 39 4.6.1 System Architecture . . . 39 4.6.2 Messaging Channels . . . 43 5 Publish-Subscribe Pattern 45 5.1 Introduction . . . 45 5.1.1 Overview . . . 45 5.1.2 Subscription Models . . . 46 5.2 Problems . . . 47 5.3 Measurements of Results . . . 48 5.4 Classification of Methods . . . 48 5.5 Research Summary . . . 49 5.5.1 Decentralizing Brokers . . . 49 5.5.2 Peer-to-peer Methods . . . 54
5.5.3 Self-managed Self-optimized Methods . . . 59
5.6 Conclusion and discussion . . . 63
5.6.1 Decentralising Brokers . . . 63
5.6.2 Peer-to-peer Methods . . . 63
5.6.3 Self-managed Self-optimised Methods . . . 64
5.6.4 The Future . . . 64 6 Implementation of Improvements 65 6.1 Goals . . . 65 6.2 Measurements . . . 66 6.3 Implementation Details . . . 67 6.3.1 Channel Definition . . . 67
CONTENTS v
6.3.2 Components . . . 67
6.4 Results . . . 71
6.5 Results Analysis . . . 72
6.5.1 Separation of Messaging Data . . . 72
6.5.2 Separation of Messages . . . 74 6.5.3 Initialization . . . 74 7 Discussion 75 7.1 Restrictions . . . 75 7.2 Future Work . . . 76 8 Acknowledgement 77
List of Figures
1.1 A high level representation of the system, outlining three important aspects of the system; the asynchronous web communication between client and server (1), the back-end architecture (2) and the separation of the messaging
chan-nels (3). . . 4
2.1 A work-flow overview of a typical sprint planning session. . . 9
2.2 The work-flow during a pre-planned workday. . . 10
2.3 Work-flow at the end of a sprint. . . 10
2.4 An overview of the system and its underlying components. The numbers represents three technical aspects that were deemed important for this thesis. They are: 1. the asynchronous communication between a web client and a web server, 2. the back end architecture and 3. the separation of the messaging channels for events. . . 13
4.1 A representation of how a client currently is exposed to the service. It is exposed to the service on a web page where the static content, such as HTML and CSS, is downloaded from a static file store. This static content also contains logic for communicating with the web server that contains the voting functionality. . . 32
4.2 Overarching components in the reference solution. A web client fetches the static content from a static file store, which also contains the logic to com-municate with the web server that contains the voting functionality. A web client connects to a web server using a load balancer. Any votes are put on a processing queue so that votes are bufferd to a data processing component that calculates a result and returns it using a notification system. The service also uses a database to store the global state of all events. . . 33
4.3 An overview of the service bus architecture describing the way components interact with each other. This solution proposed the use of a message bus which all components uses to communicate with each other. . . 41
viii LIST OF FIGURES
4.4 An overview of the two-tier architecture presenting the layout of the com-ponents of the system. This solution divides the service in two parts that communicate with a message bus in between them. One one side is the client functionality and on the other side is the use case functionality. . . 42 4.5 Outline of the layered web server solution. It introduces two levels in the web
server where the higher level contains use case specific logic separate from each other, while the lower level manages these use cases. . . 44 5.1 A description of how types in a type-based publish-subscribe solution may
relate to each other. A subscriber may subscribe to the type sports news, in which it will receive all sports news, both those that are golf news and those that are football news. A subscriber may also subscribe to, for example, golf news, in which it will only receive golf news and not fotball news. . . 47 6.1 An overview of the components in the solution which highlights the general
functionality of the messaging channels that is not tied to any use case. It separates the web server in two levels: a higher level which contains use case specific functionality in separate modules and a lower level that manages these modules. It introduces a web client interface that the clients use to express their interest in events or channels. The client validation cache is also used directly by the lower level of the web servers. The load balander and the notification system is also viewed as part of the basic functionality. . 68 6.2 Overarching components in the solution with modified components
high-lighted. Those that were modified during the implementation are the web client, web server, notification system and database. Other components were used as they were. . . 69 6.3 Number of messages received by three clients where the first client is
publish-ing and subscribpublish-ing to channel 1, the second client is subscribpublish-ing to channel 1 and the third client is subscribing to channel 2. . . 71 6.4 The perceived latency for a client when initializing a connection to the service
with one subscription. . . 72 6.5 The perceived latency of ten consecutive client initializations. . . 73 6.6 The size of the bootstrap when ten channels are added to the service, each
with a bootstrap of 1000B data. The bootstrap is the initial message that the web server sends to a client when the client first connects. . . 73
Chapter 1
Introduction
Dynamic web applications are applications that attempt to enhance the web experience for users by providing interactive or interchanging content such as video or live chatting. Dynamic content1 could be incorporated client-side which includes the ways a web page
behaves in response to user input. It can also be present on the server-side which refers to the way a server responds to a client based on its internal state. Combining these two methods is a third alternative that can provide further functionality by using additional communication between a web client and a web server.
Finding ways to create an asynchronous connection between a web client and a web server has always been studied because of the additional content such functionality can provide. Earlier solutions include making use of the HTTP request/response protocol to simulate a bidirectional connection using polling techniques. Because of HTML5 and the continuously improving performance of web browsers, dynamic content is being used more and more in web application and this has brought forward newer and more efficient ways to communicate over the web. Making use of browser plugins and controllers have also been used to establish further communication functionality but these kinds of solutions are slowly being phased out as more regular web communication techniques are standardised.
Today, virtually any larger website provides some dynamic content using asynchronous communication. A couple of examples are Facebook’s chat functionality, automatic updates on Twitter feeds and streaming video on Youtube. Games are a large part of the web and as HTML is getting more versatile and web communication is getting faster and more standardized, it opens more use cases. As web communication develops, existing web appli-cations and their software architecture may need to be revised to support new methods and technologies which can be a complex task because of the large scale infrastructure needed of a web application and the higher demand on web application performance.
1.1
Background
Dohi Sweden is a holding company that offers, inter alia, products and services for the interactive web. The company has developed a web service that makes use of asynchronous communication to provide a solution that can send and receive information between a web client and a web server without the need of a page reload. This service is used for receiving a lot of asynchronous data from the web clients, processing it on the back end and then
1Dynamic content refers to content provided by dynamic web applications.
2 Chapter 1. Introduction
pushing the results back to the web clients. It was made for another company that had a specific use case for it and it includes a web client interface, a web administration interface and a back end that serves these two interfaces. The use of this service is tied to specific events or time intervals where its functionality is provided for these events separately and the data transmitted during an event is completely decoupled from any other event.
The current service that Dohi Sweden provides was developed completely for this use case and may be limited to use for other companies or similar use cases. The web clients for a specific event cannot be distinguished from web clients for other events and the data that is received and processed during an event are broadcasted to every connected web client no matter if a web client is interested in the event or not. The service could be used in many other ways if it featured a more general functionality, with the specific use case separated from the underlying system where also the web clients could be separated based of the event they are interested in. Making a more versatile solution is of interest for the stakeholders of Dohi Sweden as it enables packeting this solution so that it can be offered to other companies and applied to other use cases. Supporting multiple use cases that are clearly separated from each other would make a more appealing product as it would let clients focus on the work for the specific use of the product instead of doing redundant work that may be the same between use cases and events.
1.2
Goals
The overall goal of this thesis project is to evaluate the stakeholders’ current solution and the underlying technologies in order to improve the service with a solution that may better suit their needs. This evaluation addresses the issue of separating the use case with the underlying system to make a solution that behaves more like a framework that can be used to implement other types of web services and use cases. Web services that are within the scope of the desired solution are those that makes use of asynchronous communication between the front end and the back end and those that feature a scalable back end where received data may be processed before a result is sent back. It is assumed that transmitted data between web clients and the back end are linked to an event and are completely separated from data that are transmitted for other events.
To be able to create a base on which decisions for the evaluation can be made, it is necessary to include an assessment of the problem domains that are relevant for this project. The assessment does not include every part of the overall solution but instead focuses on three problem domains; client-server communication, internal end messaging and back-end data processing. A fourth part, that is at least equally important but is not included in this work, is the data model of the solution. What kind of data model that should be used relies heavily on what kind of use case it should be designed for but the work needed to create a data model that suits all targeted use cases is outside the scope of this thesis. Other parts of the solution are not explicitly evaluated but may differ from the current solution, as replacing one may make it easier to use any of the suggested improvements.
The work also includes an implementation to test and validate the results of the evalu-ation which means that the project can be broken down into the following three goals:
– Make an assessment of already existing techniques and solutions relevant to the project domains and describe in what way they are relevant, what they do and how they are used. Solutions for a specific task must be compared and their differences documented. – Evaluate the current solution and identify problems related to separating the use case and the system. Make a proposal of improvements for the identified problems
1.3. Purpose 3
and describe in what way they differ from the current solution. This also includes documentation of how the the differences from the improvements impacts the current solution in terms of scalability, latency, manageability and extendability.
– Test and evaluate a set of the proposed improvements by applying them to the current implementation, including tests to validate the behaviour of the original use case and to document the differences in terms of scalability, latency, manageability and extendability.
An abstract view of the service is presented in Figure 1.1 which does not consider its individual components but instead focuses on the higher level functionality and three aspects that are of importance for this kind of system. The overview is purposely presented at this high level, postponing the detailed description of the components until later chapters.
The three important aspects of the service are the client-server communication, the back end application architecture and the separation of the data transmitted on different events. The client-server communication (1) refers to the service’s ability to uphold an asynchronous connection that provides a reliable and fast way of transmitting data. It is important that the service can be provided to as many clients as possible and do not feature any heavy processing on the client-side of the solution so that the service does not degrade performance of any other web service it is integrated into. The back-end application architecture (2) is of interest as the way the internal components are set up and interact with each other, determines what the solution can be used for. The last aspect (3) addresses the importance of being able to distinguish between the messaging channels used for the different events as that makes it much more clear of how the use case can be separated from the system.
1.3
Purpose
These goals, when met, will result in a documented evaluation of the current service which contains a list of proposed improvements where a subset of these improvements have been tested and demonstrated. This documentation outlines ways to proceed when continuing with the development of the service and how this development could be done depending on the desired path. This report will serve as a documentation for current techniques and solutions that together with the evaluation and the implementation provide material for gaining more knowledge in this field as well as provide an example of how this type of service can be made.
The implemented improvements consists of a back end that contains a clearer distinction from the current use case so that it is more apparent how the system works and what it could be used for. The solution is able to support more than the current single use case and is able to send data to only the correct clients. This creates a good base that can be used when incorporating this functionality in the current system.
1.4
Related Work
This project is an evaluation of a system that is owned and developed by Dohi Sweden and it is currently still being developed to include more use cases for the initial client. This may help during the development of this thesis as it may give further directions of where this project is heading and what areas is of interest for this thesis to study. The result of this thesis is meant to provide additional information when proceeding with developing the current service so that it may be easier to implement the addressed improvements. The
4 Chapter 1. Introduction
Figure 1.1: A high level representation of the system, outlining three important aspects of the system; the asynchronous web communication between client and server (1), the back-end architecture (2) and the separation of the messaging channels (3).
current solution is evaluated from the state it was in when this project was started and any further functionality will not be included in this evaluation.
1.5
Thesis Overview
Chapter 2 presents the methodology used during this project to reach the defined goals of the thesis, including how the steps of the project was planned and documented. This
1.5. Thesis Overview 5
chapter also describes the tools used during the development and how results were measured and analysed. Chapter 2 concludes with a brief overview of the entire system together with an explanation of how to read subsequent chapters. By presenting the overview as early as possible, it allows readers to skip uninteresting parts, while still understanding how each component is related in the actual solution.
Chapter 3 describes an assessment of the problem domains that are relevant for this thesis work. This chapter is meant to document the base on which future decisions for the implementation was made on so that the reader may better follow the reasoning behind the path that was taken. It represent the analysis of existing techniques and solutions that was made prior to the evaluation of the current system. Readers that are already confident in the problem domains that this chapter addresses may skip this chapter completely and continue reading without any problems of understanding the other chapters.
Having made an assessment of the problem domains for this thesis, Chapter 4 focuses on the task of evaluating the current solution and identifying problems with regards to the project goals. It presents the use case for the system and gives an overview of how the system is built and how its different components interact with each other. It includes a discussion about the different components focusing on the problem domains and concludes with a list of proposed improvements for the identified problems.
Chapter 5 is an in-depth study of scientific research in the area of the publish-subscribe pattern. It focuses on ways to build the internal messaging system so that one of the immediate goals of the thesis can be fulfilled, namely the separation of the use case and the underlying system. This chapter starts with an overview of the pattern that explains how it works and what it is used for, including different alternations of it. It contains a summary of three domains that are of interest for this project and outlines their impact and how they can be of use in this system.
Chapter 6 presents an implementation that aims to solve some of the addressed problems using some of the proposed solutions from chapter 4. It describes the plan that was taken during the implementation as well as the design of the system with a list of what needs to be changed from the reference system. It also contains motivation as to why the proposed improvements was implemented, how they will be tested and what the results from the tests may conclude.
Chapter 7 addresses the implementation that is made to test and evaluate a set of proposed improvements. It combines the result provided by the previous chapters and presents a suggested implementation, viewing the solution in its entirety. It proceeds to evaluate the results of the tests for the implementation in accordance with the evaluation criteria from Chapter 2. It concludes with some quantitative results derived from the test runs on the complete system based on the goals of this thesis.
Chapter 8 contains an overall discussion and conclusions of both the implementation results and how ideas derived from the in-depth study in Chapter 5 can be realized with the developed framework as a base. It highlights the restrictions and limitations present in the implementation and what implications the developed system may have in the future.
Chapter 2
Method
The general methodology, used when performing this thesis project, is described in this chapter. Section 2.1 lists the steps performed to divide, implement and analyze each indi-vidual component. The agile development method called Scrum, was used for planning and documentation during this work and this method is presented in Section 2.2. This chapter includes a description how existing software is to be evaluated in Section 2.3. Section 2.4 explains how the results related to the system evaluation are to be analyzed and how the tested improvements should be applied to create as fair of a comparison as possible. The approach used in the in-depth study is presented in Section 2.5 while Section 2.6 presents how the general project progress were documented. The chapter concludes with a system overview in Section 2.7 that describes the components of the system and which parts are focused on during the thesis.
2.1
Outline
This section describes a general outline of the steps taken, both for the assessment of do-mains, the theoretical evaluation, the in-depth study and the implementation. It explains how the work was divided and performed and each step can be related to the thesis in its entirety as well as to each individual chapter.
1. Create a list of system requirements from the stakeholders, or in the case of current software analysis, assemble a list of supported features and potential limitations. 2. Split up the project into logical parts. These parts will be addressed individually, with
a background focus on how these components make up the entire solution.
3. Do a pre-study related to the addressed subject and areas related to the different parts of the system.
4. Set up quantitative measurements for performance and validation for the thesis’s in-dividual components and the solution as a whole.
5. Produce data for the quantitative measurements by either implementing tests or gather information from other sources.
6. Analyze and document the results and relate these to the list of quantitative measure-ments.
8 Chapter 2. Method
2.2
Development Method
Dohi Sweden uses an agile development method called Scrum[40]. Scrum focuses on the iteration of Sprints which is a limited time-frame where the development of a product is incremented to a potentially releasable product. Each Sprint has a clear and decisive goal that is supposed to be reached in a designated time-frame which usually spans one to a few weeks. Sprints have generally a consistent duration throughout the development and are immediately followed by another Sprint when the current is done. Short iterative steps are used to quickly build prototypes which can be tested to make quick decisions on how to proceed. Companies and project groups usually use their own modified version of the Scrum development method. The version and work-flow used at Dohi Sweden is a compromise of previous experiences at the company and the limited time span of the thesis project. The daily and weekly Scrum activities that were held with the external supervisor at Dohi Sweden are the following:
– Every day began with a Scrum meeting, including the discussion of: • What was done yesterday
• Problems that had been encountered • What had to be done until tomorrow
– Each week concluded with a summary of the following three aspects: • Completed weekly tasks
• Problems that had been encountered • Tasks that had to be postponed or modified
– After every month (roughly one iteration) the progress was presented to the external supervisor and summarized to provide a detailed project progress report.
From this general activity work-flow there are three central Scrum activities that need to be specified further; the planning sessions, the activities during an ongoing sprint and finally the transition between two subsequent sprints.
2.2.1
Planning Sessions
Planning sessions is an important part where all requirements are formulated into tasks, subsequently divided to fit a specific time granularity and then grouped into components. Although the development method belongs in the agile class, the planning process can be considered as a generally iterative process. As each requirement is first formulated into high-order tasks, these are iteratively divided into lower-level tasks. As long as the expected granularity of these tasks exceed a specified time threshold, they should be divided further. By keeping the tasks at the lowest possible level without specifying implementation details, it ensures that they are conceptually easy, allowing a more steady flow of progress to be reported. By grouping the divided tasks into components they are also easier to order by either expected difficulty or by the order of dependency on other tasks. Figure 2.1 on the facing page gives a work-flow overview of how the weekly sprint planning sessions of the thesis project were performed.
2.2. Development Method 9
Figure 2.1: A work-flow overview of a typical sprint planning session.
2.2.2
Ongoing Sprint Activities
These planning sessions produced (at least) a week’s workload divided into 4-hour tasks. The granularity of these tasks varied slightly, but as a consistency measure and to aid in project progress overview they were all averaging roughly four hours each. The typical workday was 8 hours, with five days a week; yielding an expected task completion rate of ten tasks per week. A work-flow overview of daily activities of a pre-planned sprint are shown in Figure 2.2.
It was not explicitly stated in the planning sessions, but it was considered that the time assigned to tasks would also include testing and documentation. The completion of any task (on time or not) results in progress that is either completed and tested or marked for further testing. Any task may also be postponed for whatever reason.
2.2.3
Sprint Transitions
A sprint is considered completed, whether the tasks have been completed or not, when a new week begins. Tasks that are not completed in one sprint will carry over to the next one. As the new sprint begins it starts with a summary of the progress of the completed sprint, where tasks that were not completed may now either be discarded, re-prioritised or postponed even further in the new planning session that followed directly. Figure 2.3 shows how a recently completed sprint is summarised and its results combined into a new planning session.
The concept of accepting a weekly progress is related to a measurement of how well the actual progress matched the planned activities. If there were any unplanned interrupts or bugs introduced that consumed a lot of time, this may have led to some tasks not being
10 Chapter 2. Method
Figure 2.2: The work-flow during a pre-planned workday.
Figure 2.3: Work-flow at the end of a sprint.
completed as planned. In these cases, the progress may be considered unacceptable, and some tasks may have to carry over to the next sprint. Some tasks may have to be discarded completely or postponed indefinitely.
2.3. Evaluation of Existing Software 11
The decision to discard or postpone tasks may in projects involving more than one person may instead be reduced to re-distribution of the workload, allowing other team members to accumulate more tasks to ease the burden of particularly bug-ridden or difficult tasks assigned to specific persons. Of course, the purpose of having daily Scrum meetings should limit the necessity of this, as the progress of each individual and the team as a whole is monitored regularly. This makes it easier to make adaptive changes to regulate the progress with the ultimate ambition to avoid having to resort to discarding or postponing tasks.
Short sprints instead of long iterations does not not only offer the possibility to make short-term workload readjustment for particular team members, it also provides natural milestones for both iterations and the project road-map as a whole.
2.3
Evaluation of Existing Software
There are several solutions for asynchronous web communication already available on the market. Companies are providing both complete solutions and software frameworks to support web applications with dynamic content. However, many frameworks are custom for a particular use case or application area. This introduces limitations that apply for the size of the application scope, the general cost of production and the choice of different software architecture components. By evaluating the current software, it is not only possible to compare the performance to a proprietary solution but also possible to gain insight on what problems exists and how they can be solved, or worked around. This evaluation will promote feature extension instead of re-iteration of already implemented and well-functioning solutions.
The motivation for developing a proprietary framework is to promote the possibility for easy continuation of the already existing use case while still being able to extend the system to other applicable areas. Three main evaluation criteria were used in the evaluation of current software solutions:
1. The ability to support as many web clients as possible over asynchronous connections, no matter which platform or browser the client uses.
2. The computational complexity and calculation time of vital software procedures, such as separating the different messaging channels for events.
3. The software architecture, in the context of being able to support different use cases and flows for the application tasks.
2.4
Results Analysis
Since the different goals of the project have been split into three separate chapters, the results for each goal will be addressed in respective chapter while an analysis and discussion of the overall system as well as the work for this project are presented in chapter 7.
To be able to perform fair comparisons with current software, the implementation will imitate the original behaviour as closely as possible and any differences between the two systems will be thoroughly analysed and documented in what way they differ and what impact that may have on the test results and a future integration of the implemented functionality into the current service.
12 Chapter 2. Method
2.5
In-depth Study
The in-depth part of the thesis project (see Chapter 5) makes up the theoretical foundation that concludes with the thesis project implementation. Focus was on three aspects of the publish-subscribe pattern and the in-depth study tries to give an overview of how these work and what they may be used for in the current system. The placement of the in-depth study chapter was chosen explicitly to be preceded by the chapter on implementing proposed improvements, as the techniques and methods described in the in-depth review are closely related to those explained in detail in the development of the improvements.
2.6
Documenting and Tracking Results
Throughout the whole project, a project diary was used and shared online with the internal supervisor so that he could get a better understanding of the current status of the project. The work for the project diary consisted of noting down the following aspects for each week:
– What I have done.
– What kind of experiences I have gained.
– A status update of the schedule, including a revision when the schedule have not been followed.
– Who I have been communicating with and in what way.
During the thesis project at Dohi Sweden, a project management tool called Pivotal Tracker1was used to document the tasks for each sprint and note down what has been done
each day and how much time that has taken. It was also used to keep track of the milestones for the project and to generate various backlog information such as burn-down charts and estimations of when milestones and tasks will be done using a calculated project velocity.
2.7
System Orientation
Figure 2.4 shows an overview of the current system, including the components of the back end and how they communicate with each other. Three important aspects for this evaluation is the communication between the front end and the back end, the architecture of the back end and the separation of messaging channels for different events. The client-server communication will focus on the link between the front end and the web server. The setup of the overarching system and its components will address most of the components presented in the figure, excluding the static file store and focusing on the way components relate and interact with each other. The separation of events addresses the problem of logically distinguishing between different event data inside the system components as well as on the actual channels of the internal messaging bus.
2.7. System Orientation 13
Figure 2.4: An overview of the system and its underlying components. The numbers rep-resents three technical aspects that were deemed important for this thesis. They are: 1. the asynchronous communication between a web client and a web server, 2. the back end architecture and 3. the separation of the messaging channels for events.
Chapter 3
Assessment of Problem
Domains
This chapter describes an assessment of problems and questions that are relevant for this thesis work. These problems have been grouped into four domains which are represented with four sections in this chapter. The first three domains reflect those that were defined by the goals of this work while the last domain centres around a specific problem that is prevalent in the other domains.
Section 3.1 focuses on the problems of creating large scale asynchronous communica-tion over the web. This seccommunica-tion addresses underlying communicacommunica-tion protocols as well as abstractions of messaging implementations.
Section 3.2 is centered around the internal messaging on the back end. Since the solution must be able to scale well there are questions related to how scalable back-end components may synchronise and communicate with each other.
Section 3.3 is an assessment of problems for handling large scale data processing on the back end both distributed and as a distinct logical unit. It addresses ways to manage and process data as well as making a clear separation between different use cases.
Section 3.4 refers to the problem of ensuring and managing reliable distributed systems. This section is an assessment of how coordination services can be used to handle components of the solution that must be distributed.
3.1
Web Communication
The solution that is addressed in this thesis is defined as a web service since the front end of the system is aimed to be incorporated into a web page or any other container that displays and manages web content. This means that the service needs to use some kind of web communication with the clients. Furthermore, since the communication should go both ways, the way a client and the service communicates is of great interest for this study as the HTTP request/response protocol used for web communication is not bidirectional. Because of this, this section includes an assessment of commonly used ways to create asynchronous and bidirectional communication between a web client and a web server.
Using web communication between the clients and the service also means that the service needs some kind of web server in the front. The service may also be used as a some kind host for web content which also motivates a web server. This section includes a small description
16 Chapter 3. Assessment of Problem Domains
of a number of web servers and a summary, based on the studied web servers, of found properties that is of importance for this kind of solution.
3.1.1
Communication Techniques
This section describes a comprehensive list of available techniques, in form of methods and protocols, used to create asynchronous connections between a web server and a web client. This list is aimed to document how these methods and protocols are used and in what way they differ from each other. An important aspect that is addressed for these techniques is the ability to conform to a standard for web communication. This includes, for example, the way a technique depends on the client’s web browser, if a protocol uses a common port or if and in what way a technique uses the HTTP request/response protocol.
Three terms that relate heavily to this text is Ajax, Comet (Reverse Ajax) and XML-HttpRequest (XHR). Ajax is a general term that refers to a set of client-side techniques used to make it possible for a web client to send data to and retrieve data from a web server without reloading the whole page. Comet or Reverse Ajax is a general term that refers to a set of techniques used to make it possible for a web server to push data to a web client. Comet capabilities is usually achieved by using HTTP streaming or by using Ajax with long polling. XHR is an widely used API that a web client can use inside a web page, using scripting languages, to communicate with a web server using the HTTP request/response protocol without loading a new web page.
Long Polling
Long polling is an HTTP request/response protocol technique used by a client to poll a web server for new content where the server waits with the HTTP response if it does not have any data to send to the client. When the server has new data it uses the connection it is still holding to push the data to the client. After the client receives a response it sends another request to the server. This makes it so that the server always has a connection to the client it can use to send data.
Long polling is widely supported by web browsers since it is a technique that has been around for very long. This also means that there are a lot of solutions and implementations available that uses this technique so it becomes easier to find a solution that fits into a more specific use case. This technique is an improvement of regular polling since it has a high probability to reduce the polling or message exchange frequency and it enables servers to respond immediately when it has new data to send. When many messages are being sent between the server and the client there is no real performance improvement from regular polling[32]. As the server needs to hold a connection open when it does not have anything to send it creates idle processes which take up some resources on the server[41]. Long polling is usually enabled by loading some kind of library onto the client which could add some delay when accessing a page.
HTTP Streaming
HTTP streaming or HTTP server push is a technique for sending data from a web server to a web client by making the web server hold the HTTP connection open after a response has been sent to the client. By holding the connection open the server can push data to the client whenever the server gets new data to send. HTTP streaming can make use of XHR to handle ingoing and outgoing events or use page streaming where the server keeps
3.1. Web Communication 17
the client in the ”loading” state and send script tags which gets executed by the client as soon as they are received.
Since a connection is kept open, there is not as much overhead from sending multiple messages as long polling since there is no need to make an HTTP request/response for every message. This gives greater performance when exchanging a lot of messages between a server and a client compared to long polling[14].
WebSocket
WebSocket consists of a protocol and an API that provides full-duplex communication using TCP over port 80 and is attained with an upgrade handshake from HTTP[59]. This means that web servers can communicate with web browsers asynchronously by both letting the browser send messages to the server and by letting the server send messages to the browser. Since the connection uses the standard HTTP ports for the web there is no need to open up any additional ports to allow WebSocket communication. What is needed is a web server and a web browser that supports WebSocket. A WebSocket connection can send both UTF-8 encoded text and binary data.
Full duplex communication gives increased efficiency over long polling and HTTP stream-ing because of less overhead with the messages exchanged[49]. With the use of TCP and the ability to send message in binary, WebSocket can be used to serve any type of data that a regular program can. WebSocket is supported by a lot of browsers but it is still a fairly new technique and its API has still not yet been standardised so there are still some older and used versions of web browsers that do not fully support WebSocket[5]. It may also be more difficult to find and make use of software architecture solutions such as load balancing since they also need to support WebSocket[26]. WebSocket has its own API which lets developers focus less on the message exchange handling as they have a unified way of communicating and do not need to learn multiple ways, compare different methods, etc.
Bi-directional Streams Over Synchronous HTTP (BOSH)
BOSH is a transport protocol that uses pairs of HTTP requests and responses to emulate bi-directional communication. It uses long polling to ensure that the server always has a connection it can send messages to the client. To let the client communicate with the server it uses an additional HTTP request aside from the long polling request.
BOSH claims to have a very low latency for a protocol that only uses regular HTTP communication[17]. This makes it a very viable alternative for WebSocket as it enables messages to be sent both ways, with methods that may work when WebSocket do not. BOSH uses a JavaScript library to handle messages on the client-side and that library needs to be loaded into the client before BOSH can be used.
Web Real-Time Communication (WebRTC)
WebRTC is an API designed to make voice calling, video chat and file sharing between browsers possible without the need of a plugin. It is being drafted by the World Wide Web Consortium (W3C) and even though it is not complete it is still partially supported by different browsers[57]. The WebRTC project includes components such as network modules and audio and video codecs to make it easier for the developers to create high quality and real time communication functionality inside the browser.
WebRTC uses the Real-time Transport Protocol (RTP) which makes it a better choice for serving content such as audio and video. This also makes the WebRTC into a
Peer-18 Chapter 3. Assessment of Problem Domains
to-peer (P2P) solution for transferring data both between a web browser and a web server but also between two web browsers which means that communication can be set up to work between two clients without the need of going through a server. This is something WebSocket alone cannot do. WebRTC is still being drafted though and is not as supported by web browsers than WebSocket[57]. WebRTC needs a signalling protocol to set up the communication between two clients which is something WebSocket can provide[10]. Server-Sent Events (SSE)
Server-Sent Events is a technology used to provide functionality for sending data from a web server to a web client[56]. It describes how a server can communicate with a client and contains a JavaScript API called EventSource that the client can use to request and handle a connection to a data stream on the server. EventSource is being standardised as part of HTML5 by the W3C.
SSE is unidirectional and does only provide a way for letting a server push messages to a client and has, bacause of this, not as wide of a use case as WebSocket and other bidirectional communication methods. SSE can be emulated by using JavaScript which makes it possible to support older browser that do not support SSE natively[38].
Browser Plugins
There are numerous browser plugins that can be used to enable additional communication techniques between a web browser and a web server. One possible push technique is to make use of a one pixel large Adobe Flash movie to establish an additional connection to the server. This hides the movie from the user while providing the extra communication link from the Flash movie. The extra connection is used as a regular TCP connection without the need of HTTP requests or responses. SilverLight is an application framework from Microsoft that aims to fulfill a similar purpose as Adobe Flash. SilverLight can also be incorporated into a web browser as a plugin and provides different ways for a web server and a browser to communicate. ActiveX is another alternative that gives extra control of the browser. These methods use other or additional types of protocols than the HTTP request/response and needs extra software and explicit support in the browser which makes them a secondary solution for this solution. Using these types of browser controllers have introduced many extra security problems and many of them are widely known for their exploitations by malicious web applications.
3.1.2
Web Servers
This section is an assessment of web servers that can be used as the party that explicitly communicates with the clients of the service. It focuses on asynchronous, event-driven web servers that can scale and perform well when handling many concurrent clients. A very popular HTTP web server that hosts a significant amount of web sites (63.7% of all active websites[25]) over the world is the Apache web server. This server was not included in this assessment since it is not an event-driven web server.
Event-driven web servers sees handling all network connections as a single task which is solved by one (most common) or multiple threads in a loop, iterating over the connections and reading from them and writing to them asynchronously. This is different from the more traditional way of creating a separate thread for each connection. The main drawbacks of using the traditional method is its scalability as creating a thread for each connection creates a lot more memory and thread management overhead.
3.1. Web Communication 19
Nginx
Nginx is an asynchronous event-driven web server that can also function as a reverse proxy server for HTTP, SMTP, POP3 and IMAP protocols. It was found to be the second most used web server for all active sites according to Netcraft’s December 2012 Web Server Survey[15] and is used by, for example, Netflix and Wordpress. It is designed to be a very lightweight, scalable and high performance web server and is widely used for load balancing as a front-end server. When using Nginx as web server, 10000 inactive HTTP keep-alive connections can be handled with only 2.5MB of memory[29].
Tornado
Tornado is an open source solution for a scalable, non-blocking web server that also provides an application framework written in Python. It contains a set of modules that includes a WebSocket module and has numerous web application implementations built on top of it such as Socket.IO1 and SockJS2 as well as many help libraries and frameworks[53]. This
makes it a good choice when choosing a web server that can support other, perhaps already used, frameworks and libraries. Including libraries in the Python framework might not be as easy as Tornado needs asynchronous libraries to be able to fully use the non-blocking functionality.
Jetty
Jetty is an open source, HTTP server, HTTP client and Servlet container based on Java and is used in many products such as Apache Maven, Google App Engine, Eclipse and Hadoop. It supports different embedded solutions, frameworks and clusters. It supports WebSocket, SPDY and the Java Servlet API[19]. Jetty features a very extensive and pluggable function-ality which makes it easier to customize and optimize for different use cases.
Mongrel2
Mongrel2 is an open source web server that supports HTTP, Flash, WebSocket and long polling communication[28]. It supports 13 different programming languages and can be built on many different platforms. It is built on ZeroMQ3 for concurrency and is therefore perhaps easier than others to manage when scaling. The ability to implement components or functionality using different languages makes it a very versatile server solution that can handle many types of use cases.
Lighttpd
Lighttpd, pronounced Lighty, is a web server that was designed to be very lightweight and handle 10000 parallel connections on a single web server[24]. It is a very secure and flexible web server that has a very low memory footprint. It is similar to Nginx but is not considered to be as stable and easy to use as Nginx but it does feature a full-blown bug tracking system[63] which is of great use during development.
1www.socket.io
2www.github.com/sockjs 3www.zeromq.org
20 Chapter 3. Assessment of Problem Domains
Node.js
Node.js or Node is a system for deploying fast and scalable network applications written in JavaScript. Node uses event-driven, asynchronous I/O to handle communication and the system can be used to create a web server written in JavaScript. Node has a fairly high amount of modules that can provide additional functionality from the Node core such as implementations of networking techniques, authentication and testing[31].
Event-driven asynchronous I/O generates less overhead than the OS threaded solution and as Node is not threaded it does not have any dead-locking possibilities. This makes Node more viable for the development of scalable web solutions as developers do not have to spend as much time managing a threaded environment[48]. Applications for Node is written in JavaScript which means that a web client and a web server can use the same language. This will make it easier to make web applications as there is no need for interpretation between two languages. It is also much easier to make test cases incorporating both sides. JavaScript is a dynamic language where it is possible to compile code into other languages such as C or Java.
Vert.x
Vert.x is a Java-based application framework which, similar to Node.js, uses event-driven I/O and an asynchronous programming model to provide concurrent applications[55]. Vert.x runs on a Java Virtual Machine (JVM) and supports a fair amount of programming languages such as Ruby, Java, Groovy, JavaScript and Python. One difference from Node.js is that Vert.x can use multiple threads on a single Vert.x instance and does not need to add more instances to fully saturate a multicore server solution. Vert.x does also support different communication patterns natively such as the publish/subscribe pattern.
EventMachine
EventMachine is a software system for the Ruby programming language and provides event-driven I/O. It is designed for concurrent programming where scalability and performance is of concern. EventMachine has been around for many years and is a mature and well-tested system compared to Node.js and Vert.x who have been around for much less time[12]. Ap-plications for EventMachine are written in Ruby which makes it a less language agnostic system compared to, for example, Vert.x and web solutions needs to be written using differ-ent languages on the front end and the back end. Evdiffer-entMachine is also restricted to using EventMachine libraries and cannot ensure that other types of Ruby functionality works.
Twisted
Twisted is a programming framework for event-driven networking written in Python and supports many different protocols such as TCP, UDP, SSL/TLS, IP, HTTP, CMPP, SSH, IRC and FTP[54]. Twisted is, like EventMachine, a more mature solution than Node.js and Vert.x and has a lot of additional functionality. Like the Tornado web server, Twisted cannot directly use any Python library because of the asynchronous nature of Twisted and the synchronous nature of regular Python languages. Twisted might have a steeper learning curve than, for example, Node.js because of its additional functionality it supports and perhaps needs more code and time to set up an application with equivalent capabilities as an application for Node.js.
3.1. Web Communication 21
Summary
When looking at event-driven web servers for asynchronous communication there exists some properties that are used to motivate a specific web server. The programming language that a web server supports is naturally of importance since a certain use case may better fit a specific language based on factors such as the functionality of the use case and the experience of the developer. Supporting multiple languages minimizes the risk of a developer choosing another web server because of the language. Node.js has gained a fair amount of popularity lately because of the use of JavaScript on the server-side, which enables developers to create web sites that use the same, widely known, language on both sides.
Two aspects that are commonly used as low-level performance indicators of a web server are the memory footprint and the message throughput when scaling the number of connected clients. The memory footprint refers to the size of memory a web server needs to handle a certain amount of clients and using an event-driven web server this can be held to a very small amount. Message throughput refers to the number of messages a web server can send and receive for a specific time frame and a specific amount of clients. Both these aspects relate to the overhead of managing multiple connected clients which is one of the main attractions of event-driven servers. They are important as one of the prerequisite of the targeted solution is to be able to scale well and handle many clients concurrently.
The number of supported libraries and modules for a web server are important as it may help abstract parts so that the developer can focus on the core functionality of the solution. Support for general and well-known libraries are also a factor as previous experience can then be used. A prevalent problem with the assessed web servers are the disparity between the asynchronous framework of the web server and the common libraries in the same language which the web server uses. Most of these libraries are not made for asynchronous tasks and may be difficult to assert whether they are reliable for use with the specific web server.
One way the listed web servers can differ is their take on multi-threading focused on scalability and manageability. Node, for example, uses only one thread to handle client con-nections and instead pushes the responsibility to make full use of a processor to the developer where the idea is to use multiple instances of Node. This makes a Node instance much more easier to implement since developers do not have to worry about multi-threading[48] while the scalability of a Node on a single processing unit needs more work outside of the Node instance implementation.
3.1.3
Web Application Libraries
One important aspect when creating a messaging system is the use of abstraction in the networking layer of an implementation. Incorporating a library to use as a higher level interface for communication between a client and a server makes it easier for the developer to focus on the core functionality.
Socket.IO and SockJS are two JavaScript libraries that provide the use of multiple com-munication techniques using fallbacks where a second technique could be used if a first one fails. Socket.IO can be used as a wrapper for WebSocket with fallbacks to other techniques, such as Flash and Ajax long polling. SockJS core does not support Flash but provides different streaming protocols instead. Atmosphere is a Java framework that also contains components for creating asynchronous web application and has support for WebSocket, Server-side Events, long polling, HTTP Streaming and JSONP[2].
One of the benefits of using JavaScript on the server-side is that the same language could be used on both sides. Socket.IO contains two parts: a web-server application that is supposed to run on Node and a client-side API that can be run in a web browser. This
22 Chapter 3. Assessment of Problem Domains
makes it possible to use the same solution on both sides which minimizes the overhead of using two different programming environments. SockJS tries to make its API as similar to the regular WebSocket API as possible so that there is no need to learn and conform to another API.
The JavaScript version of SockJS is made for Node but it currently exists implemen-tations for other languages and server solutions such as Tornado, Netty and Twisted. Socket.IO also has a version for Tornado. Providing an API for multiple server solutions makes it easier for developers to incorporate SockJS functionality using a web server and a language they are familiar with.
The Channel API is an API for Google App Engine (GAE) that enables messaging capabilities between applications written for Google App Engine and web clients[58]. Google App Engine is a cloud computing platform where applications can be deployed and served across multiple servers without the need to manually supervise the infrastructure of the application as components such as web servers are provided and managed by GAE using virtualization. The Channel API is a way to marshal the different communication techniques and instead provide an API that is easier to use when implementing messaging functionality such as server to client publishing.
3.2
Internal Messaging
The web service that is studied in this thesis is supposed to serve more than 100000 concur-rent connections and must keep the client-side latency to a minimum. One important aspect of a web service that should both scale and perform well is the internal synchronisation and communication between back-end components. It is apparent that it is very important to choose a messaging solution that properly fits the targeted solution. This section describes some different messaging paradigms to get a better understanding how these work and when they should be applied.
3.2.1
Messaging Paradigms
The messaging paradigms that are described are remote procedure call (RPC), distributed shared memory (DSM), message queueing and publish-subscribe. These paradigms differ in many ways and have different uses and this study tries to assess when and how these are applied. The entities in a messaging system are usually called the message sender, the message recipient (receiver) and the messenger (broker). A messenger or broker is an intermediate component that makes sure that a message is routed from a sender to a recipient.
Remote Procedure Call (RPC)
Remote procedure calls or RPC refers to when a software invokes a procedure in another program environment, commonly over a network[3]. This is usually implemented in a way so that there is no large difference between coding local functionality and issuing remote invo-cations. RPC tries to make remote processing transparent by treating a distributed system as a single process. RPC is used to minimize the impedance for accessing local and remote functionality and to reduce the complexity of using a full-fledged messaging system and in-terface. Using RPC is problematic when addressing the difficulties of distributed messaging as remote procedure calls faces different problems than local calls such as partial failures, and different memory access models in communicating parties. One of the disadvantages
3.2. Internal Messaging 23
of RPC is that there currently does not exists a general solution which has proven RPC a fitting solution for large scale communication over a wide area.
Distributed Shared Memory (DSM)
Distributed Shared Memory or DSM refers to the solution where a virtual address space is shared among loosely coupled processes, often over a network[30]. A process views the shared memory as regular local memory and moves the issue of scaling a distributed solution to mapping a memory access request to a space in shared memory. This method gives a very simple abstraction as implementations using a distributed shared memory do not have to worry about moving data. This also simplifies process migration as different machines all still use the same memory. The DSM provides time and space decoupling as processes using a DSM does not know about each other but does not provide synchronisation decoupling. A DSM is generally not as used for typical client-server models where resources need to be viewed and accessed in an abstract manner because of security and modularity issues[9]. Message Queueing
Message queueing enables buffering of a sent message for later retrieval and decouples the sender and the receiver in respect to space and time[13]. This is commonly done by intro-ducing a messenger between a sender and a receiver that first enqueues sent messages and then sends them to the receiver when the receiver expresses its interest in it. This paradigm often provides transactional, timing and ordering guarantees for the messages that are sent. It is used to provide greater resilience between communicating processes and is an appro-priate paradigm where there exists a discrepancy between the number of messages being sent to a process and the rate of which the process can receive and process messages. Mes-sage queues do not provide synchronization decoupling with well-supported scalability as consumers synchronously pull messages from the queue.
Publish-subscribe
The publish-subscribe pattern decouples the communicating processes with regards to space, time and synchronisation[13]. Receiving processes address their interest in some kind of content and sending processes address their interest in sending some kind of content. The implemented publish-subscribe system is then responsible to route sent messages to processes that have expressed interest in the content that the message consists of. This is a very similar paradigm to message queueing as they decouple the processing in the same manner. The publish-subscribe pattern is more appropriate when there exists a need of more intricate routing messages than a one way communication path. This paradigm also enables a higher scalability as it can make better use of the underlying network architecture and process topology.
3.2.2
Qualities of Service (QoS)
This section addresses three qualities of service that may be of importance when implement-ing a messagimplement-ing system: persistence, reliability and availability.
Persistence
Persistence is a quality of service that is desirable in a messaging system when the sender and receiver are decoupled in respect to time and there exists a situation where the sender cannot
24 Chapter 3. Assessment of Problem Domains
directly be guaranteed that the message has been delivered to the receiver[13]. Persistence often means storing messages on a hard drive or flash memory so that messages are not lost between the sender and the receiver if, for example, a messenger would crash. This quality of service may degrade the performance of a system since messages cannot only be stored, for example, in RAM but must also be put on some persistent, often much slower, storage. This is of concern when facing a system that should be able to perform well while scaling as distributing more messengers may make the system more complex when trying to guarantee the delivery of a message in a path of a larger messenger architecture.
Reliability
Reliability is another quality of service related to aspects of guaranteeing the delivery of a message[13]. Reliability addresses end-to-end delivery guarantees with different properties such as using reliable protocols, persistent messages and message ordering. This is an important quality of a messaging service when using more loosely coupled messaging system and the end-to-end participants are not using point-to-point communication. One problem with distributed systems is the fault tolerance of the distributed architecture and ensuring messages are delivered in case of a messenger crash.
Availability
Availability refers to the portion of a time interval a service can ensure it operability. Differ-ent systems may have differDiffer-ent definitions of operability where, for example in a multi-server architecture, the system may still be considered operational as long as one server is still func-tioning. This is not as much of a messaging aspect as it is a more general quality of service for distributed systems[13]. When implementing a solution that gives the internal messaging a central role it is important that the messaging system is robust enough to guarantee a functional state for as long as much. An example of increasing a messaging system’s ro-bustness and availability is by using replication of messengers so that one messenger could pick up a delivery should another crash or somehow fail. This is on the expense of increased resource management and cost as well possible degradation of performance.
3.2.3
Messaging Systems
The following parts of this sections describe a couple of messaging systems that, as of today, have some fairly large support and use. This section tries to highlight the main use cases of the messaging systems as well as describe in what way they are used and differ from each other.
Kafka
Kafka is a distributed publish-subscribe messaging system that is written in Scala and was made for managing activity stream processing on a website[22]. It is an explicitly distributed broker system that was developed for LinkedIn and is currently an open-sourced project by Apache. Kafka is designed for persistent messages and focuses on throughput instead of features. It offers a JVM-based client and a primitive Python client but supports any language that makes use of standard socket I/O. Kafka can use different data structures where some support transactional semantics - this can reduce performance though because of the disk operations Kafka uses for persistency.
3.2. Internal Messaging 25
Kafka is a fairly recent messaging system which do not use AMQP4 and does not offer as much functionality or complex routing because of its focus on throughput instead of features. Kafka has a low overhead in network transferring and on-disk storage which gives it good performance [36]. It uses a very lightweight API that is simple to use and together with its distributed architecture it performs very well when scaling out. As Kafka uses its own non-AMQP protocol it is more difficult to replace.
RabbitMQ
RabbitMQ is a widely used open-sourced messaging system that implements a broker ar-chitecture and uses AMQP[35]. It supports HTTP, STOMP and MQTT and provides client libraries in many different languages such as Java, Python and Erlang. The rabbitMQ brokers use Erlang to pass messages around. A RabbitMQ messaging solution can be dis-tributed in three different ways, clustering, federation and shovel. Clustering refers to when multiple brokers are grouped and act as a single logical broker. Federation is when brokers connect to each other, typically over the Internet, to exchange and share messages so that data is mirrored between them. Shovel refers to when brokers are connected and instead of mirroring their data they forward it to others. RabbitMQ contains solutions for high availability and fail-overs using many different techniques such as clustering and mirroring data.
RabbitMQ is one of the most used solutions for AMQP messaging systems and has been around for some time. Because of this, RabbitMQ has a lot of libraries, tools and documentation which makes it a good choice when looking for a reliable messaging system. RabbitMQ is not as focused on performance as Kafka and instead provides additional func-tionality that Kafka does not. It features more different routing capabilities and is highly configurable.
Qpid
Qpid is an open-sourced messaging system from Apache that, similar to RabbitMQ, im-plements AMQP[34]. It provides brokers in Java and C++ and many client libraries such as Java, C++, Python and Ruby. Qpid contains many features for managing transactions, queueing messages, and distributing brokers. It is very configurable and uses a pluggable layer to provide additional functionality such as persistency and automatic client fail-over. Qpid does not feature as much functionality for a distributed solution as Kafka and Rab-bitMQ and clustering is only used to mirror all messages between brokers.
Qpid is, similar to RabbitMQ, highly configurable and uses AMQP which makes it a good choice when replacing some other messaging system. Qpid is also very similar to JMS and provides a JMS API as well which may make it easier to implement when already using JMS.
ActiveMQ
ActiveMQ[43] is an open-sourced message broker from Apache that implements JMS. Ac-tiveMQ supports many languages such as Java, C/C++, Perl, Python and Ruby and a wide variety of protocols including AMQP. ActiveMQ supports a lot of clustering options such as a shared database, a shared file system or a master-slave set up. It also supports
4Advanced Message Queuing Protocol, an open standard application layer protocol for message-oriented