Resource Provisioning - Flower Architecture

6.2 Flower Architecture

6.2.2 Resource Provisioning

To enable accurate yet timely resource provisioning, Flower uses advanced control theory to automatically reason about resource resizing actions. Flower’s adaptive resource control engine is able to continuously detect and self-adapt to workload changes for meeting users’ service level objectives.

The sensor and the resource actuator are the key components of the controllers. The sensor module provides resource usage statistics for the specified monitoring window. The actuator executes the controllers’ commands, such as adding or removing VMs and increasing or decreasing the number of Shards. Flower’s sensor module periodically collects live data from multiple sources such as CloudWatch [2] and feeds it into the actuator module. The controller adjusts the actuator value from the previous time step in proportion to the deviation between the current and desired values of the sensor variable in the current time step.
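The control rule in the last sentence can be read as u(k) = u(k-1) + K · (y(k) - y_ref), where u is the actuator value, y the sensor reading, and y_ref the desired value. The following is a minimal Python sketch of that reading, not Flower’s actual implementation. The function name, the gain K, the rounding to whole units, and the capacity bounds are all assumptions made for illustration.

```python
def next_actuator_value(previous, measured, desired, gain,
                        min_value=1, max_value=100):
    """Adjust the previous actuator value in proportion to the sensor deviation.

    previous: actuator value at the last time step (e.g. number of VMs or shards)
    measured: current sensor value (e.g. CPU utilization in percent)
    desired:  reference value for the sensor variable
    gain:     hypothetical tuning constant; a larger gain reacts faster
    min_value, max_value: hypothetical capacity bounds
    """
    error = measured - desired
    raw = previous + gain * error
    # Capacity is provisioned in whole units, so round to the nearest integer
    # and keep the result inside the allowed bounds.
    units = int(round(raw))
    return max(min_value, min(max_value, units))
```

For example, with 4 shards, a measured utilization of 80%, a desired utilization of 60%, and a gain of 0.1, the sketch proposes 6 shards for the next step. When the measured value equals the desired value, the allocation is left unchanged.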

6.2.3 Cross-Platform Monitoring

Many cluster monitoring tools such as Ganglia [16] are available to assist administrators. However, they fail to provide a holistic view of performance measures across the data analytics flow. Therefore, one has to consult different systems and user interfaces in order to track possible performance failures or slowdowns. For example, monitoring an analytics flow application built upon Kafka [10], Storm, and DynamoDB requires tracking performance statistics in three separate user interfaces. More importantly, these platforms do not necessarily use consistent definitions for the same performance measures, which makes evaluating metrics across the data analytics flow challenging.

To tackle the issues above, Flower introduces a module called the all-in-one-place visualizer, which allows users to visually define a monitoring layer on top of multiple systems. The module calls the APIs of systems such as CloudWatch and Storm, and consolidates the following performance measures in an integrated user interface, as shown in Fig. 6.3:

• System-level measures such as CPU, Memory or Network Utilization provide a general view of the system performance.

• Application-level measures refer to specific metrics of an individual system job or application, such as Incoming Records in Kinesis or Acked/Failed Tuples in Storm.

• Flow-level measures are combined measures, such as Latency, drawn from different platforms to provide an end-to-end performance value.
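As an illustration of a flow-level measure, the sketch below combines per-platform latencies into a single end-to-end value. It assumes the stages run sequentially and that each latency has already been normalized to milliseconds. The `StageLatency` type and the function name are hypothetical and are not part of Flower.

```python
from dataclasses import dataclass


@dataclass
class StageLatency:
    platform: str      # e.g. "Kinesis", "Storm", "DynamoDB"
    latency_ms: float  # latency reported by that platform, in milliseconds


def end_to_end_latency_ms(stages):
    """Flow-level latency as the sum of per-platform latencies.

    Assumes the stages are traversed one after another; parallel or
    overlapping stages would need a different combination rule.
    """
    return sum(stage.latency_ms for stage in stages)
```

Normalizing units before combining is the important step here, since the platforms do not necessarily define the same measure consistently.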

6.3 Flower Workflow

In a nutshell, the workflow of the system is as follows. First, dependencies between the workloads’ resource usage measures are analyzed. We apply linear regression techniques to estimate the relationships among variables. The dependency information, along with the cloud service costs and the user’s SLO, constitutes the required input for generating and then searching the provisioning plan space.
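The dependency analysis step can be illustrated with an ordinary least squares fit between two resource usage measures, for example the incoming record rate at the ingestion layer and the CPU utilization at the analytics layer. This is a minimal sketch of simple linear regression in plain Python. The choice of variables and the function name are assumptions, and the text does not specify which regression variant Flower uses.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

The fitted slope then estimates how much one layer’s resource usage changes per unit change in another layer’s measure, which is the kind of relationship the provisioning plan search relies on.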

The resource share analyzer module is then invoked to determine the maximum resource shares of each layer given the user’s SLO. Once the upper bound resource shares for each layer are identified, the adaptive controller tailored to each of the three layers automatically adjusts resource allocations of that layer. Note that the resource shares can be determined with respect to arbitrary time windows.

The controllers are regulated based on a number of parameters, including the monitored resource utilization value, the desired resource utilization value, and the history of the controller’s decisions. In other words, the controllers continuously provision resources to adequately serve the incoming records or input data in order to keep the resource utilization of each layer within the specified desired value.

6.4 Flower in Action

In this section, we give a walk-through of the key features of Flower for managing the data analytics flow. Fig. 6.4 shows a high-level sequence diagram of how to run a flow elasticity manager. As shown there, three main objects, namely Flow Builder, Flow Configuration Wizard, and Controller Performance Monitor, are involved in running an elasticity manager in a number of steps:

Figure 6.4: The high-level sequence diagram of how to run an elasticity controller in Flower

1. Flow Builder: Flower’s Flow Builder is used to drag and drop multiple platforms and create a data analytics flow via its graphical user interface as shown in Fig. 6.5.

2. Flow Configuration Wizard: In this step, we complete a wizard to configure the controllers for the systems selected in the previous step, with information such as resource name (e.g. table name in DynamoDB), desired reference value, and monitoring period, as shown in Fig. 6.6.

3. Controller Performance Monitor: Once configuration is completed, we are able to run the service. After starting the service, Flower launches visualizations showing various features such as current and future resource allocation, deviation from desired utilization, and performance measures, as shown in Fig. 6.7. We can then observe how different controllers change the cloud services’ capacities dynamically and the resulting performance.

Flower also provides an interface to adjust tunable parameters of the controllers such as elasticity speed, monitoring period, or even their internal settings as shown in Fig. 6.8.

In addition to the demonstration above, one can create a cross-platform monitoring dashboard using similar steps, experiencing live monitoring of multiple systems all in one go.

Figure 6.5: Flower’s flow builder interface

6.5 Summary

In this chapter, we presented Flower, a system for data analytics flow elasticity management. With its rich set of functionalities, Flower aims to assist admins or DevOps engineers in the workload management of data analytics flows. Currently, such a task is performed naively using simple auto-scaling systems provided by the cloud providers, where users need to define static rules beforehand and tweak them frequently in order to find efficient allocation plans. In this regard, we highlighted how Flower helps them maintain the performance of their application flows with much less effort.

Chapter 7

Conclusion

Big Data has grown rapidly in various complex disciplines such as science, social networks, engineering, and commerce. The growing number of cloud-hosted big data analytics flow applications leads to increased complexity in architecture, run-time performance, and workload management.

As described in Chapter 1, data analytics flow applications combine multiple programming models to perform a specialized and pre-defined set of activities, such as ingestion, analytics, and storage of data. To support users across such heterogeneous workloads, this thesis proposed a set of intelligent performance and workload management techniques and tools.

In conclusion, we briefly highlight the major contributions and future research directions that can be built upon the outcomes of this research under three main categories: resource performance prediction, dynamic resource management, and tool support.

7.1 Resource Performance Prediction for Data-intensive Systems

We have described our first research question as: How can we predict the resource and performance distribution of data-intensive workloads?

In response to this question, we proposed a new distribution-based performance modelling technique for batch and stream processing systems. The proposed approach is based on statistical machine learning techniques and is easy to adapt to a wide variety of systems modelling problems. To demonstrate the usefulness of distribution-based workload modelling, we designed and implemented two workload management mechanisms: i) predictable auto-scaling policy setting; and ii) a predictive admission controller. We discussed in detail how an MDN-based prediction approach provides a complete description of the statistical properties while retaining the strength of the existing single-point prediction models.
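To make the contrast between a distribution-based and a single-point prediction concrete, the sketch below takes the parameters of a predicted mixture distribution and derives both a point estimate (the mixture mean) and the probability of exceeding a threshold. It assumes Gaussian mixture components. The function names and the threshold use case are illustrative assumptions, not the thesis’s implementation.

```python
import math


def mixture_mean(weights, means):
    """Single-point summary: the mean of the predicted mixture."""
    return sum(w * m for w, m in zip(weights, means))


def normal_cdf(x, mean, std):
    """Cumulative distribution function of a Gaussian component."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def prob_exceeds(threshold, weights, means, stds):
    """Probability that the predicted quantity exceeds the threshold."""
    cdf = sum(w * normal_cdf(threshold, m, s)
              for w, m, s in zip(weights, means, stds))
    return 1.0 - cdf
```

A predictive admission controller could, for instance, reject a query when the probability of exceeding a response-time bound is too high, a decision that the mean alone cannot support when the predicted distribution is multi-modal.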

In this study, we focused on offline learning, but the real-time nature of some workloads necessitates revisiting this work to investigate new learning models and update methods that are both fast and able to capture and adapt to new data points over time. To enable online predictive modelling, a natural future direction is to enhance the MDN so that it can refine its kernel functions, and thus build prediction models, at runtime. For this purpose, with the aid of online learning notions, the MDN would be revisited so that it starts from an initial guess model, then takes one observation at a time from the training set and recalibrates the weights on each input parameter.
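The online learning notion described here (start from an initial guess, take one observation at a time, recalibrate the weights) can be sketched with a single stochastic gradient step. For brevity the example uses a plain linear model with squared error rather than an MDN, so it only illustrates the update pattern. The function name and the learning rate are assumptions.

```python
def online_update(weights, bias, features, target, learning_rate=0.01):
    """One stochastic gradient step on squared error for a linear model.

    weights, bias: current model (the initial guess on the first call)
    features:      input parameters of a single new observation
    target:        observed value for that observation
    """
    prediction = bias + sum(w * x for w, x in zip(weights, features))
    error = prediction - target
    # Recalibrate the weight on each input parameter using this observation only.
    new_weights = [w - learning_rate * error * x
                   for w, x in zip(weights, features)]
    new_bias = bias - learning_rate * error
    return new_weights, new_bias
```

Calling `online_update` once per incoming observation keeps the model current without retraining on the full data set, which is the property an online variant of the MDN would need.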

Another avenue for future work is to hook the proposed model into Apache YARN [13] or Apache Mesos [11] to build a more intelligent resource or query scheduler.

In document Workload Modelling and Elasticity Management of Data-Intensive Systems (Page 106-114)