6.2 Configuring The Cloud
6.2.2 Configuration Management Tools
Configuration Management (CM) is used in many disciplines, from civil engineering to military applications such as weapons system development, and has many definitions. In this thesis, within the context of distributed systems, the following definition is used:
Definition (Configuration Management): A management process that focuses on establishing and maintaining consistent system performance and functional attributes using requirements, design, and operational information throughout a system’s life-cycle. [3]
Configuration Management is a large area of work. For the research presented in this Chapter, particular attention is placed on a set of tools that provide automation functionality similar to that presented herein on the topic of Contextualization, but that have not been designed in the context of Cloud Computing and are thus not tailored for use in the dynamic environments that Cloud applications operate within. Additionally, these tools have issues of convergence, where the continuous reconfiguration of software does not lead to any stable or desired state. These same issues are applicable to the research discussed later in this Chapter on the topic of Recontextualization and have an impact on the decision making processes within a Cloud environment. For example, if a decision is made to migrate a Cloud resource and reconfigure it for a new environment, but the problem that prompted the migration is not resolved, a perpetual state of Recontextualization could occur.
There are three generalised approaches to configuring the Cloud service software stack:
• Manually: Configuring each software component by hand. This is time consuming, error-prone and not feasible in large scale Cloud deployments.
• Pre-Configured Static Images: A common approach where a set of images is set up once and reused. The approach is brittle: changes to the configuration of the images have to be made to all instantiated VMs, with no way of propagating the changes to previous image versions currently in use.
• Configuration Management Tools: Tools or software approaches that apply CM to automate the configuring of an image or instantiated VM. This provides enhanced control and flexibility.
Configuration Management Tools provide a number of benefits including i) the reproducibility and industrialization of software configuration, ii) the continuous vigilance over running systems with automated repairs and alert mechanisms, iii) enhanced control over and rationalisation of large scale deployments and iv) the ability to build up a knowledge base to document and trace the history of a system as it evolves.
This subsection of the thesis discusses the concept of Configuration Management for the automated deployment of Cloud application software dependencies at the PaaS level. The feature sets of three configuration management solutions, CFEngine 3 [40], Puppet [199] and Chef [41], are presented and compared. All three systems can be used to automate infrastructure deployment covering thousands of machines.
6.2.2.1 CFEngine
The CFEngine [40] project provides automated configuration management of large networked systems. CFEngine can be deployed to manage many different types of computer system such as servers, desktops and mobile/embedded devices. The project was started in 1993 by a post-doc, Mark Burgess, at Oslo University as a way to automate the management of dissimilar Unix workstations² via the abstraction of platform differences using a domain specific language. In [34], Mark Burgess developed the foundations of self-healing systems, which served as a precursor to, and heavily influenced, the ideas of Autonomic Computing developed later by IBM.
² Details of the initial version of CFEngine can be found in an internal report at: http://www.iu.hio.no/~mark/papers/cfengine_history.pdf
CFEngine uses decentralised, autonomous software agents to monitor, repair and update individual machines. It is comprised of a number of components with varying responsibilities:
• cf-promises: Used to pre-check a set of configuration promises before attempting to execute them.
• cf-agent: An agent that manipulates system resources to enact change.
• cf-serverd: A server able to share files and receive requests to execute existing policy on an individual machine.
• cf-execd: A scheduling daemon, executing and collecting the output data from cf-agents.
• cf-runagent: A helper program that communicates with cf-serverd to request an update of the existing policy of one or more cf-agents. It is used to push out changes to CFEngine hosts.
• cf-report: Generates summary and other reports in a variety of formats.
• cf-know: An agent used for rendering documentation as a ‘semantic web’ from system knowledge.
The components operate over four phases of system management, which are based on transactional changes:
• Build: A template of proposed promises is built for the machines in an organization. If the machines keep these promises, the system functions as anticipated.
• Deploy: Implement previously decided policy via pushing out changes to agents.
• Manage: Autonomous agents manage unplanned system events. Rare events that cannot be dealt with automatically set off alarms for human intervention.
• Audit: System report generation for determining what changes were made by agents and if they were policy compliant.
At the core of CFEngine lies the idea of convergence [35], where the final desired state of the system is described instead of the steps needed to get there. This enables CFEngine to run whatever the initial state of the system is, with predictable end results. The downside to this approach is that only statistical compliance, or best effort, can be achieved with a given configuration policy: a system cannot be guaranteed to end up at a desired state but slowly converges at a rate defined by the ratio of the rate of environmental change to the rate at which CFEngine executes.
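The idea of convergence can be illustrated with a minimal Python sketch. This is not CFEngine's implementation or its policy language: the Promise class, the converge function and the dictionary standing in for a machine's state are hypothetical names chosen purely for illustration. The sketch only aims to show that a policy expressed as desired end states can be applied from different starting states with the same result, and that a repeated run has nothing left to repair.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical in-memory stand-in for a machine's state (not a CFEngine API).
State = Dict[str, str]


@dataclass(frozen=True)
class Promise:
    """A desired value for one named attribute of the system."""
    attribute: str
    desired: str

    def kept(self, state: State) -> bool:
        # A promise is kept when the attribute already has the desired value.
        return state.get(self.attribute) == self.desired

    def repair(self, state: State) -> None:
        # Only the end state is specified; no sequence of steps is recorded.
        state[self.attribute] = self.desired


def converge(state: State, promises: List[Promise]) -> int:
    """Run one agent pass and return the number of repairs made."""
    repairs = 0
    for promise in promises:
        if not promise.kept(state):
            promise.repair(state)
            repairs += 1
    return repairs


# Illustrative policy; the attribute names are assumptions.
policy = [Promise("ntp_service", "running"), Promise("motd", "managed host")]

# Two different starting points are driven towards the same promised state.
fresh: State = {}
drifted: State = {"ntp_service": "stopped", "motd": "managed host", "extra": "untouched"}

first_fresh = converge(fresh, policy)
first_drifted = converge(drifted, policy)
second_drifted = converge(drifted, policy)  # repeated pass: nothing left to repair
```

In this toy setting a single pass is enough because nothing else changes the state between passes. In a real environment the state drifts while the agent is idle, which is why compliance is only statistical and depends on how often the agent runs relative to the rate of change.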
Puppet [199] and Chef [41] are two other configuration management systems widely used as replacements for CFEngine. Both have tried to address particular shortcomings in CFEngine’s design and ideology that arise in some specific use-cases.
6.2.2.2 Puppet
Puppet was developed as a successor to CFEngine and provides graph-based and model driven approaches to configuration management, through a simplified declarative domain specific language that was designed to be human readable. The model driven solution enables the configuration of a software component to be expressed as a class: a collection of related resources, where a resource is a unit of configuration. Resources can be compiled into a catalogue that defines resource dependencies using a directed acyclic graph. A catalogue can then be applied to a given system to configure it.
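As an illustration of how a dependency graph over resources yields an order of application, the following Python sketch uses the standard library graphlib module (Python 3.9 or later). It is not Puppet's compiler or its language: the catalogue dictionary, the resource names and the helper functions are hypothetical and chosen only to show the technique of topologically sorting a directed acyclic graph.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Hypothetical catalogue: each resource maps to the resources it depends on.
catalogue: Dict[str, Set[str]] = {
    "package:nginx": set(),
    "file:/etc/nginx/nginx.conf": {"package:nginx"},
    "service:nginx": {"package:nginx", "file:/etc/nginx/nginx.conf"},
}


def apply_order(cat: Dict[str, Set[str]]) -> List[str]:
    """Return one valid order in which the resources can be applied."""
    # static_order() yields every resource after all of its dependencies.
    return list(TopologicalSorter(cat).static_order())


def is_acyclic(cat: Dict[str, Set[str]]) -> bool:
    """Report whether the dependency graph can be ordered at all."""
    try:
        apply_order(cat)
        return True
    except CycleError:
        return False


order = apply_order(catalogue)
```

When several resources have no dependency between them, more than one valid order exists and the graph alone does not say which one will be used. The example above is deliberately a simple chain so that only one order is possible.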
6.2.2.3 Chef
Chef rose out of the Ruby-on-Rails community as a response to Puppet, motivated by dissatisfaction with Puppet’s non-deterministic graph-based ordering of resources. Puppet’s approach is useful for bringing a system from its current state into compliance with a specified state, and where ongoing configuration changes are more important than the initial provisioning. In contrast, Chef places emphasis on starting up services on newly provisioned clean systems, where the sequence and execution of configuration tasks is fixed and known by the user. This makes Chef particularly well suited to the paradigm of Cloud Computing, where VM instances are short lived and new instances are spawned from a newly provisioned base image. Chef uses the analogy of cooking and creates “recipes” that are bundles of installation steps or scripts to be executed.
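The fixed, user-visible ordering of steps in a recipe can be sketched in Python as follows. This is not Chef's Ruby-based recipe language: the step functions, the recipe list and the node dictionary are hypothetical and serve only to show that steps are executed strictly in the order written, so a step may rely on the effects of the steps before it.

```python
from typing import Callable, Dict, List, Tuple

Node = Dict[str, str]  # hypothetical stand-in for a freshly provisioned VM
Step = Tuple[str, Callable[[Node], None]]


def install_package(node: Node) -> None:
    node["package"] = "installed"


def write_config(node: Node) -> None:
    # Relies on the package step having already run.
    if node.get("package") != "installed":
        raise RuntimeError("config written before package install")
    node["config"] = "written"


def start_service(node: Node) -> None:
    # Relies on the configuration step having already run.
    if node.get("config") != "written":
        raise RuntimeError("service started before config was written")
    node["service"] = "running"


# The recipe is an ordered list: the order written is the order executed.
recipe: List[Step] = [
    ("install_package", install_package),
    ("write_config", write_config),
    ("start_service", start_service),
]


def run_recipe(steps: List[Step], node: Node) -> List[str]:
    """Execute each step in sequence and return the names in execution order."""
    executed = []
    for name, step in steps:
        step(node)
        executed.append(name)
    return executed


node: Node = {}
executed = run_recipe(recipe, node)
```

Because the order is fixed by the author of the recipe, the outcome on a clean system is predictable. The cost is that the author, not the tool, is responsible for getting the order right: running the same steps in reverse fails at the first step.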
6.2.2.4 A Comparison
All the tools share a common ancestry in CFEngine and have been designed specifically with configuration management in mind. The tools are open source, provide text-based interfaces and are based around the client-server model. The tools differ in a number of ways. Full support for Windows is only provided by CFEngine 3, which is written in C and released under the GPL license. Puppet and Chef have only partial support for Windows with regard to which aspects of the Operating System can be configured, and both are written in Ruby. Finally, Chef does not provide a declarative domain specific language and instead uses extensions of the Ruby programming language to describe configuration. This can be of benefit, but there is a trade-off between clarity in defining the configuration and the power to configure as needed.
Table 6.1. Comparison of Configuration Management Tools
Tool      Language  Platform Support                         Domain Specific Language
CFEngine  ANSI C    Windows, Mac OS X, BSD, Linux            Yes (Declarative)
Puppet    Ruby      Windows (partial), Mac OS X, BSD, Linux  Yes (Declarative and Imperative)
Chef      Ruby      Windows (partial), Mac OS X, BSD, Linux  No (Imperative)
The performance and scalability of Puppet and Chef are limited due to the language they are written in and the maturity of the solutions. There are still several issues with the scalability of Puppet out of the box³. Chef scalability still remains to be fully explored⁴ and several issues are not solvable without a familiarity with generalised scaling techniques⁵. On the other hand, CFEngine is written in C and has a decentralised architecture, and enterprise users have recorded deployments of over 10,000 host nodes.
Although Chef is well suited to Cloud Computing with its emphasis on configuring systems from scratch, it does not resolve all the issues surrounding the configuration of an application in a dynamic environment and has been particularly tailored to the deployment of Ruby on Rails applications in Clouds. The next subsection provides further details on these issues in the context of Cloud Computing and introduces the idea of contextualization as a mechanism to provide VMs with identities for the purpose of configuring clusters or resources for general purpose use.
³ Puppet scalability: http://projects.puppetlabs.com/projects/1/wiki/Puppet_Scalability
⁴ Chef scalability: http://lists.opscode.com/sympa/arc/chef/2011-08/msg00000.html
⁵ Concurrency in Chef: http://lists.opscode.com/sympa/arc/chef/2012-01/msg00422.html