The types are introduced to the cloudlet via the application. Developer outfit their applications with the appropriate types by compiling the schema into the application. The users are prompted by the application for their cloudlets addresses. Once they enter the address, the application can connect to the cloudlet server. The application registers the schemata it requires with the cloudlet framework.
The framework will store the schemata. The application will afterwards request access to the schemata and the cloudlet server will ask the user if they agree to these access rights. If the user agrees, the cloudlet will provide an access token to the application, which enables it to access the corresponding data.
In order to allow developers discover schemata, the cloudlet frameworks can pool their knowledge and statistics to a common component, the registry. While multiple registries are possible, it serves as a central point to the connected cloudlet frameworks. Developers are able to gain a wider view of existing schemata and their usage through a searchable index. While cloudlet schemata lack a defined name, their meta-data enables the registry to display them in a humanly readable form. A schema implicitly represents a type and therefore an abstract entity, which can be linked via a resource description language.
Figure 11 Cloudlet and registry interaction
Figure 12 Cloudlet and registry interaction.
8 Architecture
As outlined in section 2 we investigated and broadly specified four potential frameworks but opted to develop a multi-tenancy framework with a single large datastore. The Cloudlet Framework itself will be implemented in the form of a distributed application.
Distributed applications are composed of a number of software components called workers distributed across a number of hardware nodes; workers communicate with each other by passing messages. In OPENi we will create workers for each component as outlined in section 5. Each worker will have its own internal REST API and will communicate with other workers via a messaging framework.
The benefits of a distributed application are:
Functional separation – The overall functionality of a distributed application is partitioned into small workers covering a single functional area. As the OPENi project has a large development team distributed across a number of countries partitioning the application into many components has its benefits. Smaller groups from one or two countries can autonomously develop individual components focusing on their functionality; due to the design-by-contract nature of the REST APIS integration difficulties are minimised. For Example, OPENi’s service enablers can be deployed directly on the platform and request all data from the internal components via their APIs.
Heterogeneity – Components can be implemented with different programming languages and on different operating systems.
Resilience – In a distributed application workers can be replicated on multiple hardware nodes. If a node was to die the application can recover by re-routing messages to anther node with similar workers.
Scalability – Scaling a distributed application simply requires adding more hardware resources. Furthermore, as the application is partitioned into many functional areas, each functional area can be scaled independent of others. A slow functional worker can have extra resources thrown at it so that it doesn’t slow the whole system.
Load balancing – Distributed applications allow for worker migration within a cluster to achieve load balancing and improve performance.
Economy – Sharing hardware reduces the cost of maintaining a platform.
8.1 Software Stack
8.1.1 Languages
Most components will be written in node.js, the cloudlet framworks preferred programing language.
The benefits of node.js are: 1) it is a dynamic language which allows for rapid prototyping, 2) it can handle thousands of concurrent connections with low overhead, 3) its strong support with JSON (our preferred API and messaging format), 4) its event loops, and 5) a number of application frameworks have been built for the language e.g. express and Geddy.
However, a recurring criticism of node.js is that it’s codebases are difficult to maintain as it grows;
splitting the Cloudlet Framework into many smaller components will help address this inherent drawback with the language.
8.1.2 Messaging Framework
ZeroMQ [9] is an open source socket library that acts as a concurrency framework for scalable, distributed and concurrent application. It includes features that allow messages to be sent via in-process, inter-in-process, TCP and multicast as well as the ability to support many-to-many connections [10]. ZeroMQ libraries are available in a wide range of languages including C, C++, Java, .NET, node.js and Python. Software patterns utilise asynchronous I/O in order to build scalable multicore message passing applications. Its suitability to our platform are categorised as follows:
Patterns – ZeroMQ supports a number of communication patterns including: pipeline, fan-out, fan-in, publish-subscribe, task distribution and request-reply and exclusive pair. These patterns allow us to create multiple topologies within the one distributed application. I.e.
workers/components can communicate with each other as they wish.
Asynchronous I/O – it handles asynchronous I/O in background threads so there are no locks or wait states in concurrent ZeroMQ applications.
Multiple transport methods - Different ways of passing messages including: TCP, multicast, in-process, and inter-process which means that the same application code can be deployed on a large cluster of computers or alternatively a single personal plug server.
Languages Supported – libraries are available in a wide range of languages including C, C++, Java, .NET, node.js, and Python. While node.js is the preferred language the development of some components may be accelerated by utilising code libraries and applications written in another language like Java.
Libraries - there are many libraries that build upon the native APIs.
OSs – supported on the main operating systems including Windows, Mac and Linux.
8.1.2.1 Sample ZeroMQ Topology
The diagram above outlines the basic brokerless pub-sub ZeroMQ topology employed on the cloudlet framework. The messages are passed through the distributed application in a waterfall methodology;
they follow a one way route from the Mongrel2 web server through one of the API workers, application logic then decides the next path for the message. If there is a problem with the original
8.1.3 DataStore
Primarily due their scalability, NoSQL databases appear to the most appropriate solution for the OPENi datastore. NoSQL databases are designed for high performance and throughput when handling vast amounts of data from large numbers of concurrent users. Of the many NoSQL datastores we compared (See Appendix II for the full analysis) we opted for CouchBase.
CouchBase’s support for the map-reduce paradigm is essential for the data aggregation component.
Couchbase also sits well with ElastiSearch which allows us to easily provide internal data filtering and external search funtioanlity. Additionally, CouchBase has native JSON and JavaScript support, and can support secondary indexes.
8.1.3.1 CouchBase
CouchBase is an open source NoSQL databases system, managed by the Apache Software Foundation. CouchBase is a document based storage system that persist data in a JSON format.
According to the CAP theorem partition tolerant distributed databases have to make a design choice between availability and consistency, CouchBase provides consistency and partition tolerance.
Couchbase is designed to run on commodity hardware and can be run on both physical machines and virtual machines. These machines, as nodes, can be combined to form a Couchbase cluster.
Couchbase clusters support data partitioning to allow for fault tolerant systems. Data replication between nodes is another fault tolerate mechanism used by couchbase.
Each Couchbase node contains a cluster manager and a data manager that control the core functions of couchbase. The couchbase cluster manager controls the configuration and communications of all servers within the Couchbase cluster. It also handles the collection of statistical and performance data within the nodes to be displayed on the web control panel. The data manager is responsible for the reading and writing of documents to the data store. Once data is written to memory the client that requested the write is notified and the data manager then adds the data to the queue to be written to disk.
Couchbase has a management and monitoring web front end that allows for both the administration of the cluster and the viewing of performance statistics. Couchbase supports plug-ins to extend its functionality with the utilization of 3rd party applications. The main plug-in of interest is Elasticsearch which allows multiple new features. Elasticsearch allows for real-time analytics, full text search, geolocation query and more. This will allow for greater cohesion between the datastore and services that need analytics or aggregated values.
Indexing of data and map reduce tasks in couchbase are done via “views” which are then distributed to all nodes in a cluster to apply the index to data they hold. Querying data in couchbase supports multiple methods of querying. Standard queries via specific ids as well as range queries and aggregate lookups are all supported by couchbase. Couchbase views are updated incrementally as the data changes which remove the concern in relation to index calculating batch jobs.
Couchbase provides a REST API for data access and it can also be accessed via the management and monitoring web front end. Couchbase has libraries for most main programming languages which can be easily incorporated into applications to allow for quick access to couchbase.
8.1.4 Web Server
Mongrel2 [11] is a protocol and language agnostic web server. It has supports HTTP, Flash sockets, WebSockets, Long Polling on the user facing side and exposes ZeroMQ handlers on the backend. The Cloudlet frameworks workers will integrate with Mongrel2’s handlers. Conveniently the node open-source community created a node.js handler for Mongrel2 called m2node. Mongrel2 also supports the serving of Static content and http proxying to application servers running on other locations or ports.
It is key to the integration of the overall OPENi Platform.
8.1.5 Load Balancer
nginx [12] is a compact software package containing a HTTP proxy, load balancer, edge cache and origin server. nginx has many features which cover protocols and performance, request routing, security, edge cache and origin server, and configuration and management. Some of the key features are IPv6 support, support for up to a million concurrent connections, multi-tenancy, session persistence, connection and request policing, header manipulation, load balancing and many more.
nginx provides reverse proxy and load balancing features that allows the act of load balancing to be done in software rather than hardware. The configuration of the load balancers is done in a configuration files which allow for the fine tuning of the logic behind the load balancing. Weights can be assigned to each interface and this will dictate the amount of traffic routed to each as well as IP hashing to ensure that returning IP addresses are returned to the same interface [13].
8.1.6 Cloud Platform
When deciding on which cloud technology we should employ on the OPENi project we analysed a number of Infrastructure-as-a-Service (Iaas) and (Paas) offerings (see Appendix III for the full analysis). Of the PaaS offerings CloudFoundry was the closest to meeting our stack requirements however it doesn’t provide built in support for ZeroMQ. Therefore CloudFoundry and the other PaaS offerings were excluded.
its place among the top cloud technologies and provides a sense of reassurance about the future of OpenStack.
8.1.6.1 OpenStack
OpenStack [15], as described in deliverable 2.2, is a collection of open source components used to deliver public and private clouds. OpenStack’s interoperability with multiple hypervisors, such as Xen, KVM and VMWare ESXi, and other cloud providers, like Amazon EC2 and S3, have made it a popular solution for cloud platforms.
OpenStack provides access to the underlying hardware with the use of the OpenStack APIs. This coupled with the ability to use a command line interface for the management of the physical hosts allows for finer control of the system as a whole.
Managing the virtual machines in the system can be done several ways. OpenStack has a web
“Dashboard” GUI that can be used to manage all the main aspects of the system including but not limited to VM instance and image management, VM provisioning, user management and other security features. The OpenStack API can also be used to manage these aspects of the system as well using the Amazon EC2 API as OpenStack has implemented the system so that it can support the EC2 API.
The key security features are the ability to create security groups, user groups and the user management system. The security groups allow for the finer control over the flow of information in the system and can be used to isolate groups of machines from one another and mitigate any possible security risks that would be present in a system where all machines had access to one another.
8.1.7 Orchestration
We will use Chef to orchestrate the OPENi platform’s hardware nodes, virtual machines, and ZeroMQ workers. Ganglia and Nagios are system monitoring tools. We will also investigate Docker as a mechanism for simplifying platform resource deployment.
8.1.7.1 Chef
Chef is an IT infrastructure automation and configuration management tool used by system administrators and DevOps personnel [16]. Chef uses ruby as its main language and its configuration groups and files called “cookbooks” and “receipts”. The CLI tool known as “knife” provides an interface between the local Chef repo and the Chef server; it is also the main means to manage cookbooks and receipts.
A Chef cookbook is a collection of receipts that can be used to configure a system in a certain manner. It could have a “database” cookbook which could install all the services required on a machine to be a database. Normally separate cookbooks would be created for each application, i.e.
the user could create an Apache cookbook and then download all the receipts for apache into that
cookbook. The apache cookbook could then be included in different configurations i.e. a database configuration, which would then be pushed to all machines using that configuration.
The “run list” knife command is used to push or configure what cookbooks need to be running on a certain machine. Using the knife command the Chef server can connect to its hosts and push the configuration cookbooks to the host.
8.1.7.2 Ganglia
Ganglia [17] is an open source distributed monitoring system for systems such as clusters, grids and clouds. It is optimized in such a way that it has very low per node overhead while still providing accurate statistics about each node using a combination of XML, XDR and RRDTool.
Ganglia has a hierarchical structure and runs a script on each node in the system. The information is gathered and then passed up the tree; this collected information is then displayed upon a web interface that shows graphs of many of the details of the system including CPU usage, network usage, memory usage and many more. Ganglia can be extended to application level monitoring. E.g.
The data authorisation component could notify Ganglia each time an app is refused access to a cloudlet. The resulting Ganglia graphs give the platform manager insights into usage patterns over time.
8.1.7.3 Nagios
Nagios [18] is an IT infrastructure monitoring system that allows organisations to identify and resolve issues when they first appear and before they progress further. It can be set up to monitor different aspects of the IT infrastructure such as system metrics, network infrastructure, applications, services and servers.
Nagios includes many useful features including the ability to set up alerts for critical failures of the system that will be sent via email, SMS or a custom script. These alerts can also be grouped in different levels of alert so that if an alert is not handled promptly it can be escalated to a higher level of priority. It can generate detailed historical reports for review that include outages, events, notifications and alerts.
8.1.8 Deployment 8.1.8.1 Docker
Docker is an open-source project that allows developers to build and run cloud based applications.