3.5 The Potential of Cloud Computing
3.5.3 Cloud-First Frameworks
This thesis introduces the notion of cloud-first frameworks to denote frameworks that are entirely cloud-based and utilise cloud services such as VMs, cloud storage and Big Data services. Such frameworks address particular needs, for example Big Data processing, and provide a Platform as a Service (PaaS) for users to deploy their own applications on the cloud. On the commercial front, Google offers Cloud Dataflow [112], a framework for developing data pipelines that process, transform and aggregate large datasets using other GCP services. Amazon, on the other hand, offers AWS Data Pipeline [113] to process and move data between the different AWS services. In both offerings the underlying infrastructure is transparent to the user: the cloud provider manages and scales it, and users simply pay for using the service.
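To make the process, transform and aggregate stages of such a data pipeline concrete, the following is a minimal sketch in plain Python. It does not use the Cloud Dataflow or AWS Data Pipeline APIs; the record format, the function names and the scaling step are all hypothetical and chosen purely for illustration. In a managed service each stage would be distributed and scaled by the cloud provider, whereas here the stages simply run in sequence on one machine.

```python
from collections import defaultdict

# Hypothetical input: (key, raw_reading) pairs, one of which is malformed.
RECORDS = [("a", "1.5"), ("b", "2.0"), ("a", "bad"), ("a", "2.5"), ("b", "4.0")]


def process(records):
    """Process stage: parse raw readings and skip those that are not numeric."""
    for key, raw in records:
        try:
            yield key, float(raw)
        except ValueError:
            continue


def transform(records, scale):
    """Transform stage: apply an illustrative scaling factor to each value."""
    for key, value in records:
        yield key, value * scale


def aggregate(records):
    """Aggregate stage: sum the values for each key."""
    totals = defaultdict(float)
    for key, value in records:
        totals[key] += value
    return dict(totals)


def run_pipeline(records, scale=2.0):
    """Chain the three stages; generators keep the stages streaming."""
    return aggregate(transform(process(records), scale))


if __name__ == "__main__":
    print(run_pipeline(RECORDS))
```

The user of a managed pipeline service writes only the stage logic; decisions such as how many VMs execute each stage are made by the provider.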
3.6 Summary
This chapter has provided a review of the cloud computing paradigm, including a high-level overview of public cloud deployments. The key cloud services utilised in this research were reviewed and an introduction to the Google Cloud Platform was provided. Moreover, this chapter provided a background on some of the related research and commercial work utilising cloud elasticity and utility billing features. Finally, this chapter presented a definition of cloud-first frameworks and explained that they fully utilise cloud elasticity and utility billing. These frameworks utilise not just VMs, but also other cloud services such as storage and Big Data services. Cloud-first frameworks such as Google Cloud Dataflow and AWS Data Pipeline have two disadvantages: (1) they are based on a particular cloud platform, so users cannot freely move their applications to another cloud provider without considerable changes; and (2) the underlying cloud resources being used are not exposed to the user, hence users are not able to specify the number or the specifications of resources to be used in order to optimise their computational usage or reduce cost. The concepts reviewed in this chapter are used in Chapter 5 to address these two disadvantages and to present the architecture of a novel and generic cloud-first framework (CloudEx) that utilises cloud VMs, cloud storage and Big Data services. The approach followed avoids the so-called “lift-and-shift” model for migrating existing approaches to cloud computing; instead, this thesis presents a model that is entirely based on cloud computing. The following chapter outlines the design methodology followed in order to develop the CloudEx cloud-first framework and explains how this framework is then used to develop the ECARF RDF triple store.
Research Methodology
This chapter focuses on the methodology followed in this research to answer the research questions summarised in Chapter 1. More specifically, the primary focus is on the Suggestion phase of the general methodology of design science research as outlined by Vaishnavi and Kuechler [1] (Figure 1.2). The following sections outline the approach followed in order to produce novel designs for both the CloudEx framework and the ECARF triple store.
The rest of this chapter is organised as follows. Section 4.1 provides a brief overview of the methodology used in this research and how it was chosen. Subsequently, Section 4.2 outlines the high-level design methodology used in order to address the issues with distributed RDF processing. Then Section 4.3 introduces some of the key cloud computing features that influenced the model development stage. Subsequently, Section 4.4 summarises the design methodology used in this research in order to answer the research questions, followed by Section 4.5 which summarises the prototype and evaluation phases of the research. Finally, Section 4.6 concludes this chapter with a summary of the design methodology adopted.
4.1 Design Science Research Methodology
Generally, research methodologies can be categorised as either quantitative or qualitative, although a combined approach is sometimes followed. Quantitative research is primarily concerned with numbers and aims to generate data that can be analysed to draw firm conclusions. Examples of quantitative research include experimental research, surveys and questionnaires. Qualitative research, on the other hand, is designed to help researchers understand some aspect of social life. Examples of this type of research include action research and case studies. In recent years the subject of design science research has gained considerable interest from the information systems community [126, 127]. Design science research changes the state-of-the-world through the introduction of novel artefacts [1].
The primary aim of this research is to develop and evaluate a cloud-based triple store for RDF processing, which requires an iterative process of proposing, designing and evaluating artefacts against related work. Additionally, related work [81, 128, 129] has successfully utilised similar techniques. Based on this, the design science research methodology was chosen for this research, which further facilitates comparison with the findings of related work. This methodology is outlined by Vaishnavi and Kuechler [1] and shown in Figure 4.1. The effort carried out in this research is mapped to this methodology in Figure 1.2. According to Vaishnavi and Kuechler, a typical research effort under this methodology proceeds as follows:
• Awareness of Problem. Awareness of an interesting problem in a reference discipline. The output of this phase is a research proposal.
• Suggestion. A creative process where new functionality is envisioned based on novel configurations of existing or new elements. The output of this phase is a tentative design.
• Development. The tentative design is further developed and implemented in this phase. The output of this phase is an artefact.
• Evaluation. The artefact is evaluated in this phase according to criteria made explicit in the proposal. The output of this phase is performance measures.
• Conclusion. This is the finale of a specific research effort; the results are considered good enough and are written up.
Figure 4.1: The general methodology of design science research (Source: Vaishnavi and Kuechler [1])
The following sections focus on the Suggestion phase of this methodology and summarise the creative thought process used to envision new functionality.
[Figure 4.2 labels: Big Data; Loading and Analysing; Dictionary Encoding; Transforming; Querying (SPARQL); Rule-based Reasoning; Managing; Create, Update, Delete; RDF Datasets; Scope of Research; Storage]
Figure 4.2: Proposed RDF Triple Store Design