http://www.cisjournal.org
Exploration of the Significance of Cloud Computing in Managing Big Data
1 Hassan Alghamdi, 2 Lily Sun
1 College of Computer Science and Information Technology, Al Baha University, K.S.A
2 School of Systems Engineering, Reading University, U.K
ABSTRACT
In the present business environment, clusters of servers are required to support the tools that process the varied formats, high velocity, and large volumes of big data. For big data management, IT companies are increasingly looking to implement cloud computing as the basic infrastructure. In this context, this paper investigates the significance of cloud computing in the management of big data. The key aspects covered include the integration of cloud computing and big data, the issues faced while managing big data, the role of cloud computing in handling those issues, and a few examples of the application of cloud computing for managing big data.
Keywords: Big Data, Cloud Computing, Big data management
1. INTRODUCTION
In the modern cyber age, computers and the internet have reached every household, promoting the use of digital storage. With digitalization and advances in technology, the utilization of digital and web storage has increased exponentially [5]. This advancement and modernization have brought the concept of cloud computing into existence. Cloud computing is the abstraction of internet-based resources, computers and services that system developers utilize to implement complicated web-based systems [6].
Cloud computing is simply a way by which digital storage, computational power, collaboration infrastructure, business applications and processes are provided as a utility, a service or a group of services meant to meet business demands [6]. The data generated and stored on the internet is increasing every day, creating difficulties for conventional storage and database management systems. Data sets, with sizes ranging from a few terabytes to many petabytes, are constantly growing beyond the ability of the traditional software tools used to administer, acquire, manage, and process data [7]. In the context of big data, the size of the data is therefore a perpetually changing point of reference.
Recently, Accenture [8] stated that a lack of data is no longer an issue in many companies; the serious concern is the lack of the right data. Therefore, companies need the right big data in order to define their strategic direction effectively.
Big data involves a set of technologies and techniques that require new forms of integration to reveal value from large datasets that are complex, diverse and of a huge scale.
Big data refers to “the tools, processes, and
manipulate, and manage very large data sets and storage facilities” [9, p. 18]. With the combined utilization of cloud computing and big data technologies, the end-user satisfaction can be improved and the maximum use of the user data can be achieved along with minimizing the cost.
Cloud computing and the big data are interrelated as cloud computing plays a significant role in handling big data by the use of various characteristics underlying the concept of cloud computing, such as cost optimization, security and control over the data by distributed architectures.
For instance, Facebook has established a new data centre in Oregon with a total size of 147,000 square feet. Managing big data in this way requires a large physical space, which is not feasible for small companies [10]. They would be required to make heavy investments in their existing systems. It therefore becomes essential for companies to have virtual storage for the large amounts of data that brought big data into existence. In this regard, big data implemented using cloud computing can provide an effective solution to this problem, as the company stores its data on virtual storage and is spared a heavy investment.
2. ISSUES IN HANDLING BIG DATA
Big data refers to a few particular characteristics associated with data in the context of scale and analysis [7]. It is a collection of a huge amount of data that is unstructured and does not have any specific formatting.
Due to this complexity, companies face several issues while managing big data, which are discussed under the following points:
2.1 Data Integration
It is highly important that the outcomes of all the individual data sets of big data are considered
such as data redundancy and integrating data sets with different data patterns [11].
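The two integration problems named here, redundant records and data sets that follow different data patterns, can be illustrated with a minimal Python sketch. The field names (`user_id`, `uid`, `email`, `mail`) and the sample records are hypothetical and serve only as an illustration; they are not taken from any of the cited systems.

```python
# Hypothetical example: integrate two data sets that describe the same
# entities with different field names, then remove redundant records.

def normalize(record, field_map):
    """Rename source-specific fields to a common schema."""
    return {common: record[source] for source, common in field_map.items()}

def integrate(datasets):
    """Merge (records, field_map) pairs and drop duplicate user ids."""
    seen = set()
    merged = []
    for records, field_map in datasets:
        for record in records:
            row = normalize(record, field_map)
            row["email"] = row["email"].strip().lower()
            if row["user_id"] not in seen:   # skip redundant entries
                seen.add(row["user_id"])
                merged.append(row)
    return merged

# Invented sample data with two different layouts.
crm = [{"user_id": 1, "email": "Ann@Example.com"},
       {"user_id": 2, "email": "bob@example.com"}]
web = [{"uid": 2, "mail": "bob@example.com "},
       {"uid": 3, "mail": "cy@example.com"}]

result = integrate([
    (crm, {"user_id": "user_id", "email": "email"}),
    (web, {"uid": "user_id", "mail": "email"}),
])
```

In this sketch the record for user 2 appears in both sources and is kept only once, so `result` holds three rows in the common schema.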
2.2 Lack of Skilled Workforce
The absence of skilled and proficient analysts, who have in-depth knowledge of the concepts of big data and its assessment, can also make it difficult for companies to fully realize the benefits associated with the management of big data. Business experts have stated that even though employees are aware of the importance of managing big data, they are unaware of the related challenges and opportunities. Therefore, due to the absence of proper skills and expertise in managing big data, companies face problems such as data loss, data mismatch, data redundancy and security issues [7].
2.3 Absence of a Query Language
When an organization switches to big data, it has to acknowledge that it cannot simply use the Structured Query Language (SQL) for managing data and running queries. It should therefore first examine whether the purpose of the data assessment can be met with a query language; if it can, there is no need to switch to applications that manage big data. This analysis is necessary because the management of big data is a cumbersome task, as big data consists of both structured and unstructured data. A query language, which gives fast results and is usually the first choice for data assessment, cannot be applied directly to big data because of data inconsistency [11].
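The difference between structured and unstructured data can be made concrete with a short sketch. It uses Python's standard `sqlite3` and `re` modules; the table, the log lines and the amounts are invented for illustration. With a fixed schema a single query answers the question, whereas free text has to be parsed before anything can be aggregated.

```python
import re
import sqlite3

# Structured data: a fixed schema lets a declarative query answer directly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("ann", 20.0), ("bob", 5.0), ("ann", 12.5)])
sql_total = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE customer = ?", ("ann",)
).fetchone()[0]
conn.close()

# Unstructured data: free text has no schema, so the fields must first be
# extracted (here with a regular expression) before they can be aggregated.
raw_lines = [
    "ann paid 20.0 EUR for shoes",
    "delivery delayed, bob complained",
    "received 12.5 EUR from ann (gift card)",
]
pattern = re.compile(r"(\d+\.\d+) EUR")
text_total = 0.0
for line in raw_lines:
    match = pattern.search(line)
    if match and "ann" in line:
        text_total += float(match.group(1))
```

Both totals agree in this toy case, but the second path depends on a hand-written pattern that must be revised whenever the text format changes, which is the kind of inconsistency described above.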
2.4 Identification of the Data Set
Companies that are required to manage big data sometimes adopt the related software and frameworks without considering their full scope of application and without first identifying the correct data set [12]. For instance, some companies use several frameworks merely to convert raw data extracted from the web into structured data, thereby incurring an unnecessarily high cost. Identifying a particular data set is itself a difficult task because of the presence of inconsistent data, as each data set must contain consistent data for data assessment [11].
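As an illustration of how little tooling a simple conversion may need, the following sketch turns a fragment of raw HTML into structured records using only Python's standard `html.parser` module. The class name `TableExtractor`, the HTML fragment and the second table row are hypothetical; real web data is usually far less regular.

```python
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only text inside a table cell is kept.
        if self._in_cell:
            self._row[-1] += data.strip()

# Invented raw input standing in for data extracted from the web.
raw_html = """
<table>
  <tr><td>Oregon</td><td>147000</td></tr>
  <tr><td>Example City</td><td>90000</td></tr>
</table>
"""

parser = TableExtractor()
parser.feed(raw_html)
records = [{"site": site, "square_feet": int(size)}
           for site, size in parser.rows]
```

Whether such a lightweight approach is sufficient depends on the volume and irregularity of the data; the sketch only shows that the conversion step itself does not necessarily require a heavy framework.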
2.5 Pattern Validity
IT experts are required to put a lot of effort into forming data patterns for big data [13]. If these patterns are found to be invalid during data assessment, all the processes involving data collection, storage and analysis have to be carried out again. Because this result is obtained only at the end, an invalid pattern leads to a significant loss of crucial resources, such as money, effort and time [14].
2.6 Proactive Approach
For handling massive data, companies need to carry out their long-term planning process in advance. Moreover, companies are required to understand the usage, nature and quantum of the data, so that they do not face any difficulty in handling a large amount of data with the help of various frameworks and software. Companies face difficulties when they are unable to develop and store the data, once assessed, in such a manner that it can be reused for future projects, thereby saving time, money and effort [11].
2.7 Data Velocity
Data velocity refers to the speed with which data is created, streamed and segregated [6]. In this context, problems occur when the data is humongous and is neither structured nor formatted. In such a case, the management of data velocity becomes a time-consuming task due to the lack of proper applications and software.
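One simple way to quantify velocity is to count how many events arrive within a sliding time window. The sketch below is an illustrative assumption rather than a method from the cited sources; the class name `VelocityMeter` and the timestamps are hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque

class VelocityMeter:
    """Track how many events arrived within the last `window` seconds."""

    def __init__(self, window=1.0):
        self.window = window
        self._timestamps = deque()

    def record(self, timestamp):
        """Register one event and return the current events-per-second rate."""
        self._timestamps.append(timestamp)
        # Discard events that have fallen out of the sliding window.
        while self._timestamps[0] <= timestamp - self.window:
            self._timestamps.popleft()
        return len(self._timestamps) / self.window

# Invented arrival times in seconds.
meter = VelocityMeter(window=2.0)
rates = [meter.record(t) for t in (0.0, 0.5, 1.0, 2.5, 2.6)]
```

A monitoring component could compare such a rate against the throughput of the processing system to detect when incoming data starts to outpace it.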
2.8 Data Crashing
Management of big data can be considered one of the major issues faced by business organizations that have large servers and need to transfer data in large amounts. In this regard, the crashing of data while it is being transferred from one node to another can be regarded as a common issue faced by organizations. When large data that has large bit sizes and requires huge disk space is transferred, the chances of losing data packets become quite prominent. In addition to this, for an organization that requires the transfer of highly authentic and confidential data, big data measures can always create issues [11].
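A common safeguard against damage during transfer is to send a checksum with every chunk and verify it on arrival, so that only damaged chunks have to be sent again. The following sketch illustrates this idea with Python's standard `hashlib` module; the function names, the chunk size and the simulated corruption are assumptions made for the example.

```python
import hashlib

def split_into_chunks(payload, chunk_size):
    """Split bytes into (index, chunk, sha256 digest) tuples for sending."""
    return [
        (i // chunk_size, payload[i:i + chunk_size],
         hashlib.sha256(payload[i:i + chunk_size]).hexdigest())
        for i in range(0, len(payload), chunk_size)
    ]

def find_damaged_chunks(received):
    """Return the indices of chunks whose content no longer matches its digest."""
    return [index for index, chunk, digest in received
            if hashlib.sha256(chunk).hexdigest() != digest]

payload = b"big data record " * 8          # 128 bytes of sample data
sent = split_into_chunks(payload, chunk_size=32)

# Simulate corruption of chunk 2 while it moves between nodes.
received = list(sent)
index, chunk, digest = received[2]
received[2] = (index, b"X" + chunk[1:], digest)

damaged = find_damaged_chunks(received)     # only these need to be re-sent
```

A checksum of this kind detects accidental corruption only; protecting confidential data in transit additionally requires encryption and authentication, which are outside the scope of this sketch.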
All the above-mentioned issues raise obstacles to the effective management of big data, making the data assessment a difficult task. Moreover, with the growing popularity of social media and the ever-increasing membership of people on the social media platforms, there is a need to properly store and manage the big or massive data so that data retrieval and assessment can be done in an effective manner[7]. In this context, cloud computing is one of the most significant approaches that can be employed to ensure the proper management of big data.
3. INTEGRATION OF CLOUD COMPUTING IN MANAGING BIG DATA
Cloud computing and big data are two new technologies that are experiencing increased popularity, and the combination of these technologies has been proving to be very powerful when used for storage and analytical purposes. After the realization of the importance of big data, Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS) have been rapidly accepted.
Figure 1: Main aspects forming a cloud system [4]
Figure 1 explains the main aspects associated with a cloud computing system. It shows the types of cloud computing infrastructure, such as IaaS, SaaS and PaaS, which have different features, like elasticity, reliability and virtualization, that depend on the usage of cloud computing. Apart from these, the models of cloud computing, the stakeholders and the localities of usage are shown. The features, mode and type of cloud computing to be used are determined by the application [13].
Big data has five basic characteristics: volume, value, variety, velocity and veracity, also known as the five Vs of massive data. The major three, volume, variety and velocity, are shown in Figure 2 [9].
For the extraction of the required value from big data, various data services are considered; the focus for valid information extraction from the data should be on analytics. The analytics must also be provided as a service, which is supported by cloud computing models, such as the internal private cloud, the public cloud and the hybrid cloud [7]. With the help of cloud computing, standard analytical solutions can handle big data; along with this, cloud computing also imparts the efficiency and flexibility that are essential for assessing data. The cloud infrastructure changes with the specific requirements of organizations regarding cost, security, data interoperability and scalability. A private cloud infrastructure is applied to mitigate risk and to increase control over data and information, the public cloud infrastructure improves scalability, and the hybrid cloud infrastructure may use certain services, resources and characteristics of the public cloud together with some characteristics of the private cloud. The analysis of big data using cloud-based strategies helps in the optimization of cost [9].
One of the important components in handling the velocity, volume, veracity and variety of big data is the underlying infrastructure. Many business organizations store big data on inherited (legacy) infrastructure, which is not capable of handling several real-time operations. To address the issues faced by organizations using legacy infrastructure, the Software-as-a-Service type of cloud computing, which is accessible through the internet, is used to optimize the existing infrastructure [9].
With the use of these cloud solutions, business organizations can collect and store their data using remote services, without worrying about overloading their present infrastructure. The data associated with social networking sites and mobile applications like Facebook, YouTube, WhatsApp and Twitter is classified as big data; the data on these websites and mobile applications grows with the number of users.
Big data has an impact on the existing infrastructures, which is initiating the development of diverse infrastructure strategies in the network and other segments of the data center; these diverse infrastructure strategies are also known as cloud computing [7]. Cloud computing has emerged as a feasible and mainstream solution for data processing, storage and distribution.
Transferring big data from the cloud is a challenge for organizations that have terabytes or petabytes of digital content [9].
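The scale of this challenge can be estimated with simple arithmetic: transfer time equals data size in bits divided by usable bandwidth. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and a hypothetical link of 1 gigabit per second; the `efficiency` parameter is likewise an assumption introduced for illustration.

```python
def transfer_time_days(size_bytes, bandwidth_bits_per_second, efficiency=1.0):
    """Estimate the days needed to move `size_bytes` over a network link.

    `efficiency` (0 < efficiency <= 1) is an assumed fraction of the nominal
    bandwidth that is actually usable after protocol overhead.
    """
    seconds = (size_bytes * 8) / (bandwidth_bits_per_second * efficiency)
    return seconds / 86_400   # seconds per day

TERABYTE = 10**12
PETABYTE = 10**15
GIGABIT_PER_SECOND = 10**9

one_tb = transfer_time_days(TERABYTE, GIGABIT_PER_SECOND)
one_pb = transfer_time_days(PETABYTE, GIGABIT_PER_SECOND)
```

Under these assumptions, one terabyte takes about 2.2 hours and one petabyte about 93 days of uninterrupted transfer at full line rate, which shows why moving data at petabyte scale over a network needs careful planning.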
Figure 2: The major 3Vs of Big Data [3]
4. ROLE OF CLOUD COMPUTING IN HANDLING ISSUES ASSOCIATED WITH BIG DATA
When organizations undertake big data projects, all the processes, from the retrieval of the humongous data from the web to its assessment, have to be accomplished [9]. In addition, organizations require high data storage capacity and computing power.
On the other hand, various stakeholders expect quick results at a low cost, along with dependable project outcomes[7]. Therefore, it is difficult to fulfill all these needs by employing traditional methods and tools. In this context, cloud computing proves to be highly advantageous by providing a large data storage at low costs.
In the past, companies had to arrange and store their data and other resources at various data centers.
Moreover, for storing a large amount of data, they were required to buy hardware, which added to the overall cost of the project [15]. With the advent of cloud computing, companies are provided with storage and computing power of any scale, depending on their needs and purpose. Hadoop is a well-structured software framework that companies use for handling big data projects, as it offers a convenient way of processing large amounts of unstructured data [16].
Figure 3 presents both big data processing and deep analysis, showing the flow of control and data within the data handling and data analysis parts of the framework. The big data processing part, which is implemented with Pig/Hadoop/Hive technology through classical ETL logic, is depicted on the left side of the diagram. By leveraging the MapReduce model that Hadoop provides, the processing can be scaled up linearly by adding more machines to the Hadoop cluster. The most common approach for completing these kinds of operations is to draw on cloud computing resources (e.g. Amazon EMR).
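The map, shuffle and reduce steps behind this processing model can be sketched in a few lines of plain Python. The example below is a single-process illustration of the idea only; it does not use Hadoop's API, and the function names and input strings are hypothetical. In a real cluster, each input split would be handled by a separate worker.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(mapped_pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}

# Each string stands in for an input split that a separate worker would process.
splits = ["big data in the cloud", "cloud computing for big data"]

mapped = [pair for split in splits for pair in map_phase(split)]
word_counts = reduce_phase(shuffle_phase(mapped))
```

Because every split is mapped independently and every key is reduced independently, both phases can be spread over additional machines, which is the property that allows the processing to scale with the size of the cluster.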
The deep analysis part, on the right side of the
that it can accurately fit the machine's capacity. The deep analysis part involves data preparation, data visualization, model learning (e.g. linear regression and regularization, decision tree, K-nearest-neighbor, Bayesian network, support vector machine, neural network and ensemble methods) and model evaluation [14].
In this respect, the example of Amazon’s S3 can be considered, where the massive data is either collected or transferred directly to the cloud data sink. Qubole, which provides cloud-based services, in case of big data assessment, is used by various companies for instantly unlocking the data with the help of database adapters [7].
Hadoop clusters are offered by most providers of cloud-based services according to the need for data storage. With the help of cloud computing, any organization can store and assess a large amount of unstructured data in a scalable and cost-effective way [9]. A cloud infrastructure can help companies in gaining data insights with the help of big data analytics.
A hybrid cloud infrastructure consists of various resources and data that are provided both internally and externally. In this context, the hybrid cloud infrastructure proves to be the most suitable for the storage and analysis of unstructured and massive data [7]. Moreover, a cloud database facilitates the deployment of a large number of virtual servers seamlessly and in a short time period [9]. The big data ecosystem is shown in Figure 4, which presents big data along with its features, like storage and search, and methods of handling big data, such as Hadoop, NewSQL, NoSQL and operational RDBMS databases [14].
A big data environment enables the organization to store, process, analyze and visualize the data. This starts with the development of an infrastructure, followed by the selection of appropriate tools that can store, process and analyze data. Specialized analytics tools are then used that help the organization in finding insights. Together, these form the big data ecosystem [11].
Figure 3: Big Data Processing [1]
Figure 4: Big Data Ecosystem [2]
Due to the cost-effectiveness of cloud computing, big data and resources are made available to both small and large enterprises for their business processes [17]. The cloud environment allows companies to use resources and data as per their demand, which helps in scaling the storage space and processing power up and down. With the increasing popularity of social media platforms, data is increasing at a phenomenal rate, which is resulting in the collection of bulk structured and unstructured data [18].
This data cannot be stored in normal relational databases, as managing and working with such amorphous data requires a lot of time and effort. In such a case, cloud computing provides data storage and computing facilities, so that this data can be converted into useful information without the loss of any significant data [11].
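The benefit of scaling resources up and down on demand can be illustrated with a small calculation. The load figures and the per-node capacity below are hypothetical and chosen only to show the arithmetic; the function name `nodes_needed` is likewise an assumption.

```python
import math

def nodes_needed(load, capacity_per_node, min_nodes=1):
    """Return the smallest node count whose total capacity covers `load`."""
    return max(min_nodes, math.ceil(load / capacity_per_node))

# Hypothetical hourly load in requests per second; each node is assumed
# to handle 500 requests per second.
hourly_load = [200, 900, 2600, 1200, 300]
plan = [nodes_needed(load, capacity_per_node=500) for load in hourly_load]

# Compare paying per hour for what is needed with provisioning for the peak.
on_demand_node_hours = sum(plan)
fixed_node_hours = max(plan) * len(plan)
```

In this invented scenario the on-demand plan uses 13 node-hours, whereas hardware sized for the peak would be occupied for 30 node-hours, which mirrors the cost argument made above.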
Cloud-based service providers offer various security features and frameworks for regulatory compliance as inbuilt features.
The data stored on the cloud is therefore available only on demand, and users are provided with password-protected credentials to ensure authenticity. Various acts and standards, such as the Health Insurance Portability and Accountability Act and the Payment Card Industry Data Security Standard, help data users to ensure security against both service providers and hackers [19].
Another major driver that promotes the usage of cloud services for assessing big data is its accessibility, irrespective of place and time.
5. EXAMPLES OF THE MANAGEMENT OF BIG DATA THROUGH CLOUD COMPUTING
Various business organizations have adopted the newly emerged concept of cloud computing for gathering, storing, and controlling big data, that is, digital content ranging in size from a few terabytes up to several petabytes. Companies such as Google, Apple and Dropbox have developed their own cloud computing software and provide their users with a specific amount of data storage over the internet. Apart from web data storage, some companies, social networking websites and mobile applications also use cloud computing for the storage of rapidly increasing data [20]. Facebook recently revealed that its engineers have implemented cloud computing infrastructures for fast data management. Google Inc. has developed the Google Drive software to give its users web storage, and it has also developed various other software that works on cloud computing principles to handle big data. As the amount of unstructured data from social media increases, extraction from big data increases in worth when integrated data sets are incorporated and analyzed to increase the level of competitive advantage [21].
Figure 5: Big Data Analytic Program [1]
Figure 5 illustrates the big data analytic program that shows the flow of big various outputs. The OLAP (Online Analytical Processing) includes data mart/cubes, and reports are
with the use of public clouds for managing the data retrieved from the social media platforms such as Facebook, Twitter and Pinterest, and other information such as weather data, financial markets data and collective industry-specific data is stored in the cloud and the data control becomes more cost-effective for the cloud and the organization.
Some of the companies that have implemented cloud computing, and have thereby shown the importance of the cloud framework for handling big data, are discussed below:
• Netflix, a company that streams movies online, made a comeback through the use of cloud computing for its movie streaming after the company experienced a downfall.
• Xerox, a company well known for its paper photocopy machines, has introduced its Cloud Print Solution, which allows users to print from anywhere [22].
• Pinterest uses cloud services to adjust to the traffic levels on its website and to conduct experiments while employing a small team. Pinterest has also started using big data as a service to store and analyze its data.
• Instagram has also started using cloud computing to handle its growth and increase the scalability of the company. Instagram was first launched in 2010 on a consumer PC in L.A.; within a couple of hours the server was exhausted and Instagram's data had to be transferred to the cloud.
• Apple, long regarded as a leader in technology, has launched iCloud and Siri, which are based on cloud computing technology [23].
6. CONCLUSION
Cloud computing has brought a revolution in various companies, irrespective of their industry and size.
Big data is the collection of massive data that can be either structured or unstructured. On the other hand, cloud computing is the technology that provides virtual servers and storage space for data allocation and sharing.
In the present dynamic environment, and with the ever-growing popularity of social media, the collection of humongous and unstructured data has been observed. A combination of cloud computing and big data has provided companies with a highly efficient tool to store and manage data in a cost-effective manner. In this context, it has been found that the application of cloud computing for managing big data is a beneficial approach, as it helps to address the issues associated with big data.
It is necessary that companies first examine their need for incorporating the technology of cloud computing, evaluating the amount of data they are dealing with.
skilled workforce, which are required to be prevented to avoid obstructions in managing big data.
However, companies and users can store and access data on the cloud from any location and at any time. In addition, cloud-based services offer the benefits of high computing power and storage space, making the assessment of massive data easier. Cloud computing helps companies in carrying out big data projects through all the stages, from data extraction to data assessment. Therefore, it can be concluded that, in the contemporary scenario, the technology of cloud computing is highly effective in managing the ever-growing big data.
REFERENCES
[1] R. Ho. (2012, 12 July 2015). BIG Data Analytics Pipeline. Available: http://horicky.blogspot.fr/2012/08/big-data-analytics.html
[2] Yellowfin. (2012, 17 July 2015). Big Data and BI Best Practices. Available: http://www.slideshare.net/glenrabie/big-data-and-bi-best-practices
[3] J. Hurt. (2012, 15 July 2015). The Three Vs Of Big Data As Applied To Conferences. Available: http://velvetchainsaw.com/2012/07/20/three-vs-of-big-data-as-applied-conferences/
[4] L. Schubert, K. G. Jeffery, and B. Neidecker-Lutz, The Future of Cloud Computing: Opportunities for European Cloud Computing Beyond 2010: Expert Group Report: European Commission, Information Society and Media, 2010.
[5] S. Pasalapudi, "Trends in cloud computing: big data’s new home," White paper, Oracle profit, 2014.
[6] K. Jamsa, Cloud Computing: SaaS, PaaS, IaaS, Virtualization, Business Models, Mobile, Security and More: Jones & Bartlett Publishers, 2011.
[7] R. Hill, L. Hirsch, P. Lake, and S. Moshiri, Guide to cloud computing: principles and practice: Springer Science & Business Media, 2012.
[8] Accenture. (2015, 21 June 2015). Accenture Technology Vision 2015, Digital Business Era: Stretch Your Boundaries. Available: http://techtrends.accenture.com/us-en/downloads/Accenture_Technology_Vision_2015.pdf
[9] A.-E. Hassanien, A. T. Azar, V. Snasel, J. Kacprzyk, and J. H. Abawajy, Big Data in
[10] D. C. Knowledge. (2015, 2 July 2015). The Facebook Data Center FAQ. Available: http://www.datacenterknowledge.com/the-facebook-data-center-faq/
[11] M. Chen, S. Mao, Y. Zhang, and V. C. Leung, Big data: related technologies, challenges and future prospects: Springer, 2014.
[12] C. Ji, Y. Li, W. Qiu, U. Awada, and K. Li, "Big data processing in cloud computing environments," in Pervasive Systems, Algorithms and Networks (ISPAN), 2012 12th International Symposium on, 2012, pp. 17-23.
[13] D. K. Barry, Web Services, Service-Oriented Architectures, and Cloud Computing: The Savvy Manager’s Guide (The Savvy Manager’s Guides): Morgan Kaufmann, 2012.
[14] N. Marz and J. Warren, Big Data: Principles and best practices of scalable realtime data systems: Manning Publications Co., 2015.
[15] M. Minelli, M. Chambers, and A. Dhiraj, Big data, big analytics: emerging business intelligence and analytic trends for today's businesses: John Wiley & Sons, 2012.
[16] T. White, Hadoop: The definitive guide: O'Reilly Media, Inc., 2012.
[17] C. S. Alliance, "International Symposium on Pervasive Systems, Algorithms and Networks," 2012.
[18] C. Catlett, Cloud Computing and Big Data vol. 23: IOS Press, 2013.
[19] N. Andrews and B. Neil Andrews, Contract law: Cambridge University Press, 2015.
[20] B. Nedelcu, "About Big Data and its Challenges and Benefits in Manufacturing," Database Systems Journal, vol. 4, pp. 10-19, 2013.
[21] G. John, K. Deanna, and B. Kristi, Eds., Strategic Data-Based Wisdom in the Big Data Era. Hershey, PA, USA: IGI Global, 2015.
[22] T. H. Davenport and J. Dyché, "Big data in big companies," May 2013.
[23] T. Erl, R. Puttini, and Z. Mahmood, Cloud Computing: Concepts, Technology, & Architecture: Pearson Education, 2013.