Workflow system for MapReduce in cloud environment

Full text

(1)UNIVERSITI PUTRA MALAYSIA WORKFLOW SYSTEM FOR MAPREDUCE IN CLOUD ENVIRONMENT. MUNTADHER SAADOON WADI. FSKTM 2017 6.

(2) PM U R. IG. H. T. WORKFLOW SYSTEM FOR MAPREDUCE IN CLOUD ENVIRONMENT. By. Thesis Submitted to Universiti Putra Malaysia, in Fulfilment of the Degree of Master Computer Science. ©. C. O. PY. MUNTADHER SAADOON WADI. July 2017 i.

(3) Abstract of thesis presented to the Senate of Universiti Putra Malaysia in fulfillment of the requirements for Master of Computer Science. PM. WORKFLOW SYSTEM FOR MAPREDUCE IN CLOUD ENVIRONMENT. By. T. July 2017. U. MUNTADHER SAADOON WADI. H. ABSTRACT. IG. The magnitude of data generated and shared by businesses, public administrations, industrial sectors and scientific research, has increased immeasurably. Apache Hadoop is an open source software framework,. R. which enables a scalable and distributed processing of high volumes of. PY. data. MapReduce together with its Hadoop implementation has been widely adopted in many practical applications. A common practice nowadays is to implement MapReduce applications in a high-performance infrastructure, such as cloud computing. A cloud platform can deploy and. O. manage Hadoop clusters. However, there are tasks required advanced. C. knowledge in computer science and cloud computing when using MapReduce technology that prevent the usage of current technologies and. ©. software. solutions.. For. example,. MapReduce. deployment. and. maintenance, data integration with Hadoop distributed file system or MapReduce job submission. A MapReduce workflow system is one of the solution that could assist MapReduce and Hadoop developers. Besides, it provides a user-friendly execution platform that encapsulating complexity of data analysis steps. In this research, a new workflows system is ii.

(4) developed to facilitate the use of collaborating, coordinating and executing operations of MapReduce programs with a graphical user interface based on Hadoop cloud cluster. The experimental results indicate that the developed workflow system can achieve good speed in performance. It is. PM. believed that the workflow system is an ideal stereotype for MapReduce and it will play an important role in the era of big data applications in cloud. ©. C. O. PY. R. IG. H. T. U. computing.. iii.

(5) Abstrak tesis yang dikemukakan kepada Universiti Putra Malaysia sebagai memenuhi keperluan Ijazah Sarjana Sains Komputer. PERSEKITARAN CLOUD. U. By. PM. SISTEM ALIRAN KERJA BAGI MAPREDUCE DI DALAM. T. MUNTADHER SAADOON WADI. H. Julai 2017. IG. ABSTRAK. Magnitud data yang dijana dan dikongsi oleh sektor perniagaan,. R. pentadbiran awam, sektor industri dan penyelidikan saintifik, telah meningkat dengan ketara. Apache Hadoop adalah kerangka kerja perisian. PY. sumber terbuka, yang membolehkan pemprosesan boleh skala dan teragih bagi data berjumlah tinggi. MapReduce bersama dengan pelaksanaan. Hadoop telah digunakan secara meluas di dalam kebanyakkan aplikasi. O. praktikal. Amalan biasa pada masa kini adalah untuk melaksanakan aplikasi MapReduce di dalam infrastruktur berprestasi tinggi seperti. C. pengkomputeran awan. Pelantar awan boleh meletak atur dan mengurus gugusan. Hadoop.. Walau. bagaimanapun,. terdapat. tugasan. yang. ©. memerlukan pengetahuan lanjut di dalam bidang sains komputer dan pengkomputeran awan apabila menggunakan teknologi MapReduce yang boleh menghalang penggunaan teknologi dan penyelesaian perisian sedia ada. Sebagai contoh, peletak aturan dan penyelenggara MapReduce, integrasi data dengan sistem fail teragih Hadoop atau penyerahan kerja MapReduce. iv.

(6) Sistem aliran kerja MapReduce adalah salah satu kaedah penyelesaian yang dapat membantu pembangun MapReduce dan Hadoop. Selain itu, ianya menyediakan satu pelantar pelaksanaan yang mesra pengguna merangkumi langkah analisis data yang kompleks. Dalam kajian ini, satu untuk. kolaborasi,. koordinasi. dan. PM. sistem aliran kerja baru dibangunkan untuk memudahkan penggunaan pelaksanaan. operasi. program. MapReduce dengan antara muka pengguna grafik berasaskan Hadoop. gugusan awan. Hasil eksperimen menunjukkan bahawa sistem aliran kerja. U. yang dibangunkan dapat mencapai kelajuan yang baik di dalam aspek. prestasi. Adalah dipercayai bahawa sistem aliran kerja adalah stereotaip yang ideal untuk MapReduce dan ianya akan memainkan peranan penting. ©. C. O. PY. R. IG. H. T. di dalam era aplikasi data bersaiz besar dalam pengkomputeran awan.. v.

(7) ACKNOWLEDGEMENT. me with health, time and maturity of mind to study.. PM. First of all, all thanks and praise goes to God for His will and for bestowing. U. Million thanks to Novia Indriaty Admodisastro being my supervisor during master degree. She has provided assistance, guidance and constructive. H. T. comments during the process of completing this thesis.. I would like to infinitely thank and appreciate my father, Saadoon Wadi. IG. Najim, mother, Jinan Naaem, siblings and friends who always support, inspire, and encourage me to complete this project.. R. Special thanks to Lina, who gives me the power to always look forward. PY. and never give up.. Finally, to all UPM staff, thanks you for your facilitation. I would like to. O. acknowledge to any individual who are not mentioned here for his/her. ©. C. irreplaceable helps and cooperation.. vi.

(8) APPROVAL. Thesis submitted to the Senate of University Putra Malaysia and has been. PM. accepted as fulfillment of the requirement for Master of Computer Science. H. T. U. (Software Engineering).. IG. ______________________________ Supervisor,. R. Dr.Novia Indriaty Admodisastro. PY. Department of Software Engineering Faculty of Computer Science and Information Technology Universiti Putra Malaysia. ©. C. O. Date: 10/07/2017. vii.

(9) DECLARATION. I declare that the thesis is my original work except for the quotations and. PM. citations, which have been duly acknowledged. I also declare that it has. not been previously, and is not concurrently, submitted for any other. ______________________________ MUNTADHER SAADOON WADI Date. :. 10/07/2017. ©. C. O. PY. R. IG. H. T. U. degree at Universiti Putra Malaysia or at any other institution.. viii.

(10) TABLE OF CONTENT. ABSTRACT ........................................................................................................ ii. PM. ACKNOWLEDGEMENT ...................................................................................vi APPROVAL ...................................................................................................... vii. U. CHAPTER 1 ............................................................................................... 1 INTRODUCTION ........................................................................................ 1. Overview .................................................................................................... 1. 1.2. Problem Statement ..................................................................................... 3. 1.3. Research Objectives .................................................................................. 5. 1.4. Research Scope .......................................................................................... 5. 1.5. Thesis Organization ................................................................................... 7. R. IG. H. T. 1.1. PY. CHAPTER 2 ............................................................................................... 8 LITERATURE REVIEW .............................................................................. 8. O. 2.1 Introduction .................................................................................................. 8. ©. C. 2.2. Background of The Study .......................................................................... 8. 2.3. Workflow Systems for MapReduce ......................................................... 15. 2.4. Related Works.......................................................................................... 17. 2.4.1. Cloudgene ....................................................................................... 17. 2.4.2. CloudDOE ....................................................................................... 19. 2.4.3. IDEMP ............................................................................................. 20 ix.

(11) 2.4.4. J2M................................................................................................... 22. 2.4.5. Automatic Source Code Generation ........................................... 23. Discussion and Characteristics Analysis ................................................. 23. 2.5. Summary .................................................................................................. 26. PM. 2.5. CHAPTER 3 ............................................................................................. 28. U. METHODOLOGY ..................................................................................... 28 Introduction .............................................................................................. 28. 3.2. Research Methodology ............................................................................ 28. T. 3.1. Literature Review ........................................................................... 29. 3.2.2. Characteristics Analysis and Design .......................................... 30. 3.2.3. Development................................................................................... 30. 3.2.4. Evaluation ....................................................................................... 31. IG. R. Report Writing................................................................................. 32. PY. 3.2.5. H. 3.2.1. 3.3. Experiment Steps ..................................................................................... 32 Experiment Scoping ...................................................................... 32. 3.3.2. Experiment Objectives .................................................................. 33. 3.3.3. Experiment Planning ..................................................................... 33. 3.3.4. Experiments Execution ................................................................. 36. ©. C. O. 3.3.1. 3.4. Summary .................................................................................................. 38. CHAPTER 4 ............................................................................................. 39. x.

(12) Workflow System for MapReduce in Cloud Environment ......................... 39 Introduction .............................................................................................. 39. 4.2. The Solution............................................................................................. 39. 4.3. The Workflow System ............................................................................. 41. PM. 4.1. Use Case......................................................................................... 41. 4.3.2. Cloud Environment ........................................................................ 43. 4.3.3. Collaboration................................................................................... 44. 4.3.4. Reusability....................................................................................... 45. 4.3.5. Job Generation and Run-time interruption ................................ 46. 4.3.6. Design .............................................................................................. 48. T. H. IG. Summary .................................................................................................. 53. R. 4.4. U. 4.3.1. CHAPTER 5 ............................................................................................. 54. PY. RESULT AND DISCUSSION ................................................................... 54 Introduction .............................................................................................. 54. 5.2. Experimental Results ............................................................................... 54. 5.3.1. Experiment 1: The Workflow System vs. Native MapReduce 54. 5.3.2. Experiment 2: The Workflow System vs. IDEMP ..................... 56. ©. C. O. 5.1. 5.4 Discussion of the Result ............................................................................. 57 5.5. Summary .................................................................................................. 59. CHAPTER 6 ............................................................................................. 60. xi.

(13) CONCLUSION AND FUTURE RESEARCH ............................................ 60 6.1 Introduction ................................................................................................ 60 6.2 Summary of the Research ........................................................................... 60 Limitations of the Study .......................................................................... 61. 6.4. Recommendation for Further Research ................................................... 61. PM. 6.3. U. REFERENCES ......................................................................................... 62 APPENDICES .......................................................................................... 66. T. Appendix 1: Database server’s tables screenshots ............................................. 66. H. Appendix 2: The built-in class codes ................................................................. 68. IG. Appendix 3: Screenshot of a sample of the dataset used in experiment 1. ........ 69. ©. C. O. PY. R. Appendix 4: Screenshot of a sample of the dataset used in experiment 2. ........ 70. xii.

(14) LIST OF TABLES Table. Page. Table ‎2.1: Hadoop Core Components. ..................................................... 14. PM. Table 2.2: Comparison between the related works. ................................. 25 Table 2.3: Characteristics description of the workflow system ................. 26. U. Table ‎3.1: The hardware and software configuration for each node. employed for Experiment 1. ..................................................................... 35. T. Table ‎3.2: The hardware and software configuration for each node. H. employed for Experiment 2. ..................................................................... 36 Table ‎3.3: Experiment Procedure for Experiment 1, The workflow system. IG. vs. Native MapReduce. ............................................................................ 37 Table ‎3.4: Experiment Procedure for Experiment 2, The workflow system. R. vs. IDEMP. ............................................................................................... 38. PY. Table ‎5.1: The Result of Experiment 1 of the workflow system. ............... 55 Table ‎5.2: The Result of Experiment 1 of the native MapReduce job....... 55. O. Table ‎5.3: The Result of the Experiment 2 of workflow system. ............... 56. ©. C. Table ‎5.4: The Result of the Experiment 2 of IDEMP (Ma et al., 2016). ... 56. xiii.

(15) LIST OF FIGURES Figure ‎1.1: Research Scope ....................................................................... 6 Figure ‎2.1: MapReduce execution overview (Dean & Ghemawat, n.d.. PM. 2008) ........................................................................................................ 15 Figure ‎2.2: Comparison between identical approach and Cloudgene, for. setting up and executing MapReduce program (Schönherr et al., 2012) . 16. U. Figure ‎2.3: The use of Cloudgene in public or private clouds (Schönherr et. al., 2012) .................................................................................................. 18. T. Figure ‎2.4: Comparison between identical approach and CloudDOE, for. H. deploying Hadoop cluster and executing MapReduce program (Chung et. IG. al., 2014). ................................................................................................. 19 Figure ‎2.5: IDEMP system architecture (Ma et al., 2016) ......................... 21. R. Figure ‎2.6: J2M translator structure (Li et al., 2016) ................................. 22. PY. Figure ‎3.1: Flow of Research Methodology .............................................. 29 Figure ‎4.1: Architectural Diagram of the workflow system ........................ 40 Figure ‎4.2: Use case diagram illustrates the use of the workflow system. O. from two different perspectives................................................................. 42. C. Figure ‎4.3: Network topology diagram generated by OpenStack software to. ©. illustrate the overview environment entities and network. ........................ 43 Figure ‎4.4: Collaboration example provided by the workflow system. ...... 44 Figure ‎4.5: Comparison between the traditional approach of releasing a MapReduce job and the workflow system. ............................................... 46 Figure ‎4.6: The steps of compiling a certain MapReduce program. ......... 47 xiv.

(16) Figure ‎4.7: The architecture of MapReduce application interruption during its run-time................................................................................................ 48 Figure ‎4.8: Screenshot of the workflow system login page. ..................... 49. PM. Figure ‎4.9: Screenshot of the workflow system’s Project Leader section that shows the ability of the Project Leader to monitor and control. uncompleted MapReduce projects. .......................................................... 49. U. Figure ‎4.10: Screenshot of the workflow system’s Project Leader section that shows the ability of the Project Leader to monitor and control. T. completed MapReduce projects. .............................................................. 50 Figure ‎4.11: Screenshot of the workflow system’s Developer section that. H. shows the ability of creating MapReduce component and attach them to. IG. the available projects. ............................................................................... 50 Figure ‎4.12: Screenshot of essential MapReduce project’s configuration. R. set by the Project Leader. ........................................................................ 51 Figure ‎4.13: Screenshot of the page that allows developers to upload. PY. unlimited size of dataset to HDFS based on SSH protocol. ..................... 51 Figure ‎4.14: Screenshot of developer management page that allows. O. Project leader to manage and control the workflow system developers. .. 52. Figure ‎4.15: Screenshot of access denied page that secure the system. C. information from any unauthorized access. .............................................. 52. ©. Figure ‎5.1: Comparison between the workflow system and a native MapReduce job, shows the difference of the execution time in second. .. 58 Figure ‎5.2: Comparison between the workflow system and IDEMP, shows the difference of the execution time in second. ........................................ 58. xv.

(17) LIST OF ABBREVIATIONS Abbreviations. Meaning. UPM. Universiti Putra Malaysia. IT. Information Technology National Institute of Standards and Technology. OSI. Open Source Software. UML. Unified Modeling Language. PM. NIST. HyperText Markup Language. U. HTML CSS. Cascading Style Sheets. VM. Virtual Machine. Hadoop Distributed File System. MR. T. HDFS. ES2. Amazon Elastic Compute Cloud. SaaS. Software as a Service. H. Infrastructure as a Service Java Enterprise Edition. ©. C. O. PY. JavaEE. IG. IaaS. Platform as a Service. R. PaaS. MapReduce. xvi.

(18) CHAPTER 1 INTRODUCTION. Overview. PM. 1.1. According to Hashem et al., (2015) the constant rise in the amount and detail of data occupied by organizations, such as the increase of social. U. media, Internet of Things (IoT), and multimedia, has produced an unusual. flow of data formats structured, semi-structured and unstructured. Big data. T. cites awareness from the academia, government, and enterprises. Big data. H. are categorized by three characters: (1) data are infinite, (2) data cannot. IG. be classified into traditional relational databases, and (3) data are produced, grown, and processed rapidly. Besides, big data is transforming. R. health-care, physics, manufacturing, economics, marketing, and finally, the. PY. society. The velocity at which new data are being produced is tremendous. A major challenge for researchers and practitioners is that this increased rate eclipses their ability and knowledge to design proper cloud computing. O. platforms for data analysis and update-intensive workloads (Hashem et al.. C. 2015).. ©. Cloud computing is one of the hugely significant turns in latest Information and communications technology and service for enterprise software. In addition, it is a robust architecture to achieve large-scale and heterogeneous computing. It has been demonstrated quite successful in big data applications. The major step is to drive jobs from traditional. 1.

(19) servers to cloud servers. Cloud servers provide higher services to users rather than a service that contributed by personal machines or small servers. The two significant benefits of cloud computing are scalability and. PM. efficiency (Ma et al. 2016). MapReduce is the most common parallel programming model to analysis vast amount of data on Hadoop clusters in cloud platforms. MapReduce is the native programming language of. U. several tools that have been developed on the top of Hadoop ecosystem. such as Apache Pig (Olston et al. 2008), Apache Hive (Thusoo et al.. T. 2009), and code translation applications (Li et al. 2016; Zhang et al. 2013).. H. These tools have significantly improved the productivity of writing. IG. MapReduce programs. However, in practice, auto-generated MapReduce programs have to be observed for many queries that are often extremely. R. inefficient compared to native MapReduce programs by experienced programmers (Olston et al. 2008). In addition, executing MapReduce. PY. programs on cluster architecture poses considerable challenges for IT research laboratories that interesting in using MapReduce programming. O. model (Chung et al. 2014; Ko, Park, and Version 2017; Schönherr et al.. C. 2012) . In order to address this problem, several recent workflow systems. have emerged specifically for MapReduce programming model to facilitate. ©. the use of MapReduce in cloud platform, for example, Cloudgene. (Schönherr et al. 2012), CloudDOE (Chung et al. 2014) and cl-dash (Hodor et al. 2016).. 2.

(20) In this research, we intend to design and develop a workflow system for MapReduce program in Hadoop cloud cluster. The main objective of the workflow system is to facilitate the use of collaborating, coordinating and. PM. executing operations of MapReduce programs with a graphical user interface based on Hadoop cloud cluster.. Problem Statement. U. 1.2. The magnitude of data generated and shared by businesses, public. T. administrations numerous industrial and not-to-profit sectors, and scientific. H. research, has increased immeasurably (Sivarajah et al. 2016). Apache. IG. Hadoop is an open source software framework, which enables a scalable and distributed processing of high volumes of data (Vavilapalli et al. 2013).. R. MapReduce together with its open-source implementation Hadoop has. PY. been widely adopted in many practical data processing applications (Dean & Ghemawat, n.d. 2008). A common practice nowadays is to process these data in a high-performance computing infrastructure, such as cloud. O. (Hashem et al. 2015). Cloud platform has the ability to deploy and manage. C. Hadoop clusters (Thaha et al. 2016). However, there are four major. ©. problems when emerging these fundamental technologies, which are listed below. First, due to the tremendous volume of data sets being collected and analyzed daily, companies scale up hundreds of systems into their data centers to handle this growth of data (Computing et al. 2016). In addition, as. 3.

(21) long as data centers have increased numbers of systems, multiple configurations and maintenance requirements have to be increased concurrently. Resulting, traditional solutions for collecting and analyzing. PM. data are prohibitively expensive (Gates et al. 2009). Second, Hadoop is an open source software framework which implements MapReduce model for controlling big data operations. It provides a Hadoop. U. Distributed File System (HDFS) as the global file system running on a cluster (Shvachko 2010). According to Ma et al., 2016, setting up Hadoop. T. cluster in a distributed environment is not an easy task. Developers have. H. challenges in installing and configuring Hadoop cluster because of Hadoop. deployment.. IG. complexity that leads them to spend more time and effort on Hadoop. R. Third, MapReduce is the best option being used as a parallel processing system framework. However, MapReduce is very low level and rigid.. PY. Besides, it requires developers to write custom codes that are complex to be reused and maintained (Li et al. 2016; Ma et al. 2016; Olston et al. 2008;. O. Thusoo et al. 2009).. C. Fourth, MapReduce offers a simple dataflow programming model that appeals to many users especially who is familiar with Java programming. ©. language. Nevertheless, in MapReduce codes implementation, the complete simplicity of the Map-Reduce programming model drives to 1) MapReduce job optimization has to be manually manipulated by its developers. 2) Map-Reduce also lacks explicit support for combined. 4.

(22) processing of multiple data sets. 3) Frequently-needed data manipulation primitives like filtering, aggregation, and top-k thresholding, must be coded by hand (Gates et al. 2009).. PM. As known, workflow systems can streamline the design and execution of workflows in high-performance computing settings such as local or distributed cloud clusters (Spjuth et al., 2015). Besides, these are many. U. workflow systems and software tools attempted to solve this problem. However, there was a lack in performance and some features (e.g.. T. collaboration, reusability, and run time interruption) were not fully realized. H. in their current versions (Li et al. 2016; Ma et al. 2016; Schönherr et al.. Research Objectives. R. 1.3. IG. 2012).. PY. The objectives of this study are listed as below: . To derive the characteristics of a workflow system for MapReduce programming model.. ©. C. O. . 1.4. . To design and develop a workflow system for MapReduce in cloud environment. To evaluate the performance of the workflow system.. Research Scope The research focuses on MapReduce programming model and big data analysis based on Hadoop cloud cluster. The cloud environment used. 5.

(23) to carry out the proposed system is OpenStack cloud software (Kumar and Parashar 2014). In addition, OpenStack will handle both big-data resources (e.g. Hadoop nodes) and the workflow system. The workflow. PM. system will be designed and developed as web-based application using Java EE. The following Figure 1.1 shows the major tools that will be. ©. C. O. PY. R. IG. H. T. U. used in this research.. Figure 1.1: Research Scope. 6.

(24) 1.5. Thesis Organization. This thesis is divided into six chapters. The following paragraphs provide a brief description of the remaining chapters of this thesis: Chapter 2. PM. discusses workflow systems and code generation tools based on Hadoop that are used for MapReduce model in order to simplify the use of complex. workloads. Chapter 3 introduces different phases of the research. U. methodology of this research. Chapter 4 describes the design,. implementation and proves the concepts of the proposed workflow system.. T. Chapter 5 demonstrates the performance of the developed workflow. H. system. Chapters 6 present the conclusion together with thesis limitation. ©. C. O. PY. R. IG. and identify some areas for future work.. 7.

(25) REFERENCES. Anon. n.d. OpenStack Open Source Cloud Computing Software. Retrieved. PM. November 3, 2016 (http://www.openstack.org).. Chung W-C, Chen C-C, Ho J-M, Lin C-Y, Hsu W-L, Wang Y-C, et al.. (2014) CloudDOE: A User-Friendly Tool for Deploying Hadoop Clouds. U. and Analyzing High-Throughput Sequencing Data with MapReduce. PLoS ONE 9(6): e98146.. T. https://doi.org/10.1371/journal.pone.0098146. Computing, High Performance et al. 2016. “BIG DATA DEVELOPMENT. H. PLATFORM FOR ENGINEERING.” IEEE International Conference on. IG. Big Data (Big Data) 2699–2702.. Dean, Jeffrey and Sanjay Ghemawat. n.d. “MapReduce: Simplified Data Processing on Large Clusters". Communications of the ACM Volume. R. 2008 107-113. PY. Deelman, Ewa, Dennis Gannon, Matthew Shields, and Ian Taylor. 2009. “Workflows and E-Science : An Overview of Workflow System Features and Capabilities.” Future Generation Computer Systems. O. 25(5):528–40. Retrieved. C. (http://dx.doi.org/10.1016/j.future.2008.06.012).. ©. Ferreira, Rafael et al. 2017. “A Characterization of Workflow Management Systems for Extreme-Scale Applications.” Future Generation Computer Systems. Retrieved (http://dx.doi.org/10.1016/j.future.2017.02.026). Gates, Alan F. et al. 2009. “Building a High-Level Dataflow System on Top of Map-Reduce: The Pig Experience.” Vldb ’09 1–12. Retrieved. 62.

(26) (papers3://publication/uuid/08DB3582-AD5E-47BB-A370589985B45F92). Hashem, Ibrahim Abaker Targio et al. 2015. “The Rise Of ‘big Data’ on Systems 47:98–115. Retrieved (http://dx.doi.org/10.1016/j.is.2014.07.006).. PM. Cloud Computing: Review and Open Research Issues.” Information. Hodor, Paul, Amandeep Chawla, Andrew Clark, Lauren Neal, and Booz. U. Allen Hamilton. 2016. “Sequence Analysis Cl-Dash : Rapid. Configuration and Deployment of Hadoop Clusters for Bioinformatics. T. Research in the Cloud.” Bioinformatics 32(October 2015):301–3.. H. Kaur, Simranjit and Sumesh Sood. 2016. “A Survey Paper on the Evaluation Criteria of Open Source Cloud Computing Solutions.”. IG. International Journal of Computer Science and Mobile Applications,Vol.4 Issue. 6, June- 2016, pg. 6-12 4:6–12.. R. Kiran, Mariam, Peter Murphy, Inder Monga, Jon Dugan, and Sartaj Singh Baveja. 2015. “Lambda Architecture for Cost-Effective Batch and. PY. Speed Big Data Processing.” Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015 2785–92.. Ko, Mi-eun, Young B. Park, and A.Apache Hadoop Version. 2017.. O. “Automatic Generation of UML Model-Based Image Processing. C. Source Code in Hadoop Platform.” International Conference on Platform Technology and Service 2–5.. ©. Kumar, Rakesh and Bhanu Bhushan Parashar. 2014. “Dynamic Resource Allocation and Management Using OpenStack.” cnsm (November):1– 5. Li, Bing, Junbo Zhang, Ning Yu, and Yi Pan. 2016. “J2M: A Java to MapReduce Translator for Cloud Computing.” Journal of 63.

(27) Supercomputing 72(5):1–18. Ma, Zhiqiang, Shuangtao Yang, Zhida Shi, and Rui Yan. 2016. “Online Integrated Development Environment for MapReduce Programming.” International Journal of u- and e- Service, Science and Technology. PM. 9(6):399–408.. Mandal, Bichitra, Ramesh Kumar Sahoo, and Srinivas Sethi. 2015. “Architecture of Efficien T Word Processing Using Hadoop. Man and Machine Interfacing (MAMI).. U. MapReduce for Big Data Applications.” International Conference on. T. Olston, Christopher, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, and. H. Andrew Tomkins. 2008. “Pig Latin.” Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data - SIGMOD. IG. ’08 1099. Retrieved. (http://infolab.stanford.edu/~olston/publications/sigmod08.pdf%5Cnhtt. R. p://portal.acm.org/citation.cfm?doid=1376616.1376726). Peng, Junjie et al. 2010. “Comparison of Several Cloud Computing. PY. Platforms.” 2nd International Symposium on Information Science and Engineering, ISISE 2009 23–27.. Schönherr, Sebastian et al. 2012. “Cloudgene : A Graphical Execution. O. Platform for MapReduce Programs on Private and Public Clouds.”. C. BMC Bioinformatics 1–9.. ©. Shvachko, K. 2010. “The Hadoop Distributed File System.” IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST) 1– 10. Shvachko, Konstantin, Hairong Kuang, Sanjay Radia, and Robert Chansler. 2010. “The Hadoop Distributed File System.” 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies, 64.

(28) MSST2010 1–10. Sidhu, Ravneet Kaur and Charanjiv Singh Saroa. 2016. “Efficient Batch Processing of Related Big Data Tasks Using Persistent MapReduce Technique.” Proceedings of the Third International Symposium on. PM. Computer Vision and the Internet 106–9. Retrieved (http://doi.acm.org/10.1145/2983402.2983431).. Sivarajah, Uthayasankar, Muhammad Mustafa Kamal, Zahir Irani, and. U. Vishanth Weerakkody. 2016. “Critical Analysis of Big Data Challenges and Analytical Methods.” Journal of Business Research 70:263–86.. T. Retrieved (http://dx.doi.org/10.1016/j.jbusres.2016.08.001).. H. Spjuth, Ola et al. 2015. “Experiences with Workflows for Automating DataIntensive Bioinformatics.” Bioinformatics 1–12.. IG. Thaha, Asmath Fahad, Anang Hudaya, Muhamad Amin, Subarmaniam Kannan, and Nazrul Muhaimin Ahmad. 2016. “Data Location Aware. R. Scheduling for Virtual Hadoop Cluster Deployment on Private Cloud. PY. Computing Environment.” 103–9. Thusoo, Ashish et al. 2009. “Hive - A Warehousing Solution Over a MapReduce Framework.” Sort 2:1626–29. Retrieved. O. (http://portal.acm.org/citation.cfm?id=1687609).. C. Vavilapalli, V. K. et al. 2013. “Apache Hadoop Yarn: Yet Another Resource Negotiator.” Annual Symposium on Cloud Computing 5.. ©. Zhang, Junbo, Dong Xiang, Tianrui Li, and Yi Pan. 2013. “M2M : A Simple Matlab-to-MapReduce Translator for Cloud Computing.” Tsinghua Science and Technology 18(1).. 65.

(29)

No results found