Hadoop based Image Transcoding in Paas Platform of Cloud Environment

(1)

International Journal of Research in Information Technology (IJRIT)

www.ijrit.com ISSN 2001-5569

Hadoop based Image Transcoding in Paas Platform of Cloud Environment

Prof. Deepak Gupta¹

Dept. Of Computer Engg, Savitribai Phule Pune University Siddhant Collegeof Engineering

Pune, India

Ms.Vaishali Patil ²

Dept. Of Computer Engg, Savitribai Phule Pune University Siddhant College of Engineering

Pune,India

[email protected]

Abstract—In these days of rapid technological change, the growing number of people using social networking services (SNSs) have facilitated the sharing of multimedia data and internet. However, multimedia data processing techniques impose a burden on the computing infrastructure as the amount of data increases. Therefore, this paper proposes a MapReduce-based image Transcoding module in cloud computing environment in order to reduce the burden of computing power. The proposed module having two parts:

Map Reduce Framework running on Hadoop distributed file system (HDFS) and media processing libraries Java Advanced Imaging (JAI) library for transcoding. It can process image data in distributed and parallel cloud computing environments, thereby minimizing the computing infrastructure overhead. In this paper, describe the proposed module using Hadoop and JAI. In addition, evaluate the proposed module in terms of processing time under varying experimental conditions

IndexTerms— Paas, Cloud computing, Hadoop,MapReduce,JAI,HIPI.

I. INTRODUCTION

In these days of rapid technological change, cloud computing [20][22][24][31][32] has achieved remarkable interest from researchers and the IT industry for providing a flexible dynamic IT infrastructure, QoS guaranteed computing environments, and configurable software services [11]. Due to these advantages, many service providers who release Social Network Services (SNS) [8][12][14] utilize cloud computing in order to reduce maintenance costs of building and expanding computing resources for processing large amounts of social media data, such as image and text formats.

In an earlier publication [5], we described the new concept of a Social Media Cloud Computing Service Environment (SMCCSE) that enables cloud computing technologies, approaches in the intensive use of computing resources, and cloud services for developing social media-based SNS. In particular, we focused on designing a social media PaaS platform as the core platform of our SMCCSE in [17]. The main role of the social media PaaS platform is to provide a distributed and parallel processing system for media transcoding functions and delivering social media, to heterogeneous devices such as smart phones, personal computers, televisions, and smart pads. This platform is composed of three parts: A social media data analysis platform for large scalable data analysis; a cloud distributed and parallel data processing platform for storing, distributing, and processing social media data; and finally, a cloud infra management platform for managing and monitoring computing resources.

In this paper, we focus on designing and implementing a Hadoop-based Image transcoding system for delivering image in a SNS by adopting a social media PaaS platform in SMCCSE. transcoding system consists of module include an image transcoding module for converting image data. The image transcoding module can transcode large amounts of image data sets into a specific format. In the traditional multimedia transcoding approaches, many researchers have focused on distributed that reduce

(2)

processing time and maintenance costs for building a computing resource infrastructure. However, these approaches focus on procuring computing resources for a image transcoding process by simply increasing the number of cluster machines in a parallel and distributed computing environment. In addition, they do not consider load balancing, fault tolerance and a data replication method to ensure data protection and expedite recovery.

We perform two experiments to demonstrate the proposedmodule’s excellence in without Hadoop and with Hadoop. In the firstexperiment, we compare the proposed module with a non-Hadoop-based single program running on two differentmachines. In addition, we verify processing time under different data sizes with non Hadoop and Hadoop based system of the proposed module according to the data sizes of images.

The remainder of this paper is organized as follows. Insection 2, we introduce Hadoop HDFS, MapReduce, and JAI.The module architecture and its features are proposed insection 3. In section 4, we describe the implementation module and in section 5 we describe the configuration of Hadoop on windows. The results of the evaluation are presented in section 6.Finally, section 7 concludes this paper with suggestions forfuture research and Related Work.

II. LITERATURESURVEY A. HDFS

Hadoop is an open source frame work for running applications on large cluster built of commodity hardware. The Hadoop frame work transparently provides applications both reliability and data motion. Hadoop implements a computational

Figure 1. HDFS Structure

paradigm named map/reduce where the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition, it provides a distributed file system (HDFS) that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster.Both maps reduce and the Hadoop distributed file systems are designed so that node failures are automatically handled by the frame work.

As shown in Figure 1, Name Node manages the namespace and controls the file access by the client, and Data Node manages the storage of each node in the cluster. In addition, DataNode executes block commands issued by Name Node

B. MAPREDUCE

MapReduce frameworks provide a specific programming model and a run-time system for processing and creating large amounts of datasets which is amenable to various real-world tasks [8]. MapReduce framework also handles automatic scheduling, communication, synchronization for processing huge datasets and it has the ability related with fault tolerance.

Figure 2. MapReduce Structure

(3)

MapReduce programming model is executed in two main steps, called mapping and reducing. Mapping and reducing are defined by mapper and reducer functions that are data processing functions. Each phase has a list of key and value spairs as input and output. In the mapping, MapReduce input datasets and then feeds each data element to the mapper as a form of key and value pairs.

In the reducing, all the outputs from the mapper are processed and a final result is created by reducer with merging process. The MapReduce programming model will be actively working with this distributed file system.

JAI

JAI is an open-source Java Advanced Image Processing library used for Hipi Image Processing. In JAI Library includes Mapping Function, Reduced Function, and Scaling Function which give the output in target format. In the paper we include JAI library on windows platform also we import many JAR files that support the eclipse and Hadoop plug-in. JAI is an open-source Java library used for image processing [7]. JAI supports various image formats (BMP, JPEG, PNG, PNM, and TIFF) and encoder/decoder functions. In addition, most of the functions related with image conversion are provided through an API, and thus, JAI can be used as a simple framework for image processing.

HIPI (Hadoop Image Processing Interface)

HIPI was created to empower researchers and present them with a capable tool that would enable research involving image processing and vision to be performed extremely easily. We modified HIPI with the following goals in mind.

1. Provide an open, extendible library for image processing and computer vision applications in a MapReduce framework.

2. Store images efficiently for use in MapReduce applications.

3. Allow simple filtering of a set of images.

4. Present users with an intuitive interface for image-based operations and hide the details of the MapReduce framework

5. HIPI will set up applications so that they are highly parallelized and balanced so that users do not have to worry about such details.

III.IMAGEDATA CONVERSION MODULE ARCHITECTURE

Figure 3. Image Conversion Module Architecture

In this study, designed and implemented a MapReduce-based image conversion module in a cloud-computing environment to solve the problem of computing infrastructure overhead. The traditional approach of transcoding image data usually involves general- purpose devices and offline-based processes. However, multimedia data processing is time consuming and requires large computing resources. To solve this problem, designed an image conversing module that exploits the advantages of cloud computing. The proposed module can resize and convert images in a distributed and parallel manner As shown in Figure 3, the proposed module stores image data into HDFS. HDFS automatically distributes the image data to each data node. The Map function processes each image data in a distributed and parallel manner

Figure 4 shows the programming elements of the image conversion module. This diagram shows the implementation of the proposed module. The steps for programming the processes are presented below:

(4)

Figure 4. Programming Module

First, The Conversion module reads image data from HDFS using the RecordReader method of the class InputFormat. InputFormat transforms the image data into sets of Keys (file names) and Values (bytes).

Second, InputFormat passes the sets of Keys and Values to the Mapper. The Mapper processes the image data using the user defined settings and methods for image conversion via the JAI library. The conversion module converts the image data into specific formats suitable for a variety of devices such as smart phones, pads and personal computers in a fully distributed manner. The Mapper completes the the image conversion and passes the results to OutputFormat as Key (file name) and Value (byte). Finally, the Mapper passes the set of Key and Value to OutputFormat.The RecordWriter method of the OutputFormat class writes the result as a file to HDFS.

The MapReduce framework provides the FileInputFormat and FileOutputFormat classes for processing petabyte-scale text data in a parallel and distributed manner. However, these classes are not suitable for processing media data. Therefore, we designed new classes:ImageInputFormat, ImageOutputFormat, suitable for performing image transcoding functions within the MapReduce framework. Also discuss implementation issues for MbTD based on Xuggler and JAI

A Design Strategy

The module is designed to read text data line by line. However, since our system deals with media data like images video data, rather than text data, we design a new ImageInputFormat class that can read image data by transforming such data to byte stream form. In addition, the ImageRecordReader class is designed to read one line record transformed as a byte stream form from ImageInputFormat and pass it to mapper.For FileOutputFormat, the record is in the same state as for FileInputFormat explained above. Hence, design new ImageOutputFormat and ImageRecordWriter classes that receive key and value pairs in record form created as a result of mapper and reducer. Subsequently, these classes output the record in a specified directory. Fig. 5 illustrates four class diagrams for image conversion function. Although, most MapReduce applications bring a result via mapper and/or reducer, only implement mapper in this system since it is unnecessary to reduce a set of intermediate values using reducer.

Moreover, the class ImageResizer using the JAI libraries is designed to perform a transcoding function by processing key and value pairs in the mapper phase. Fig. 6 shows ImageConversion with

main ( ), Map, and ImageResizer class diagrams. ImageInputFormat class diagram ImageOutputFormat class

Figure 5: ImageInputFormat class diagram

Figure 6:ImageOutputFormat class diagram

(5)

Figure 7:ImageRecordReader class diagram

Figure 8: ImageRecordWriter class diagram

Figure 5. New class diagrams designed for the image conversion function

Figure 9: ImageConversion, Map, and ImageResizer class diagrams

B.DATA SET

The cloud server used in the experiments for evaluation is a single enterprise scale cluster that consists of 28 computational nodes. Table 1 lists the specifications of the evaluation cluster.

TABLE I. EVALUATION CLUSTER SPECIFICATIONS (28 NODES)

CPU Intel Xeon 4 Core DP E5506 2.13GHz * 2EA

RAM 4GB Registered ECC DDR * 4EA HDD 1TB SATA-2 7,200 RPM

OS Linux CentOS 5.5

(6)

Java Java 1.6.0_23 Hadoop Hadoop-0.20.2

JAI JAI (Java Advanced Imaging) 1.1.3

Nine data sets were used to verify the performance of the proposed module. The average size of an image files was approximately 19.8 MB. Table 2 lists the specific information about the data sets used.

TABLE II. IMAGE DATASETS

Size(GB) 1 2 4 8 10 20 40 50 100

Number 52 104 208 416 520 1040 2111 2594 5188 Format JPG

Source Flicker

During the experiment, the following default options in Hadoop were used. (1) The number of block replications was set to 3, and (2) the block size was set to 64 MB. Evaluated the performance of the proposed module and optimized it. In this planned and executed two experiments. In the first experiment, compared the proposed module with a non-Hadoop-based single program on two different single machines. The specifications of the two machines are listed in Table 3.

Machine A Specifications

CPU AMD Athlon II-X4 632 2.9GHz

RAM 4GB DD

HDD 600GB SATA-2 7,200 RPM OS Linux CentOS 5.5

Java Java 1.6.0_23

Machine B Specifications

CPU Intel Xeon 4 Core DP E5506 2.13GHz

* 2EA

RAM 4GB Registered ECC DDR * 4EA HDD 1TB SATA-2 7,200 RPM

OS Linux CentOS 5.5 Java Java 1.6.0_23

Measured each running time taken in cloud server using MapReduce programming and taken in machine A and B applying only sequential programming using JAI libraries without MapReduce, respectively. Figure 7 shows the result of the first experiment.

(7)

Figure 7. Elapsed Time for Proposed Module with Two Different Machines

The elapsed times in machines A and B are less than the run time taken in less than 2 nodes in our cluster. In cases 1 and 2, the performance without Hadoop in machines A and B is better than that in our cluster because in MapReduce programming, the nodes distribute the processing, thereby

causing overhead associated with the creation of map tasks, job scheduling, and transporting speed of HDFS. In the second experiment, we compare the JVM reuse option 1 with the JVM reuse option -1. Option 1 reuses JVM only one time. There is no limit on the number of times Option -1 reuses JVM. Figure 6 shows the result of the second experiment.

Figure 8. Elapsed Time for JVM Reuse Option V. CONCLUSIONS & FUTURE WORK

Recently, The wide availability of inexpensive hardware such as personal computers, digital cameras, Smartphone, and other easy- to-use technologies has enabled the average user to immerse in multimedia. The remarkable growth of Internet technologies such as social networking services (SNSs) allows users to disseminate multimedia objects. However, multimedia data processing techniques impose a burden on the computing infrastructure as the amount of data increases. Thus, this paper designed and implemented a MapReduce-base image conversion module in a cloud computing Environment. The proposed module is based on Hadoop HDFS and the MapReduce framework for distributed parallel processing of large-scale image data. In this redesigned and implemented InputFormat and OutputFormat in the MapReduce framework for image data and used the JAI library for converting the image format and resizing the images. It exploited the advantages of cloud computing to handle multimedia data processing.For that performed two experiments to evaluate the proposed module. In the first experiment, compared the proposed module with a non-Hadoop-based single program using the JAI library. The proposed module shows better performance than the single program.

In the second experiment, evaluated its performance. The results of the second experiment show that when the proposed module processes large numbers of small files, it exhibits better performance. Future research should focus not only on image data but also on video data. We plan to implement an integrated multimedia

VI ACKNOWLEGEMENT

This is a great pleasure & immense satisfaction to express Vaishali M. Patil deepest sense of gratitude & thanks to everyone who has directly or indirectly helped me in completing my work successfully.

(8)

Author Vaishali M.Patil express her gratitude towards guide Prof.Deepak Gupta,Professor Department of Computer Engineering, Siddhant C.O.E., Sudumbare, Pune who guided & encouraged me in completing the work in scheduled time.

VII REFERENCES

[1] J. Dean and S. Ghemawat, ―MapReduce: Simplified data processing on large clusters,‖ Communication of the ACM, vol.51, no.1, pp.107-113, Jan. 2008. Article (CrossRef Link)

[2] K. Shvachko, H. Kuang, S. Radia and R. Chansler, ―The Hadoop distributed file system,‖ in Proc. 2846 Kim et al.: A Hadoop- based Multimedia Transcoding System for Processing Social Media of 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies, pp.1-10, May. 2010. Article (CrossRef Link)

[3] S. Islam and J.-C.Gregoire, ―Giving users an edge: A flexible cloud model and its application for multimedia,‖ Future Generation Computer Systems, vol.28, no.6, pp.823-832, Jun. 2012. Article (CrossRef Link)

[4] W. Gropp, et al., ―A high-performance, portable implementation of the MPI message passing interface standard,‖ Parallel Computing, vol.22, no.6, pp.789-828, Sep. 1996. Article (CrossRef Link)

[5] M. Kim and H. Lee, ―SMCC: Social media cloud computing model for developing SNS based on social media,‖

Communications in Computer and Information Science, vol.206, pp.259-266, 2011. Article (CrossRef Link)

[6] G. Barlas, ―Cluster-based optimized parallel video transcoding,‖ Parallel Computing, vol.38, no.4-5, pp.226-244, Apr. 2012.

Article (CrossRef Link)

[7] I. Ahmad, X. Wei, Y. Sun and Y.-Q. Zhang,‖ Video transcoding: An overview of various techniques and research issues,‖

IEEE Transactions on Multimedia, vol.7, no.5, pp.793-804 Oct. 2005. Article (CrossRef Link)

[8] Y.-K. Lee, et al., ―Customer requirements elicitation based on social network service,‖ KSII Transactions on Internet and Information Systems, vol.5, no.10, pp.1733-1750, Oct. 2011. Article (CrossRef Link)

[9] S. Mirri, P. Salomoni and D. Pantieri, ―RMob: Transcoding rich multimedia contents through web services,‖ in Proc. of 3rd IEEE Consumer Communications and Networking Conference, vol.2, pp.1168-1172, Jan. 2006. Article (CrossRef Link)

[10] Hadoop MapReduce project, http://hadoop.apache.org/mapreduce/

[11] Z. Lei, ―Media transcoding for pervasive computing,‖ in Proc. of 5th ACM Conf. on Multimedia, no4, pp.459-460, Oct.

2001. Article (CrossRef Link)

[12] D.M. Boyd and N.B. Ellison, ―Social network sites: Definition, history, and scholarship,‖ Journal of Computer-Mediated Communication, vol.13, no.1, pp.210-230, Oct. 2007. Article (CrossRef Link)

[13] Apache Hadoop project, http://hadoop.apache.org/

[14] C.L. Covle, and H. Vaughn, ―Social Networking: Communication revolution or evolution?,‖ Bell Labs Technical Journal, vol.13, no.2, pp.13-17, Jun. 2008. Article (CrossRef Link)

[15] J. Guo, F. Chen and L. Bhuyan, R. Kumar, ―A cluster-based active router architecture supporting video / audio stream transcoding service,‖ in Proc. of Parallel and Distributed Processing Symposium, Apr. 2003. Article (CrossRef Link)

[16] Y. Sambe, S. Watanabe, D. Yu, T. Nakamura, N. Wakamiya, ―High-speed distributed video transcoding for multiple rates and formats,‖ IEICE Transactions on Information and Systems, vol.E88-D, no.8, pp.1923-1931, Aug. 2005. Article (CrossRef Link)

[17] M.-J. Kim, H. Lee, H. Lee, ―SMCCSE: PaaS Platform for processing large amounts of social media,‖ in Proc. of the 3rd international Conf. on Internet, pp.631-635, Dec. 2011. Article (CrossRef Link)

[18] J. Shafer, S. Rixner and A.L. Cox, ―The Hadoop distributed file system: Balancing portability and performance,‖ in Proc. of IEEE International Symposium on Performance Analysis of Systems and Software, pp.122-133, Mar. 2010. Article (CrossRef Link)

[19] Z. Tian, J, Xue, W. Hu, T. Xu and N. Zheng, ―High performance Cluster-based Transcoder,‖ in Proc. of 2010 International Conf. on Computer Application and System Modeling, vol.2, pp. 248-252, Oct. 2010. Article (CrossRef Link)

[20] R. Buyya, C. Yeo, S. Venugopal, J. Broberg, ―Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility,‖ Future Generation Computer Systems, vol.25, no.6, pp.599-616, Jun. 2009. Article (CrossRef Link)

[21] R. Lammel, ―Google’s MapReduce programing model - Revisited,‖ Science of Computer Programming, vol.68, no.3, pp.208-237, Oct. 2007. Article (CrossRef Link)

[22] M.A. Vouk, ―Cloud Computing – Issues, research and implementations,‖ in Proc. of 30^th International Conf. on Information Technology Interfaces, pp.31-40, Jun. 2008. Article (CrossRef KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS VOL. 6, NO. 11, Nov 2012 2847 Link)

[23] C. Poellabauer and K. Schwan, ―Energy-aware media transcoding in wireless systems,‖ in Proc. Of Second IEEE Annual Conf. on Pervasive Computing and Communications, pp.135-144, Mar.ch, 2004. Article (CrossRef Link)

[24] M. Armbrust, A. Fox, R. Griffith, A.D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, M.

Zaharia, ―A view of cloud computing,‖ Communication of the ACM, vol.58, no.4, pp.50-58, April, 2010. Article (CrossRef Link)

(9)

[25] Xuggler Java library, http://www.xuggle.com/xuggler/index

[26] Shenoy, J. Prashant, Vin, M. Harrick, ―Efficient striping techniques for multimedia file servers,‖ in Proc. of the IEEE International Workshop on Network and Operating System Support for Digital Audio and Video, pp.25-36, 1997. Article (CrossRef Link)

[27] M.A. Weifeng, and M. Keji, ―Research on java imaging technology and its programming framework,‖ Lecture Notes in Electrical Engineering, vol.72, pp.61-68, 2010. Article (CrossRef Link)

[28] D. Seo, J. Lee, C. Choi, H. Choi, I. Jung, ―Load distribution strategies in cluster-based transcoding servers for mobile clients,‖ Lecture Notes in Computer Science, vol.3983, pp.1156-1165, May. 2006. Article (CrossRef Link)

[29] S. Roy, B. Shen, V. Sundaram, R. Kumar, ―Application level hand off support for mobile media transcoding sessions,‖ in Proc. of the 12th International Workshop on Network and Operating Systems for Digital Audio and Video, pp.95-104, May.

2002. Article (CrossRef Link)

[30] D. Seo, J. Kim and I. Jung, ―Load distribution algorithm based on transcoding time estimation for distributed transcoding servers,‖ in Proc. of 2010 Conf. on Information Science and Applications, article no.5480586, Apr. 2010. Article (CrossRef Link) [31] R.L. Grossman, ―The Case for Cloud Computing,‖ IT Professional Journal, vol.11, no.2, pp.23-27

AUTHORS PROFILE

1. PROF.DEEPAK GUPTA: ASSISTANTPROFAT SIDDHANT COLLEGE OF ENGINEERING, PUNE, B.E. (COMPUTER), M.TECH COMPLETED IN 2007, AREA OF INTEREST –IMAGE PROCESSING,NATURAL LANGUAGE PROCESSING

2. MS.VAISHALI PATIL : PG Stedent at Siddhant College of Engineering, Pune. B.E. (Computer), M.E. Pursuing at Siddhant College Of Engineering, Pune , Area of Interest- Hadoop, Image Processing