• No results found

Big Data Implementation Using Hadoop and Grid Computing

N/A
N/A
Protected

Academic year: 2020

Share "Big Data Implementation Using Hadoop and Grid Computing"

Copied!
6
0
0

Loading.... (view fulltext now)

Full text

(1)

ISS N (Online) : 2319 – 8753 ISS N (Print) : 2347 - 6710

I

nternational

J

ournalof

I

nnovative

R

esearch in

S

cience

, E

ngineeringand

T

echnology An ISO 3297: 2007 Certified Organization Volume 3, Special Issue 4, April 2014

Two days National Conference – VISHWATECH 2014 On 21st & 22nd February, Organized by

De partment of C IVIL, C E, ETC , MECHNICAL, MECHNICAL SAND, IT Engg. Of Vishwabharati Academy’s College of enginee ring,

Ahmednagar, Maharastra, India

Copyright to IJIRSET www.ijirset.com 256

Big Data Implementation Using Hadoop and

Grid Computing

Yuvraj S. Sase , Pratik A.Yadav

ME, Vishwabharti Academy’s College of Engineering, India.

ME, Vishwabharti Academy’s College of Engineering, India

Abstrac t:Big data is business transformation. So every organizat ion is trying to analyze their big data. Big data poses imple mentation proble ms in e xt re me conditions . This paper focuses on some methods to use grid computing along with hadoop. Grid co mputing provide large storage capability and computation power. Th is paper listed some open source toolkit use to imple ment solution such as Hadoop, Globus Toolkit.

Ke ywor ds: grid co mputing, hadoop, storage capacity, processing capacity

I.INTRODUCTION

Big Data is energy source of present world. It refashions future of Global Economics. Big data revolution changes way of thin king in business [5]. It was waste material in past and now it helps organization to progress in great e xtent. Big data affect decision ma king fro m the bottom-up and the top-down. Big Data speed up discoveries and small predictions in daily act ivities.Data with great volume that is unstructuredand having great velocity are main identities of big data. Orac le add the fourth, value[4]. For getting valuable data fro m b ig data, system ana lyzes large volu me o f irrelevant data.

Hadoop, an open source tool, can be used to analyze big data.Hadoop consist HDFS and map-reduce functionalities. Hadoop also provides reliab ility by replicating data over mu ltip le nodes. Map function analyze data and produce result for each entity and reduce function reduces number of results [7]. Big data analysis call for la rge storage capacity and great processing power which can be satisfied by grid co mputing[2].Co mputing tools like g lobus toolkit is available for grid co mputing. Globus toolkithas gram service and job manager respectively to control job e xecution and scheduling best node for e xecution. Condor, fork, pbs are e xa mp les of Job managers in globus.

Integration of Hadoop and Globus Toolkit can provide solution on storage and analysis of big data.

II. HADOOP ARCHITECTUR E

Map reduce and hadoop distributed file system (HDFS) services are provided by Hadoop . In HDFS file system, one Na me Node and mult iple DataNodes are p resent.Map reduce engine have job tracke r present onNa meNode and task trackers on DataNodes. Job tracke r is respons ible for job submission on DataNodes and DataNode actually e xecutejobs[7].

A.Namenode

(2)

Copyright to IJIRSET www.ijirset.com 257

Fig. 1 A rchitecture of Na meNode

B. DataNode

DataNodes store application data in the data bock format. When DataNode starts, first it does handshake with Na meNode. Handshake process registers DataNode to Na me Node.DataNode reg istered with unique storage id. DataNode identifies replicas present on it and send block report to Name Node . Na me Node stores metadata of datablocks present on that DataNodefrom block reports .DataNode have task trackers which e xecutes jobs on DataNode.

Fig.2 Arch itecture of Datanode.

C. Job tracker and Task Tracker

(3)

ISS N (Online) : 2319 – 8753 ISS N (Print) : 2347 - 6710

I

nternational

J

ournalof

I

nnovative

R

esearch in

S

cience

, E

ngineeringand

T

echnology An ISO 3297: 2007 Certified Organization Volume 3, Special Issue 4, April 2014

Two days National Conference – VISHWATECH 2014 On 21st & 22nd February, Organized by

De partment of C IVIL, C E, ETC , MECHNICAL, MECHNICAL SAND, IT Engg. Of Vishwabharati Academy’s College of enginee ring,

Ahmednagar, Maharastra, India

Copyright to IJIRSET www.ijirset.com 258

respective task trackers. Job tracker always tries to execute job on nodes which are near to data. Task trackers exe cute jobs sent by job tracker. Job trac ker checks status of task tracker a fter every few minutes.

Fig. 3 A rchitecture of Job tracke r and task tracke r

III. PROB LEMS IN HADOOP

Hadoop provide different nodes and services for storing and analyzing big data. But in e xtre me conditionshadoop can experience proble ms. Such proble ms affect overall performance of Hadoop and can be listed as below

Problem 1:NameNode failure due to limited capacity to store the metadata

Problem 2 :Searching the huge metadata at NameNode may go inefficient due to its size

Problem 3: No inbuilt facility for parallelization of jobs submitted to tasktrackers

IV. GRID COMPUTING ARCHIT ECTURE

(4)

Copyright to IJIRSET www.ijirset.com 259

Fig. 4 Grid Co mputing Architecture

As shown in figure, Grid co mputing has one control node and all others are grid nodes connected to each other. User can submit job fro m any node and control node decides node on which job is going to execute . User feels single system when user access grid network. Open source tools such as Globus Toolkit can be use to create computer grid network. Globus toolkit has GRAM, GSI, GridFTP co mponents to provide all services required fo r grid computing. GRAM service controls a ll job e xecution tasks. GRAM service uses job manager such as condor to locate job e xecuting node GSI handles security and GridFTP is to transfer data between diffe rent nodes.

V. SOLUTION FORHADOOP’S PROBLEMS US ING GRID

(5)

ISS N (Online) : 2319 – 8753 ISS N (Print) : 2347 - 6710

I

nternational

J

ournalof

I

nnovative

R

esearch in

S

cience

, E

ngineeringand

T

echnology An ISO 3297: 2007 Certified Organization Volume 3, Special Issue 4, April 2014

Two days National Conference – VISHWATECH 2014 On 21st & 22nd February, Organized by

De partment of C IVIL, C E, ETC , MECHNICAL, MECHNICAL SAND, IT Engg. Of Vishwabharati Academy’s College of enginee ring,

Ahmednagar, Maharastra, India

Copyright to IJIRSET www.ijirset.com 260

Fig. 5 Co mb inearchitecture of Hadoop and Grid

Above architecture shows grid and hadoop node. First grid node act as Name Node of Hadoop and other grid nodes are DataNodes. One more sub grid is form with Na me Node as control node and different number of grid nodes. Solution fo r first proble m is solved by this sub grid. Each grid node of sub grid shares storage resources. Therefore Na meNode can use this storage me mory to store metadata. All storage me mory treated as if it is of only one system. User is unaware about grid storage. Whole sub grid is treated as one Na me Node. So p roble m of storage is re moved by combination of Hadoop and grid.Second proble m of co mputation power is a lso solved by sub grid. Job of searching metadata of particu lar data block can be parallelized on number of grid nodes which min imizes time require to searching in metadata. Sub grid increases computation power of Na me Node . Client application a lso can effic iently access Name Node.

(6)

Copyright to IJIRSET www.ijirset.com 261 VI. CONCLUS ION

Hadoop analyze big data but faces some proble m of storage and computation power whenever Na meNode is overloaded. Grid helps hadoop to overcome these problems. So co mbination of Hadoop and grid for imp le ment big data can be used.

REFERENCES

[1] Garlasu,D.Sandulescu,V.; Halcu,I.;Neculoiu,G.; Grigoriu,O.; Marinescu,M.; Marinescu, V., “ A big data implementation based on Grid

computing” , Roedunet International Conference (RoEduNet), 2013 11th

[2] “ A big data implementation using grid computing” ,www.pantechproed.com/download_project.php

[3]C.Chandhini, MeganaL.P , “ Grid Computing-A Next Level Challenge with Big Data” , International Journal of Scientific & Engineering Research Volume 4, Issue3, March-2013 1ISSN 2229-5518

[4]An Oracle White Paper March 2013 , “ Big Data analytics : Advanced analytics in oracle database”

[5] “IDEAS ECONOMY: FINDING VALUE IN BIG DAT A”, www.oracle.com/us/technologies/big-data/index.html

[6] Ann Chervenak; Ian Foster; Carl Kesselman; Charles Salisbury; Steven T uecke, “The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientifc Datasets”

Figure

Fig. 1 Architecture of NameNode
Fig. 3 Architecture of Job tracker and task tracker
Fig. 4 Grid Computing Architecture
Fig. 5 Combinearchitecture of Hadoop and Grid

References

Related documents

Keywords— Big Data Problem, Hadoop cluster, Hadoop Distributed File System, Parallel Processing, MapReduce I..

Grid technology is used for data storage and data processing while ant-based algorithm is utilized for data clustering in the proposed framework for big data manipulation as

Hadoop: The solution was provided by the Google named hadoop an open source frame work that is used to store, process and analysis of data which is large , in a

Hadoop Distributed File System (HDFS) is the storage system of Hadoop which splits big data and distribute across many nodes in a cluster.. This also replicates data in

The big data analytics is processed for slower shared storage which prefers for direct attached storage in different forms to have high capacity buried inside

Hadoop Distributed File System (HDFS) and MapReduce programming model is used for storage and retrieval of the big data.. Big data can be any structured collection which

SanDisk SSDs can be deployed strategically in big-data analytics environments based on Hadoop to provide faster query response times (9-27% faster) for specific Hive data

Hadoop MapReduce programming model and Hadoop Distributed File System (HDFS) are one of the most popular distributed computing technologies for distributing big data