Abst is a s huge Java, Progr MapR save backe Index 1.1 B shop appli chan amou a hig any struc are th a) b) c) Orga beca MS A and easy Peop form quer Pica gene Mac abou This mob devic smar sea, toget hour very may very gene YouT data.
MAPRE
PROC
tract: Digitalda software framewvolume of dat , Python and R ram, Apache Pi Reduce jobs. In the result in o end process in M
x Terms— Big
Big Data: In t p, office, h ication, webs nging with ex unt of data, w gh velocity. B
huge amoun ctured data tha
hree sources o Organization People genera Machine gene anization gen ause all the da ACCESS, Or column. To y task as the da
ple generated m like Faceboo
ry, pictures on sa. Social me erate terabyte/p
hine generate ut 85% - 90% data is gener ile; devices li ces. The wid rt phone, sma
oceans, and s ther producin rs and days. In y high, every
be in terabyt y high veloci erated about 3 Tube. Data ca .
Interna
EDUCE:
CESSING
ata which come work to store a ta. MapReduce Ruby. The main ig and Hive.Hiv n this research p output file.At th MapReduce job
Data, Hadoop,
I. INTROD
today’s world hospital, coll sites, digital d xtremely high which comes in Big data is a nt of structure
at has the to b of big data:
generated dat ated data erated data nerated data: ata is stored in racle, MS SQL
search any in ata already in
data: is in sem ok messages, n Instagram, edia is main petabyte of da d data: This is of the total d rated by vario ike sleeping m despread avail
art cars, smar sky also conn ng a huge am n big data the
minute total te or petabyte ity for examp
50GB of data an be Batch, n
Volu
ational Jou
Av
INSIGHT
USING J
Lece from different and process this is a programm n objective of th
ve and Apache paper we write he end we perf b.
, MapReduce, H
DUCTION
d data is every leges, unive devices, etc.T h rate and pro
n a variety of growing term ed , semi stru be used for ana
ta
is the most n different dat L etc. in form nformation in particular sha mi-structured twitters tweet platform wh ata every day. s the major so data is generat ous real time s monitor, heart lability of sm rt homes, sma nected to this mount of data quantity or v
amount of da e. This data is ple every m a, 72 hours vid
near time, rea
ume 9, No.
urnal of Ad
RESE
vailable On
T ANALY
JAVA PRO
Dr cturer (I.T.), S
Salalah
t sources like o s enormous am ming model init his paper is to Pig working o MapReduce pr form the opera
Hive. Apache P
ywhere, either ersities, mob Technologies a
oducing a hu form along w m that describ uctured and u alysis [1]. The
structured da tabases either m of tables, row
n database is ape.
or un-structur t, Google sear
here people w
ource of big da ted by machin sensors, came
beat monitori mart devices li art camera, no
devises and a every minu volume of data ata is generat s generated a inute Facebo deo uploaded al time or strea
1, January
dvanced Re
EARCH PA
nline at ww
YSIS OF B
OGRAMM
. Ujjwal Agar Salalah Colleg h, Sultanate o
ffice, school, h mount of data. H
tiated by Googl describe the co on top layer of H
rogram in Java ation in HiveQL
ig r in bile are uge with bes un- ere ata r in ws an red rch will ata ne. era, ing ike ow all ute, a is ted at a ook on am A str or 1. w pr m fr w co fr an la (2 co Sy H na co da Th si st re 1. w Ec fo str fil di pr ap ab
y-February
esearch in C
APER
ww.ijarcs.in
BIG DAT
MING, H
rwal
ge of Technolo f Oman
hospital, social m Hadoop is using
le which can be oncepts of Map Hadoop ecosys to find anagram L (Hive query
As data is co ructure of big r un-structure.
2 Hadoop: A which is used t rocess this d model. It
om commodit way that alwa ommon and
amework. [2] nalysis of bi aunched on 10 2.7.3) came in omponent of ystem). HDF HDFS works ame node an ontroller/mast ata node. It sto
his metadata ze and file pe ore in differe eplication facto
3 Hadoop E which is used f
cosystem tied or structured
ructured and u le system w ifferent racks rovide the par pplication is bstraction like
y 2018
Computer
nfo
TA VIA PA
HIVE AND
ogy,
media or machi g HDFS and M e written in dif pReduce and sh tem and provid m words from i language) and
oming from g data varies it
.
Apache Hadoo o store large a data by usin
consists ty hardware. ays it assum
it should be . Hadoop is a ig data. The 0th December
nto existence Hadoop is H S works on on Master-sla nd multiple d
er of the syst ores the metad includes nam ermission. The ent clusters ac
or of any data
Ecosystem: H for storage an d up with ma data like sqo un-structured which respons
. Above that rallel data pro working w e Pig, Hive or
Science
ISSN No
ARALLEL
D APACH
ine generated d MapReduce to st fferent program howing the ope de the level of a input files, grou d Pig Latin Scr
various sour t can be struct
op is an open-amount of dat ng the MapRe of compute Hadoop is d es hardware
automaticall a major platfo first release 2011 and the on 25th Augu HDFS (Hadoo
Name Node ave architectu data node. N tem. Name no data of all the me, location of
e data that we cross the nod a is 3 on data n
Hadoop is a so nd analysis of any different oop and hive . In the core H sible to store
MapReduce ocessing and which provid
Sqoop etc.
o. 0976‐5697
L DATA
HE PIG
data. Apache H tore and proces mming language eration by using abstraction to ru up them togethe ript and showin
rces therefore ture, semi-stru
-source frame ta using HDFS educeprogram er clusters designed in su
failures are y handled by orm for storing e of Hadoop
first stable ve ust 2016. The op Distributed e and Data N
ure as it has Name Node i ode spreads da
files in the H f each block, b
store in Hado des. By defau
node.
oftware frame Big Data. Ha applications e, some for Hadoop distrib e all the da
Map proc Hado langu has t perfo RED anoth Map into brok The Map value The outp from the s job i
We c calle a) M b) M
By d only
Fig
II MA
pReduce [1, essing. This m oop can run uage, like Jav two different orms. The f DUCE, but b her steps call p which takes
another set ken down into next step is s p step and so
e) pairs. last sand fina ut from sort m Shuffling ph
[image:2.595.45.270.54.141.2] [image:2.595.38.261.427.489.2] [image:2.595.39.258.575.709.2]sequence of t is always perfo
Figure
can divide Ma ed:-
MapReduce wit MapReduce wi
Figure 2.2
default numb y one reducer t
gure 1.1 Hado
APREDUCE
2] is a pro model is very MapReduce va, Python and
operations th first step is between map
led sort and s the group of of data, wh tuples (key/v sort and shuff ort and shuffl
al step is reduc and shuffle a hase and retur the name Ma formed after th
2.1 Architectu
apReduce task
th single redu ith multiple re
2 MapReduce
er of reducer task will be ex
oop Ecosystem
FRAMEWO
ogramming m simple but st program writ d Ruby. The te hat Apache H MAP and a and reduce shuffle. The f data or files here individua
alue pairs). fle, which tak le the result,
cing step,redu as input and c rns a single o apReduce imp he map job.[3]
ure of MapRe
k as two differ
uce task educe task
e Single Reduc
rs is set to 1 xecuted as sho
m
ORK
model for da till too comple tten in differe erm MapRedu Hadoop progra another step we have tw first step is o
and converts al elements a
kes the output based on (ke
uce job takes t combines valu output value. A plies, the redu
], [4] [5], [6]
educe
rent categories
ce Task
, means alwa own below.
ata ex. ent uce am is wo our s it are
of ey,
the ues As uce
s
ays
M in us fo
Jo jo
C co
M w ta th in
Th by W im so M re
M R
Th A pr str fil di
B ec str M C M th
Many reducers ndependently.
ser.We can c ollowing comm
ob job = new J ob.setNumRed
onfiguration c onf.set("mapre
Figure 2
MapReduce F working on <K akes the input he output again nput for Reduc
he key and the y the framew Writable interf
mplement the orting by the MapReduce jo educe -><k3, v
Inpu Map Job <key educe <key
III MA
here are differ A) The first w
rogram, these ructured and les. A MapR ifferent classe a) Mappe b) Reduc c) Driver ) Apache Pig cosystem and ructure data a MapReduce. [7 ) The Hive MapReduce is
he form of row
s can run in The number change the nu
mands:
Job(conf); duceTasks(3);
conf = new Co educe.job.redu
2.3 MapReduc
Framework: Key,Value> p t job as a pair n in the form ce job.
e value classe work and he face. Addition Writable-Co e framework. ob: (Input) <k v3> (Output).
ut Value y1, value1> y2, List (value2
Tab
APREDUCE
rent ways to e way to execu programs ca un-structured educe program s:
er Class cer Class r Class g is working
it is used to and used to pr 7]
Query Lang used for anal w and columns
parallel, as r of reducers umber of red
// 3 Reducers or
onfiguration() uces", "5"); //5
ce Multiple R
The MapRe pair. Firstly t
r of <Key, Va <key, Value>
es should be in ence, need t nally, the ke mparable inte
Input and O k1, v1> -> m
Output List <k 2) > List (K
ble 2.1
IMPLEMEN
execute MapR ute by using an be used for d data like .csv m usually co
g on the top analysis of s rocess the scri
guage (HiveQ lysis of structu s. [8]
they are wo is decided b ucer by usin
s
);
5 Reducer
Reduce Task
educe frame the Map oper alue> and pro > which will b
n serialized ma to implement
y classes hav erface to faci Output types map -><k2, v
t Value key2, value2> Key3, value3>
NTATION
Reduce jobs:-Java MapRe r structured, v, .json, or D nsists of the
layer of Ha structure and
ipting approac
QL or HQL ured data whi
orking y the g the
ework ration oduce be the
anner t the ve to ilitate
of a v2>->
educe semi-DBMS
three
adoop semi-ch for
3.1. takin of ch word rearr vary grou Map
Anag
inpu each Hado aba a key a
publ thro
{ Syste Strin Strin Strin
whil
{ Strin
s1=(
char
Arra
sorte conte
} }
Anag
and c
publ
key,I
thro
{ Strin Strin
for ( {
str=s
}
conte
}
Anag
exec call Anag file.
publ Clas
jobjo job.s
To implemen ng the exampl haracter forme d or characters ranged into " ying string, w up them toge pReduce we ar a) Anagram b) Anagram c) Anagram
gramMapper
ut file line by l h toke to form
oop will crea and baa will g as aab with gr
lic void map(T
owsIOExceptio
em.out.println ng keystring = ngTokenizerst
ng sorted = nu le(st.countTok
ng s1 = newSt String) st.nex
r[] chars =s1.t ays.sort(chars)
ed = new Strin
ext.write(new
gramReducer
combine them
licvoid reduce Iterable<Text
owsIOExceptio
ngBufferstr=n
ng finalkey =v
(Text t : value
str.append(t.to
ext.write(new
gramDriver
cution of prog the Anagr gramReducer
ic static void sNotFoundEx
ob = new Job( setJarByClass
nt the MapRed le of an anagra ed by rearran s, For exampl adhpoo".Inpu we need to id ether.For solv re usingthree c mMapper mReducer mDriver
Class(SortK
line.It will cre m a key with ate a group of
get sorted as a roup {aba,baa
Text key, Tex on, Interrupte
n("In mapper"
= key.toString(
t = newtringTo
ull ; kens()!=0)
tring(); xtElement();
toLowerCase( );
ng(chars);
w Text(sorted)
rClass: It will m into single st
e(Text t>values,Conte
on,Interrupted
ewStringBuff
values.iterator(
s)
oString()+" ");
w Text(finalkey
Class: The d ram by callin ramMapper
class and fin
main(String[] xception, Inter
();
(AnagramDri
duce program am. Anagram nging the lette le, the world “ ut files contai dentify anagra ving this pro
classes:-
KeyMapper):
eate a token fr h original va f similar word aab. So reduce a}
xt value, Conte dException
"); ();
okenizer(keys
().toCharArray
, new Text(s1
read each me tring separate
extcontext) dException
fer(); ().next().toStr
;
y), newext(str
driver progra ng both the cla class then nally write the
args) throws rruptedExcept
ver.class);
m in Java we a m is a word or
ers of a differe “hadoop” can in n number ams words a oblem by usi
This will re om line and s alue.In this w
ds.For examp er will receive
ext context)
string," ");
y();
1));
ember of group d by tab.
ring();
r.toString()));
am controls t asses first it w it will c e results in ne
IOException, tion{
are set ent be of and ing
ead ort way ple: e a
p
the will call ew
,
jo
Fi Pa w
Fi Pa w
jo jo jo jo jo jo jo
Sy }
3. pr an ec
H tra ex Tr an
To w em ob ta
H H
ob.setJobName
ileInputForma ath("C:\\Users workspace\\An
ileOutputForm ath("C:\\Users workspace\\An
ob.setInputFor ob.setMapperC ob.setReducerC ob.setMapOutp ob.setMapOutp ob.setOutputK ob.setOutputV
ystem.exit(job
2: Hive: Hiv rocess and ana nd databases. cosystem. Hiv
a) Data s b) Query c) Analy Hive supports anslate SQL l xecute the qu racker pass th nd Reduce tas
F
o exhibit the w we take the
mployee datab bjective is to able”. We will
Hive > use emp Hive >Select C
e("Ana");
at.addInputPat s\\ujjwal\\eclip
agram1\\hado
mat.setOutputP s\\ujjwal\\eclip agram1\\hado
rmatClass(Key Class(Anagram Class(Anagram putKeyClass( putValueClas KeyClass(Text ValueClass(Tex
b.waitForCom
ve is data wa alysis of struc Hive is wor ve has mainly summarization y
sis of data s query lang
[image:3.595.349.522.443.590.2]ike query into ery it will pa he job to Tas k will execute
Figure 3.2.1Ar
working struc databasecalled base we have
fine the tota execute the f
ployee; Count (*) from
th(job, new
pse-oop\\anagram.
Path(job, new
pse-oop\\out_ana")
yValueTextIn mMapper.clas mReducer.cla Text.class); s(Text.class); .class); xt.class);
mpletion(true)?
arehouse tool ctured data in rking on top three basic fu n
guage called o the MapRedu ass to Job Tra
sk Tracker an e and fetch the
rchitecture of
cture of MapR d “employee e table called al number of following Hive
m emp;
txt"));
w
));
nputFormat.cla ss);
ass);
?0:1);
which is us the form of t layer of Ha unctions:-
HiveQL, w uce jobs. Onc acker and then nd finally the e data from HD
Hive
Reduce in Hive e” and inside “emp”. Our rows in the eQL query:-
ass);
ed to tables adoop
which ce we n Job Map DFS.
e first e the
To e show seco
3.3 Map data Map oper langu parts
The Map
$ pig
$ pig
Gran stude , city
Usin stand
[image:4.595.314.526.54.213.2]Gran
Figure 3.2.2
execute this qu wn in 3.2.2 and
nd.
Apache Pig pReduce. Pig i set witho pReduce.We
rations in Had uage called P s:-
[image:4.595.37.278.54.206.2] [image:4.595.47.218.364.555.2]a) Loading b) Transform c) Dumping
Figure 3
following co pReduce execu
g –version
g –x mapreduc
nt>A=LOAD ent.txt' USING y:chararray);/*
ng dump meth dard output.
nt >Dump A;
2 MapReduce
uery, Hive per d the total tim
g: Apache P is used for an out using
can perform doop using Ap Pig Latin. E
ming g of the data.
3.3.1 Architec
ommand we ution on Apac
ce
'hdfs://localho G PigStorage( *Load File */
hod, the proces
e job for HiveQ
rforms the Ma me required is
Pig is an ab nalysis of larg
writing the all the dat pache Pig by Every pig pro
cture of Apach
will execute che Pig:-
ost:9000/pig_d (',') as (id:int, n
ssed data is di
QL query
apReduce job 1 min and 22
bstraction ov ge or distribut
program ta manipulati y using scripti ogram has thr
he Pig
to display t
data/
name:chararra
isplayed on th as
ver ted of ion ing ree
the
ay
e
To as se
Th ec M vo M ex th do de of M w qu ex on ab bu fo M
[1
[2
[3
[4
[5
[6
Figure 3.
o execute this s shown in 3.2 econd.
his paper e cosystem and MapReduce pr olume of d MapReduce bre
xecute then Re his feature in H
own as per < efault there is f reducer as MapReduce job we execute the
uery and sho xecuted by M n top layer of bstraction, as u ut just he writ or Pig its Pig L MapReduce Job
] Jefry Dean an Processin 53, Issuse 2]Jefry Dean an
processin Volume 5 ] M. Dhavapri and Solut (IJCST) – 4] Dr. Urmila Mapreduc (AJER), V 5]AbdelrahmanE Sharkawi Direction Engineeri 6]Shafali Agarw
Expansion Computer
3.2 MapRedu
s Pig Script,P 2.2 and the tot
IV CON
exploits an d deeply expl
rogramming m data by usin
eak the job in educe job wil Hadoop ecosy <key,value> s one reducer
per need. W b in Java pro e HiveQL and owing that at apReduce job f Hadoop ecos
user no need t te the SQL lik Latin Script, th
bs.
V REF
nd Sanjay Ghem ng Tool, Comm
e.1,January 201 nd Sanjay hem ng on large clus
51 pp. 107–113 iya, N. Yasodh tions Using Ha – Volume 4 Issu a R. Pol, Big ce, American Volume-5, Issu Elsayed, Osam i, MapReduce ns, International ing, Vol. 6, No. wal, Map Red n, (IJACSA) I r Science and A
uce job for Pig
Pigperforms th tal time requir
NCLUSION
overview o ained MapRe model used t ng parallel n two groups ll perform. Ma ystem so that given by the but we can c We solve ana gramming lan d Pig Latin Sc t backend th b. Hive and Ap
system and pr to write the co ke script in hi
hat automatic
FERENCE
mwat, MapRed munications of 0, pp 72-77. mwat,.MapRedu sters, Communi
, 2008 ha, Big Data A adoop, Map Re ue 1, Jan - Feb g Data Analy
Journal of En ue-6, pp-146-15 ma Ismail, and
: State-of-the-l JournaState-of-the-l of Com
. 1, February 20 duce: A Surve
International Jo Applications, V
g Latin Script
he MapReduc red is 3 min an
fbigdata, Ha educe architec
to deal with data proces first Map job apReduce pro data can be b e programmer change the nu agram problem
nguage and fu cript to execut he whole que pache Pig wo rovide the lev ode of MapRe ive its HiveQL ally converted
duce:A Flexible f the ACM, Vo
uce: Simplified ications of the A
Analytics: Chall educe and Big T
2016 ysis Using H
ngineering Res 1
d Mohamed E -Art and Res mputer and Elec 014
ey Paper on R ournal of Adv ol. 6, No. 8, 20
ce job nd 20
adoop cture. huge ssing. b will ovides break r. By umber
m by urther
te the ery is orking
vel of educe L and d into
e Data olume
d data ACM,
lenges Table,
adoop search
E. El-search ctrical
Recent vanced
[7] P
[8] H
Auth
intern Big D langu Resea
Pig Latin: A N Christopher Yahoo! Rese
Hive – A Peta AshishThuso Prasad Chak Raghotham M
hors Profile
D S M w S m p national journals Data Analytics, M uage. He has more arch Experience.
Not-So-Foreign Olstonכ Yaho earch UtkarshSr
abyte Scale D oo, JoydeepSen kka, Ning Zhan Murthy, Facebo
VI PRO
Dr. Ujjwal Agar Science), M.Ph MSc(Information working in Salal Sultanate of Oma member in man published more t and conferences. Machine Learning
e then 12 years o
Language for oo! Research B rivastava ‡ Yah
Data Warehouse nSarma, Namit ng, Suresh Anto
ook Data Infras
OFILE
rwal completed hil. (Computer n Technology).
ah College of T an as a Lecturer ny International
than 12 research . His main resear g using Python a of teaching experi
Data Processin Benjamin Reed hoo! Research
e Using Hado Jain, Zheng Sh ony, Hao Liu a structure Team
his PhD(Compu r Science) a He is curren Technology, Salal (IT) alogwith he Journals. He h papers in repu rch work focuses and R Programm ience and 6 years
ng, d †
op, hao, and