BIG DATA
Prepared By
Vinay Krishna Medishetty
Under the guidance of
Richey
Introduction
Big Data may well be the Next Big Thing in the IT world.
Big data burst upon the scene in the first decade of the 21st century.
The first organizations to embrace it were online and startup firms. Firms like Google, eBay, LinkedIn, and Facebook were built around big data from the beginning.
Like many new information technologies, big data can bring about dramatic cost reductions, substantial improvements in the time required to perform a computing task, or new product and service offerings.
What is BIG DATA?
• ‘Big Data’ is similar to ‘small data’, but bigger in size.
• Because the data is bigger, it requires different approaches: techniques, tools and architecture.
• It aims to solve new problems, or old problems in a better way.
• Big Data generates value from the storage and processing of very large quantities of digital information that cannot be analyzed with traditional computing techniques.
What is BIG DATA
• Wal-Mart handles more than 1 million customer transactions every hour.
• Facebook handles 40 billion photos from its user base.
• Decoding the human genome originally took 10 years to process; now it can be achieved in one week.
Three Characteristics of Big Data (the 3 Vs)
Volume
• Data quantity
Velocity
• Data speed
Variety
• Data types
1st Character of Big Data
Volume
• A typical PC might have had 10 gigabytes of storage in 2000.
• Today, Facebook ingests 500 terabytes of new data every day.
• A Boeing 737 will generate 240 terabytes of flight data during a single flight across the US.
• Smart phones, the data they create and consume, and sensors embedded into everyday objects will soon result in billions of new, constantly-updated data feeds containing environmental, location, and other
2nd Character of Big Data
Velocity
• Click streams and ad impressions capture user behavior at millions of events per second.
• High-frequency stock trading algorithms reflect market changes within microseconds.
• Machine-to-machine processes exchange data between billions of devices.
• Infrastructure and sensors generate massive log data in real time.
• Online gaming systems support millions of concurrent users, each producing multiple inputs per second.
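A minimal sketch of what "events per second" means in code: a sliding-window counter that reports how many events arrived in the last second. The class name `SlidingWindowCounter` and the sample timestamps are hypothetical and only illustrate the idea; a real system would spread this work across many machines.

```python
from collections import deque


class SlidingWindowCounter:
    """Count events seen in the last `window_seconds` seconds."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record(self, timestamp: float) -> None:
        """Register one event; timestamps are assumed to arrive in order."""
        self._timestamps.append(timestamp)
        self._evict(timestamp)

    def count(self, now: float) -> int:
        """Return how many events fall inside the window ending at `now`."""
        self._evict(now)
        return len(self._timestamps)

    def _evict(self, now: float) -> None:
        # Drop events that are older than the window.
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()


# Hypothetical event times, in seconds.
counter = SlidingWindowCounter(window_seconds=1.0)
for t in (0.1, 0.2, 0.9, 1.05, 1.5):
    counter.record(t)
events_in_last_second = counter.count(now=1.6)
```

Only the events newer than the window are kept in memory, which is why this pattern scales to fast streams better than storing everything and counting later.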
3rd Character of Big Data
Variety
• Big Data isn't just numbers, dates, and strings. Big Data is also geospatial data, 3D data, audio and video, and unstructured text, including log files and social media.
• Traditional database systems were designed to address smaller volumes of structured data, fewer updates or a predictable, consistent data structure.
• Big Data analysis includes different types of data.
Storing Big Data
Analyzing your data characteristics
• Selecting data sources for analysis
• Eliminating redundant data
• Establishing the role of NoSQL
Overview of Big Data stores
• Data models: key value, graph, document, column-family
• Hadoop Distributed File System
• HBase
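The four data models listed above can be pictured with plain Python structures. This is only an illustrative sketch: the user records, field names, and the `followers_of` helper are hypothetical, and no real NoSQL product's API is shown.

```python
import json

# Key-value: an opaque value looked up by a single key.
key_value_store = {"user:42": '{"name": "Asha", "city": "Pune"}'}

# Document: the store understands the nested fields inside each record.
document_store = {
    "users": [
        {"_id": 42, "name": "Asha", "city": "Pune", "tags": ["mobile", "prime"]},
    ]
}

# Column-family: row key -> column family -> column -> value.
column_family_store = {
    "user:42": {
        "profile": {"name": "Asha", "city": "Pune"},
        "activity": {"last_login": "2015-01-01"},
    }
}

# Graph: nodes plus labelled edges between them.
graph_store = {
    "nodes": {42: {"name": "Asha"}, 43: {"name": "Ravi"}},
    "edges": [(42, "FOLLOWS", 43)],
}


def followers_of(graph, node_id):
    """Return the ids of nodes that have a FOLLOWS edge pointing at node_id."""
    return [
        src
        for src, label, dst in graph["edges"]
        if label == "FOLLOWS" and dst == node_id
    ]


# In a key-value store the application must parse the value itself.
name_from_kv = json.loads(key_value_store["user:42"])["name"]
city_from_columns = column_family_store["user:42"]["profile"]["city"]
followers = followers_of(graph_store, 43)
```

The same user appears in each model; what changes is how much of the record's structure the store itself can see and query.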
Processing Big Data
Integrating disparate data stores
• Mapping data to the programming framework
• Connecting and extracting data from storage
• Transforming data for processing
• Subdividing data in preparation for Hadoop MapReduce
Employing Hadoop MapReduce
• Creating the components of Hadoop MapReduce jobs
• Distributing data processing across server farms
• Executing Hadoop MapReduce jobs
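The steps above (subdividing the data, mapping, then reducing) can be sketched as a word count in plain Python. This is an in-memory imitation of the MapReduce programming model, not Hadoop's actual API; all function names here are hypothetical, and a real job would run the mappers and reducers on separate servers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def split_input(lines: List[str], num_splits: int) -> List[List[str]]:
    """Subdivide the input into roughly equal splits, one per mapper."""
    return [lines[i::num_splits] for i in range(num_splits)]


def map_phase(split: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in the split."""
    for line in split:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer: sum the counts for each word."""
    return {word: sum(values) for word, values in grouped.items()}


def word_count(lines: List[str], num_splits: int = 2) -> Dict[str, int]:
    """Run split -> map -> shuffle -> reduce sequentially in one process."""
    splits = split_input(lines, num_splits)
    emitted = [pair for split in splits for pair in map_phase(split)]
    return reduce_phase(shuffle(emitted))


counts = word_count(["big data is big", "data is everywhere"])
```

Because each mapper only sees its own split and each reducer only sees one key's values, the phases can be distributed across a server farm without changing the logic.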
Data
Structured
• Most traditional data sources
Semi-structured
• Many sources of big data
Unstructured
• Video data, audio
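A small sketch of how the three categories look to a program, using only the Python standard library. The sample rows, JSON records, and text are made up for illustration.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields.
structured = list(csv.DictReader(io.StringIO("id,amount\n1,19.99\n2,5.00\n")))

# Semi-structured: self-describing records whose fields may differ.
raw_records = ('{"id": 1, "tags": ["a"]}', '{"id": 2, "geo": {"lat": 17.4}}')
semi_structured = [json.loads(raw) for raw in raw_records]

# Unstructured: no schema at all; meaning has to be extracted by analysis.
unstructured = "Customer called to say the delivery arrived late."

# With semi-structured data the set of fields is discovered, not declared.
fields_seen = sorted({key for record in semi_structured for key in record})
```

Structured rows can be queried by column name right away, while the semi-structured records first need their fields discovered, and the free text needs further processing before it yields any fields at all.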
Why Big Data
• Growth of Big Data is needed
– Increase of storage capacities
– Increase of processing power
– Availability of data (different data types)
– Every day we create 2.5 quintillion bytes of data; 90% of the data in the world today has been created in the last two years alone
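To put 2.5 quintillion bytes per day in more familiar units, here is a short worked conversion. It assumes the short-scale quintillion (10^18) and decimal units (1 exabyte = 10^18 bytes, 1 terabyte = 10^12 bytes).

```python
# 2.5 quintillion bytes, assuming short scale (1 quintillion = 10**18).
BYTES_PER_DAY = 25 * 10**17
SECONDS_PER_DAY = 24 * 60 * 60

# Decimal units: 1 EB = 10**18 bytes, 1 TB = 10**12 bytes.
exabytes_per_day = BYTES_PER_DAY / 10**18
terabytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / 10**12

print(f"{exabytes_per_day} EB per day")
print(f"{terabytes_per_second:.1f} TB per second")
```

Under these assumptions the figure works out to 2.5 exabytes per day, or roughly 29 terabytes every second.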
Why Big Data
• FB generates 10 TB daily
• Twitter generates 7 TB of data daily
• IBM claims 90% of today's stored data was generated in the last two years
Big Data Analytics
• Examining large amounts of data
• Appropriate information
• Identification of hidden patterns, unknown correlations
• Competitive advantage
• Better business decisions: strategic and operational
• Effective marketing, customer satisfaction, increased revenue
Types of tools used in Big Data
• Where processing is hosted: Distributed Servers / Cloud (e.g. Amazon EC2)
• Where data is stored: Distributed Storage (e.g. Amazon S3)
• What is the programming model: Distributed Processing (e.g. MapReduce)
• How data is stored & indexed: High-performance schema-free databases (e.g. MongoDB)
• What operations are performed on data: Analytics
Homeland Security, Smarter Healthcare, Multi-channel sales, Telecom, Manufacturing, Traffic Control, Trading Analytics, Search Quality
Risks of Big Data
• Organizations can be overwhelmed by the data
• Need the right people solving the right problems
• Costs escalate too fast
• It isn't necessary to capture 100% of the data
• Many sources of big data raise privacy concerns
– Self-regulation
– Legal regulation
How Big Data impacts IT
• Big data is a troublesome force presenting opportunities with challenges to IT organizations.
• By 2015, 4.4 million IT jobs in Big Data; 1.9 million of them in the US itself.
• India will require a minimum of 1 lakh data scientists in the next couple of years, in addition to data analysts and data managers, to support the Big Data space.
Benefits of Big Data
• Real-time big data isn't just a process for storing petabytes or exabytes of data in a data warehouse. It's about the ability to make better decisions and take meaningful actions at the right time.
• Fast forward to the present and technologies like Hadoop give you the scale and flexibility to store data before you know how you are going to process it.
• Technologies such as MapReduce, Hive and
Benefits of Big Data
• Our newest research finds that organizations are using big data to target customer-centric outcomes, tap into internal data and build a better information ecosystem.
• Big Data is already an important part of the $64 billion database and data analytics market.
• It offers commercial opportunities of a comparable scale to enterprise software in the late 1980s, the Internet boom of the 1990s, and the social media
Future of Big Data
• $15 billion on software firms only specializing in data management and analytics.
• This industry on its own is worth more than $100 billion and growing at almost 10% a year, which is roughly twice as fast as the software business as a whole.