
(1)

Copyright © 2014 Splunk Inc.

Hunk & Elastic MapReduce:
Big Data Analytics on AWS

Dritan Bitincka

(2)

Disclaimer

During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to

(3)

About Me

• Member of BD Solution Architecture team
• Large scale deployments
• Cloud and Big Data

(4)

Agenda

• Hunk
• Amazon EMR
• Understanding how Hunk and EMR can work together
• Demo
  – Analyzing HDFS/S3 data with Hunk on EMR

(5)

Introduction to Hunk

(6)

Splunk as a single pane of glass for your machine data

(7)
(8)

[Diagram: RDBMS, NoSQL, and Splunk> data stores]

(9)

Hunk for Hadoop and NoSQL Data Stores

Explore   Analyze   Visualize

[Diagram: RDBMS, Splunk>, NoSQL]

(10)


(11)

Hadoop Components

HDFS
  – NameNode
  – DataNode
  – Distributed, replicated, massively scalable file system

MapReduce
  – JobTracker
  – TaskTracker
  – Programming paradigm; two-phase processing of large datasets
    – We also use it, though a simplified version of it
  – Scalable, fault tolerant, etc.

[Diagram label: COMPUTE]
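
The two-phase paradigm named above can be illustrated with a small single-process sketch. This is not Hadoop code and not how Hunk implements its simplified version; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample log lines are made up for illustration. It only shows the shape of the idea: a map step emits key/value pairs, the pairs are grouped by key, and a reduce step combines each group.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit one (token, 1) pair for every token in an input record."""
    return [(token.lower(), 1) for token in record.split()]


def shuffle(pairs):
    """Group emitted values by key, the step a framework runs between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence over an iterable of records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())
```

Calling `run_job(["GET /index.html", "GET /about.html", "POST /index.html"])` counts each token across the three records. In a real cluster the map and reduce calls would run on different nodes, which is where the scalability and fault tolerance come from.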

(12)


(13)


(14)

Splunk and Hadoop Data

Export: Write data out to Hadoop, search based (push)

Explore: Read data from Hadoop and analyze on the search head (SH)

[Diagram labels: Splunk Hadoop Connect; STORAGE; PULL]

(15)

Splunk and Hadoop Data – Today

Explore   Analyze   Visualize   Dashboards   Share

[Diagram labels: COMPUTE; STORAGE]

(16)

Splunk Stack

Explore   Analyze   Visualize   Dashboards   Share

REST API   COMMAND LINE   ODBC

splunkweb
• Web and Application server
• Python, AJAX, CSS, XSLT, XML

splunkd
• Search Head
• Virtual Indexes
• C++, Web Services

64-bit Linux OS

(17)


 

Hadoop Interface
• Hadoop Client Libraries
• JAVA

(18)


Scaling with Hadoop

Connect Hunk to multiple Hadoop clusters

[Diagram: Hadoop Cluster 1, Hadoop Cluster 2, Hadoop Cluster 3]

(19)

What Makes it Stick?

[Diagram: ERP provider family (Hadoop); providers ERP1 (prod) and ERP2 (test); virtual indexes VIX-1, VIX-2, VIX-3, VIX-4]

To access and process data in external data stores (HDFS is supported out of the box), Hunk External Resource Providers (ERPs) carry out the store-specific file system implementation and computational semantics.

A provider family is a logical grouping of data store frameworks that access the same “kind” of external system and share a global set of configurations.

A provider is a collection of specific Hunk ERP helper process implementations within the provider family that share a cluster-specific configuration.

After you set up a provider, you configure virtual indexes (VIX) by giving Hunk information about the data location. Hunk then uses that information and its underlying implementation to distribute searches.
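
The relationship between provider family, provider, and virtual index described above can be modeled roughly as follows. This is a conceptual sketch only, not Hunk's configuration format or API: the class names, the configuration keys (`java_home`, `namenode`), the host names, the data paths, and the assignment of virtual indexes to providers are all assumptions made for illustration, as is the rule that cluster-specific values override family-wide ones.

```python
from dataclasses import dataclass, field


@dataclass
class ProviderFamily:
    """A kind of external system with settings shared by all its providers."""
    name: str
    global_config: dict = field(default_factory=dict)


@dataclass
class Provider:
    """One cluster within a family, carrying cluster-specific settings."""
    name: str
    family: ProviderFamily
    cluster_config: dict = field(default_factory=dict)

    def effective_config(self):
        # Assumption: cluster-specific values win over family-wide defaults.
        return {**self.family.global_config, **self.cluster_config}


@dataclass
class VirtualIndex:
    """A named pointer to a data location, served by one provider."""
    name: str
    provider: Provider
    data_location: str


def plan_search(vix):
    """Collect what a search against a virtual index would need to know."""
    return {
        "index": vix.name,
        "provider": vix.provider.name,
        "path": vix.data_location,
        "config": vix.provider.effective_config(),
    }


# Hypothetical setup loosely mirroring the slide's labels.
hadoop = ProviderFamily("hadoop", {"java_home": "/usr/lib/jvm/java"})
erp1 = Provider("ERP1", hadoop, {"namenode": "prod-master.example:8020"})
erp2 = Provider("ERP2", hadoop, {"namenode": "test-master.example:8020"})
vix1 = VirtualIndex("VIX-1", erp1, "/data/weblogs")
vix3 = VirtualIndex("VIX-3", erp2, "/data/weblogs-test")
```

With this model, `plan_search(vix1)` and `plan_search(vix3)` share the family-wide setting but resolve to different clusters, which is the separation of concerns the provider and virtual index layers give you.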

(20)

Explore, Analyze, Visualize Data in Hadoop

• No fixed schema to search unstructured data
• Preview results while MapReduce jobs start
• Easier app development than in raw Hadoop
• Unlock business value of data in Hadoop
• Fast to learn instead of scarce skills

(21)

Integrated Analytics Platform for Hadoop Data

Full-featured, Integrated Product

Insights for Everyone

Works with What You Have Today

[Diagram: Explore, Visualize, Dashboards, Share; Hadoop (MapReduce & HDFS)]

(22)
(23)

Amazon EMR

• Amazon EMR is a Hadoop framework in the cloud, offered as a managed service
• Used in a “variety of applications, including log analysis, web indexing, data warehousing, machine learning, financial analysis, scientific simulation, and bioinformatics”

(24)

Provisioning Hadoop on AWS

1. Log in to the AWS Console
2. Fill in a form
3. Click “Create Cluster”
4. Wait a few minutes for a fully operational cluster

(25)

Why is EMR Compelling?

• No Hadoop/HDFS management
• Native support for AWS S3
  – Vast amounts of data in S3
• Cluster Elasticity
• Spot vs. Reserved Instances
  – Long running vs. transient
• Pay for what you use
• Thousands of customers

[Diagram labels: Master; HDFS; S3]

(26)

Integrating Hunk with EMR

Managed Hadoop framework on the cloud with access to vast amounts of data in HDFS and S3

Explore, analyze and visualize data from a central place

Full analytics solution for Big Data on the cloud

(27)

Hunk on EMR: Option 1

• Classic Hunk + Hadoop
  – Provision an EMR cluster
  – Provision a Hunk EC2 instance using the AWS Marketplace Hunk AMI
  – Bring Your Own License (BYOL)
  – Configure Hunk with EMR cluster
    – Edit Security Groups to allow access
    – Master IP addresses & Ports
    – Create provider
    – Create Virtual Index
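
After editing the security groups, it can help to confirm from the Hunk instance that the EMR master's ports are reachable before creating the provider. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical, and the host and port values in the usage note are placeholders: the slides do not say which ports are involved, so substitute the master address and the ports your Hadoop services listen on.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_master(host, ports, timeout=3.0):
    """Map each port to whether it is reachable on the given host."""
    return {port: port_is_reachable(host, port, timeout) for port in ports}
```

For example, `check_master("10.0.0.10", [8020, 8021])` (placeholder address and ports) returns a dictionary of port to reachability. A `False` entry usually points at a security group rule that still blocks the Hunk instance, or at a service that is not listening on that port.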

(28)

Hunk on EMR: Option 2

(29)

Demo

• Analyze ELB or S3 Access Logs
• Analyze CloudTrail Access Logs

(30)

QUESTIONS?

You may also like:

• Hunk 6.1 Technical Deep Dive
• Hunk Report Acceleration Deep Dive
• Comprehensive Security Analytics for Modern Threats with Hunk

(31)

THANK  YOU  

feedback:  [email protected]  

 
