Hunk & Elastic MapReduce: Big Data Analytics on AWS

(1)

Hunk & Elastic MapReduce:

Big Data Analytics on AWS

Dritan Bitincka

(2)

Disclaimer

During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to

(3)

About Me

• Member of BD Solution Architecture team
• Large scale deployments
• Cloud and Big Data

(4)

Agenda

• Hunk
• Amazon EMR
• Understanding how Hunk and EMR can work together
• Demo
–  Analyzing HDFS/S3 data with Hunk on EMR

(5)

Introduction to Hunk

(6)

Splunk as a single pane of glass for your machine data

(7)

(8)

[Diagram labels: RDBM, NoSQL, Splunk>]

(9)

Hunk for Hadoop and NoSQL Data Stores

Explore

Analyze

Visualize

[Diagram labels: RDBM, Splunk>, NoSQL]

(10)

(11)

Hadoop Components

HDFS
–  NameNode
–  DataNode
–  Distributed, replicated, massively scalable file system

MapReduce
–  JobTracker
–  TaskTracker
–  Programming paradigm; two-phase processing of large datasets
    • We also use it, though in a simplified form
–  Scalable, fault tolerant, etc.

COMPUTE
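
The two-phase processing named above can be sketched in a few lines of plain Python. This is only an illustration of the paradigm, not Hadoop's or Hunk's implementation: it runs in a single process, and the log format, sample lines, and function names are all hypothetical.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Phase 1: apply the mapper to every record, emitting (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, as a framework would between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Phase 2: collapse each key's list of values into one result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def status_mapper(line):
    # Hypothetical log format: "<method> <path> <status>".
    parts = line.split()
    if len(parts) == 3:
        yield parts[2], 1  # emit (status code, 1); skip malformed lines


def count_reducer(key, values):
    return sum(values)


LOG_LINES = [
    "GET /index.html 200",
    "GET /missing 404",
    "POST /login 200",
    "malformed line",
]

counts = reduce_phase(shuffle(map_phase(LOG_LINES, status_mapper)), count_reducer)
print(counts)  # {'200': 2, '404': 1}
```

In a real cluster the map and reduce functions run as separate tasks on many nodes, and the grouping step moves data between them over the network.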

(12)

Splunk and Hadoop Data

Export: Write data out to Hadoop, search based (push)

Explore: Read data from Hadoop and analyze on the search head (SH)

(13)

Splunk and Hadoop Data

Splunk Hadoop Connect

(14)

Splunk and Hadoop Data

STORAGE

Splunk Hadoop Connect

PULL

✓

✗

(15)

Splunk and Hadoop Data – Today

COMPUTE

STORAGE

Explore, Analyze, Visualize, Dashboards, Share

✓

(16)

Splunk Stack

Explore, Analyze, Visualize, Dashboards, Share

REST API, COMMAND LINE, ODBC

splunkweb
• Web and Application server
• Python, AJAX, CSS, XSLT, XML

splunkd
• Search Head
• Virtual Indexes
• C++, Web Services

64-bit Linux OS

(17)

Hadoop Interface
• Hadoop Client Libraries
• Java

(18)

Scaling with Hadoop

Connect Hunk to multiple Hadoop clusters

Hadoop Cluster 1, Hadoop Cluster 2, Hadoop Cluster 3

(19)

What Makes it Stick?

[Diagram: ERP provider family "Hadoop" with providers ERP1 (prod) and ERP2 (test), and virtual indexes VIX-1, VIX-2, VIX-3, VIX-4]

In order to access and process data in external data stores (HDFS is supported out of the box), Hunk External Resource Providers (ERPs) carry out the store-specific file system implementation and computational semantics.

A provider family is a logical grouping of data store frameworks that access the same "kind" of external system and share a global set of configurations.

A provider is a collection of specific Hunk ERP helper process implementations within the provider family that share cluster-specific configurations.

After you set up a provider, you configure virtual indexes (VIX) by giving Hunk information about the data location. Hunk then uses that information and its underlying implementation to distribute searches.
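
The relationship between provider family, provider, and virtual index can be modeled roughly as below. This is an illustrative sketch only, not Hunk's configuration format. The class names, setting keys, host names, and data paths are hypothetical, as is the assignment of virtual indexes to providers. The rule that provider settings override family settings is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProviderFamily:
    """Logical grouping whose settings are shared by every provider in it."""
    name: str
    settings: Dict[str, str] = field(default_factory=dict)


@dataclass
class Provider:
    """One ERP within a family, holding cluster-specific settings."""
    name: str
    family: ProviderFamily
    settings: Dict[str, str] = field(default_factory=dict)

    def effective_settings(self) -> Dict[str, str]:
        # Assumption: cluster-specific values win over family-wide ones.
        merged = dict(self.family.settings)
        merged.update(self.settings)
        return merged


@dataclass
class VirtualIndex:
    """Points a provider at a data location."""
    name: str
    provider: Provider
    data_path: str


def plan_search(vix: VirtualIndex) -> Dict[str, str]:
    """Describe where a search against this virtual index would be sent."""
    plan = vix.provider.effective_settings()
    plan["provider"] = vix.provider.name
    plan["data_path"] = vix.data_path
    return plan


# Hypothetical setup mirroring the labels on the slide.
hadoop = ProviderFamily("hadoop", {"execution": "mapreduce"})
erp_prod = Provider("ERP1", hadoop, {"filesystem": "hdfs://prod-master.example.com:8020"})
erp_test = Provider("ERP2", hadoop, {"filesystem": "hdfs://test-master.example.com:8020"})

virtual_indexes: List[VirtualIndex] = [
    VirtualIndex("VIX-1", erp_prod, "/data/weblogs"),
    VirtualIndex("VIX-2", erp_prod, "/data/applogs"),
    VirtualIndex("VIX-3", erp_test, "/data/weblogs"),
    VirtualIndex("VIX-4", erp_test, "/data/applogs"),
]

for vix in virtual_indexes:
    print(vix.name, plan_search(vix))
```

The point of the layering is that a search only names a virtual index; the cluster to contact and the data location follow from the provider it references.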

(20)

Explore, Analyze, Visualize Data in Hadoop

• No fixed schema to search unstructured data
• Preview results while MapReduce jobs start
• Easier app development than in raw Hadoop
• Unlock business value of data in Hadoop
• Fast to learn, instead of requiring scarce skills

(21)

Integrated Analytics Platform for Hadoop Data

• Full-featured, Integrated Product
• Insights for Everyone
• Works with What You Have Today

Explore, Visualize, Dashboards, Share

Hadoop (MapReduce & HDFS)

(22)

(23)

Amazon EMR

• Amazon EMR is a Hadoop framework in the cloud offered as a managed service
• Used in a “variety of applications, including log analysis, web indexing, data warehousing, machine learning, financial analysis, scientific simulation, and bioinformatics”

(24)

Provisioning Hadoop on AWS

1. Log in to the AWS Console
2. Fill in a form
3. Click “Create Cluster”
4. Wait a few minutes for a fully operational cluster
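
The console steps above also have a scripted equivalent. The sketch below assumes the present-day `boto3` SDK, which is not mentioned in the slides, and builds the parameters for its EMR `run_job_flow` call. The cluster name, instance type, instance count, release label, region, and role names are placeholder assumptions to be replaced with values valid for your account.

```python
from typing import Any, Dict, Optional


def build_cluster_request(
    name: str,
    instance_type: str = "m5.xlarge",    # placeholder instance type
    instance_count: int = 3,             # one master plus two core nodes
    release_label: str = "emr-6.15.0",   # placeholder EMR release
    key_name: Optional[str] = None,      # EC2 key pair for SSH, if any
) -> Dict[str, Any]:
    """Build the keyword arguments for boto3's EMR run_job_flow call."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    instances: Dict[str, Any] = {
        "MasterInstanceType": instance_type,
        "SlaveInstanceType": instance_type,
        "InstanceCount": instance_count,
        # Keep the cluster running so an external tool can query it.
        "KeepJobFlowAliveWhenNoSteps": True,
    }
    if key_name:
        instances["Ec2KeyName"] = key_name
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Hadoop"}],
        "Instances": instances,
        # Default role names; assumed to already exist in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def create_cluster(request: Dict[str, Any], region: str = "us-east-1") -> str:
    """Submit the request and return the new cluster (job flow) id."""
    import boto3  # third-party SDK, imported lazily so the builder works without it

    emr = boto3.client("emr", region_name=region)
    response = emr.run_job_flow(**request)
    return response["JobFlowId"]


request = build_cluster_request("hunk-demo")
print(request["Name"], request["Instances"]["InstanceCount"])
```

Calling `create_cluster(request)` would start a billable cluster, so the example only builds and prints the request.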

(25)

Why is EMR Compelling?

• No Hadoop/HDFS management
• Native support for AWS S3
–  Vast amounts of data in S3
• Cluster Elasticity
• Spot vs. Reserved Instances
–  Long running vs. transient
• Pay for what you use
• Thousands of customers

[Diagram labels: Master, HDFS, S3]

(26)

Integrating Hunk with EMR

Managed Hadoop framework on the cloud with access to vast amounts of data in HDFS and S3

Explore, analyze and visualize data from a central place

Full analytics solution for Big Data on the cloud

(27)

Hunk on EMR: Option 1

• Classic Hunk + Hadoop
–  Provision an EMR cluster
–  Provision a Hunk EC2 instance using the AWS Marketplace Hunk AMI
–  Bring Your Own License (BYOL)
–  Configure Hunk with EMR cluster
    • Edit Security Groups to allow access
    • Master IP addresses & Ports
    • Create provider
    • Create Virtual Index
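
Before creating the provider, it can help to confirm from the Hunk instance that the security groups really allow access to the EMR master. The sketch below is a plain TCP reachability check using only the standard library. The port numbers and labels are assumptions that vary by EMR and Hadoop version, and the function names are hypothetical.

```python
import socket
from typing import Dict


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_master(host: str, ports: Dict[str, int], timeout: float = 3.0) -> Dict[str, bool]:
    """Check every labeled port on the master and report which are reachable."""
    return {label: port_is_open(host, port, timeout) for label, port in ports.items()}


# Assumed ports; confirm them against the Hadoop configuration of your cluster.
EXAMPLE_PORTS = {
    "hdfs_namenode": 8020,
    "resource_manager": 8032,
}

# Usage, with the master's private IP substituted for the placeholder:
#   results = check_master("<master-ip>", EXAMPLE_PORTS)
#   for label, reachable in results.items():
#       print(label, "reachable" if reachable else "blocked")
```

A port reported as blocked usually points to a missing inbound rule in the master's security group rather than to a problem in Hunk itself.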

(28)

Hunk on EMR: Option 2

(29)

Demo

• Analyze ELB or S3 Access Logs
• Analyze CloudTrail Access Logs

(30)

QUESTIONS?

Hunk 6.1 Technical Deep Dive

Hunk Report Acceleration Deep Dive

Comprehensive Security Analytics for Modern Threats with Hunk

(31)
