(1)

Dashboard Engine for Hadoop

June 2015

Matt McDevitt

Sr. Project Manager

Pavan Challa

Sr. Data Engineer

(2)

CONFIDENTIAL | 2

Agenda

Think Big Overview

Engagement Model

Solution Offerings

Dashboard Engine

Demo

Q&A

2 © 2015 Think Big, a Teradata Company

(3)
(4)


Founded in 2010, acquired in 2014, international in 2015

First and leading professional services firm exclusively focused on big data

End-to-end services: Strategy, Design, Implementation, IP/Software, Support and Managed Services

Academy to scale delivery capability

Extend and integrate open source with UDA

Team-based delivery with Solution Center

Growing quickly: we’re hiring!

Think Big Overview

Think Big Founded 2010


PRESTO

(5)
(6)


Big Data Program Mgt

Business Analytics

Managed Services

Data Engineering

Think Big Analytics VELOCITY Methodology

• Solutions

• Planning and Design

• Prioritization

• Capability Backlog

• Grooming for engineering

• Engineering

• Sprint(s)

• Releases

• Quality Assurance & Test

• Managed Support

• Break Fix

• Sustaining Engineering

• New Models

• New Analytics

• New Insights

• New Data Requirements

• New Data

• Big Data Approach

• Use Cases

• Roadmap

Data Science

Discovery

R&D

Big Data Lab


(7)

1. Big Data Strategy Roadmap
2. Data Lake Starter Program
3. Data Lake Optimization
4. Data Lake Managed Services
5. Presto for the Enterprise – new as of June 10, 2015
6. Big Data Managed Services
7. Think Big Academy

Think Big Solution Offerings

• Device Data Manufacturing Operations

• Omni-Channel Marketing Analytics

• Financial Services Fraud/Risk Analytics

• Healthcare personalization

Custom Analytics Solution Services

• Device Data Behavior Analytics

• IT Threat Detection

• Public Sector Risk Analysis

• Gaming Analytics

(8)

MAKING BIG DATA COME ALIVE


(9)

Data Lake: Starter Program

Stand up a Data Lake and build 3 governed batch data ingest streams

Includes Services and Subscription Software Frameworks

Data Lake: Optimization

Add governance to your Data Lake

For Data Lakes not originally built by Think Big

Data Lake: Dashboard Engine Reporting

Install and configure the engine with the Data Lake to build dashboard analytics with deep dimensional rollup reporting in Tableau on Hadoop

Data Lake: Security

Data Security & InfoSec, Cluster Hardening, Perimeter, Connectivity

Data Lake: Managed Services

Only for Data Lakes that Think Big Designs and Builds

On Premise, Public Cloud (AWS) and Private Cloud (Teradata and Altiscale)

(10)

Think Big Data Lake Starter Program (8 Week Engagement)

Objective: Design, Develop and Deploy Data Lake Ingestion with Governance

Phases (2 weeks each): Design; Build & Test; Integrate & Tune; Assess, Mentor & Plan

• Collaborative workshops with business groups
• Identification and prioritization of high-value data streams
• Gap analysis
• Develop Ingest workflows
• Install Metadata and Info Security Services
• Prepare Cluster for Integration test
• Install Ingest & System Test
• Begin Profiling Data
• Learn about Information Security and data wrangling
• Begin Building DL Reporting
• Final tuning, assessment and next steps

Timeline labels: Develop & Unit Testing; Data Stream Prioritization; Info Security Objectives; Data Profiling and Capability; Follow-up Roadmap; Executive Presentation; Software Component Installation; Data Sources; Organization & Training; Cluster configuration & Integration; System Integration Testing

(11)

Enterprise Data Lake

Information Sources

Evaluate Source Data

Ingest

Collect & Manage Metadata

Apply Structure

Sequence

Compress

Automate

Protect

Prepare Data for Ingest

Prepare Source Metadata

Perimeter-Authentication-Authorization

InfoSec

Downstream Applications

Dashboard Engine

(12)


Data Lab

Data Repository

Security, Archival

RainStor – System of Record, Archive

Governed Ingestion

CDC

Buffer Server

Spark

Msg Queue

Kafka

Experimental Data

Raw

Data

Processing

Derived

Views

Loom – integrated Metadata, lineage, Wrangling

Metadata Repository

Dashboard Engine

API

Realtime

Processing

API

Discovery Zone

Statistics

Machine Learning

Graph

Analytics


(13)
(14)


Why a Dashboard Engine?


Events

Hadoop

(15)

Near real-time analytics

Easily scales to 100s of simultaneous users

Query latency typically under 100 ms

Deep dimensional drill-down

Works with popular BI tools

JavaScript, jQuery

Tableau

Others to be announced soon

(16)


Using Tableau without Dashboard Engine

Hadoop

Middle Tier Server

Extract

• Queryable data limited by size of Server.

• Doesn’t scale as users grow.

(17)

For the time the query is running, most or all of the cluster is dedicated to that one query.

Has limitations if the cluster has other loads

Has limitations for simultaneous dashboard users

Low latencies are possible only if all the event data is in RAM at query time.

(18)


(19)

Uses the power of Apache Spark to pre-aggregate data

Scales as event volume grows.

Scales as number of users grows.

Think Big’s Dashboard Engine for Hadoop

(20)

Store cube data

[Figure: pre-aggregated cube entries stored as key/value pairs.]

Keys shown:

Arrivals-2014-01-02
Arrivals-2014-01-03
Arrivals-2014-01-04
Arrivals-s:CA-2014-01-02
Arrivals-s:CA-2014-01-03
Arrivals-s:CA-2014-01-04
Arrivals-a:SFO-s:CA-2014-01-02
Arrivals-a:SFO-s:CA-2014-01-03
Arrivals-a:SFO-s:CA-2014-01-04

Values shown (key-to-value pairing not recoverable): 2053, 1911, 1965; 14147, 14158, 14269; 429, 479, 433
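The cube keys on this slide appear to follow the pattern metric, optional dimension values, then date (for example Arrivals-a:SFO-s:CA-2014-01-04). The sketch below illustrates that idea in plain Python rather than Spark. The short dimension codes, the helper names `cube_key` and `build_cube`, and the choice to roll up every subset of dimensions are assumptions made for illustration; the actual engine is driven by index definitions and may store data differently.

```python
from collections import Counter
from itertools import combinations

# Hypothetical short codes for dimensions, inferred from the keys on the slide.
DIM_CODES = {"airport": "a", "state": "s"}


def cube_key(metric, dims, day):
    """Build a key such as 'Arrivals-a:SFO-s:CA-2014-01-04'."""
    parts = [metric]
    parts += [f"{DIM_CODES[name]}:{dims[name]}" for name in sorted(dims)]
    parts.append(day)
    return "-".join(parts)


def build_cube(events, metric="Arrivals"):
    """Count each event once under every subset of its dimensions, per day."""
    cube = Counter()
    names = sorted(DIM_CODES)
    for event in events:
        for size in range(len(names) + 1):
            for subset in combinations(names, size):
                dims = {name: event[name] for name in subset}
                cube[cube_key(metric, dims, event["day"])] += 1
    return cube


# Made-up sample events, not data from the demo.
events = [
    {"day": "2014-01-04", "airport": "SFO", "state": "CA"},
    {"day": "2014-01-04", "airport": "LAX", "state": "CA"},
    {"day": "2014-01-04", "airport": "JFK", "state": "NY"},
]
cube = build_cube(events)
print(cube["Arrivals-2014-01-04"])
print(cube["Arrivals-s:CA-2014-01-04"])
print(cube["Arrivals-a:SFO-s:CA-2014-01-04"])
```

Because every answer is a single key lookup, query cost does not depend on the number of raw events, which is consistent with the low-latency claim made earlier in the deck.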

(21)

Aggregate API that understands metrics, dimensions, time ranges.

Relational API that understands (some) SQL.

API - Connecting to the Dashboard Engine

Aggregate API

(22)


(23)

Flight Data Statistics for Demo

Running on a 16-node cluster (TD Appliance for Hadoop)

Process and store all data in ~2 hours

                 Rows          Storage space
Flight records   160 million   30 GB
MOLAP cube       35 billion    2.1 TB
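A quick calculation from the figures on this slide shows how much the cube expands the raw data. This is only arithmetic on the stated numbers; treating GB and TB as decimal units is an assumption, and the variable names are illustrative.

```python
# Figures from the slide; decimal units (1 GB = 10**9 bytes) are an assumption.
flight_rows = 160_000_000
flight_bytes = 30 * 10**9
cube_rows = 35_000_000_000
cube_bytes = 21 * 10**11  # 2.1 TB

row_expansion = cube_rows / flight_rows
storage_expansion = cube_bytes / flight_bytes
bytes_per_flight_row = flight_bytes / flight_rows
bytes_per_cube_row = cube_bytes / cube_rows

print(f"cube rows per flight record: {row_expansion:.2f}")
print(f"storage expansion factor:    {storage_expansion:.1f}")
print(f"bytes per flight record:     {bytes_per_flight_row:.1f}")
print(f"bytes per cube row:          {bytes_per_cube_row:.1f}")
```

Under these assumptions the cube holds roughly 219 rows per flight record and takes about 70 times the storage, with each cube row much smaller than a raw record.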

(24)


Sends SQL queries to the API

SQL Query to REST API Example

SELECT FlightData.Date AS "none_Date_ok",
       FlightData.State AS "none_State_nk",
       SUM(FlightData.Arrivals) AS "sum_Arrivals_nk"
FROM "default"."FlightData" "FlightData"
GROUP BY "none_Date_ok", "none_State_nk"

Translated to Aggregate API queries

http://10.25.12.241:52080/clickstream/aggregate/v1/?period=day&start=1970-01-01&dimension=State:&metric=Arrivals
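As a rough illustration of the translation step, the sketch below assembles an Aggregate API URL from a period, a date range, dimensions and a metric. The function name `build_aggregate_url` and its signature are hypothetical; only the query parameters (`period`, `start`, `end`, `dimension`, `metric`, `headers`) and the base URL come from the slides.

```python
from urllib.parse import urlencode

# Base URL as shown on the slides.
BASE_URL = "http://10.25.12.241:52080/clickstream/aggregate/v1/"


def build_aggregate_url(period, start, dimensions, metric, end=None, headers=False):
    """Assemble an Aggregate API query URL (hypothetical helper).

    dimensions is a list of (name, value) pairs; an empty value means
    "group by this dimension", a non-empty value means "filter on it".
    """
    params = [("period", period), ("start", start)]
    if end is not None:
        params.append(("end", end))
    for name, value in dimensions:
        params.append(("dimension", f"{name}:{value}"))
    params.append(("metric", metric))
    if headers:
        params.append(("headers", "on"))
    # Keep ':' unescaped so the result matches the URLs on the slides.
    return BASE_URL + "?" + urlencode(params, safe=":")


url = build_aggregate_url("day", "1970-01-01", [("State", "")], "Arrivals")
print(url)
```

The same helper reproduces the filtered query used later in the deck by passing `end="2014-01-05"`, `dimensions=[("Airport", ""), ("State", "NY")]` and `headers=True`.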

(25)

<index name="AirportsByState">
  <periods>
    <period>day</period>
  </periods>
  <indexDimensions>
    <dimension name="State"/>
  </indexDimensions>
  <listDimensions>
    <dimension name="Airport"/>
  </listDimensions>
</index>

(26)

Aggregate use: Show arrivals for all airports for NY

http://10.25.12.241:52080/clickstream/aggregate/v1/?period=day&start=2014-01-04&end=2014-01-05&dimension=Airport:&dimension=State:NY&metric=Arrivals&headers=on

Day Start   Airport  State  Arrivals
2014-01-04  ALB      NY     20
2014-01-04  ART      NY     1
2014-01-04  BUF      NY     40
...
2014-01-04  JFK      NY     167
2014-01-04  LGA      NY     206
2014-01-04  ROC      NY     17
2014-01-04  SWF      NY     2
2014-01-04  SYR      NY     14

(27)

<index name="ListFlightNoCarrierCityState">
  <periods>
    <period>day</period>
  </periods>
  <indexDimensions>
  </indexDimensions>
  <listDimensions>
    <dimension name="State"/>
    <dimension name="City"/>
    <dimension name="Carrier"/>
    <dimension name="FlightNo"/>
  </listDimensions>
</index>

(28)

Dimensions use: Show all Flight/Carrier/City/State

http://10.25.12.241:52080/clickstream/dimensions/v1/?period=day&start=2014-01-04&end=2014-01-05&dimension=State:&dimension=City:&dimension=Carrier:&dimension=FlightNo:

"results":[
["AK","Anchorage, AK","AS","101"],
["AK","Anchorage, AK","AS","102"],
["AK","Anchorage, AK","AS","103"],
["AK","Anchorage, AK","AS","106"],
["AK","Anchorage, AK","AS","108"],
...
["AL","Huntsville, AL","DL","1782"],
["AL","Huntsville, AL","DL","2077"],
...
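A client might turn the row arrays in this response into named records as sketched below. The slide shows only a fragment of the response, so the enclosing JSON object and the assumption that columns come back in the order the dimensions were requested are both guesses; `rows_to_records` is a hypothetical helper.

```python
import json

# Assumed shape of the full response; the slide shows only the "results" fragment.
RESPONSE_TEXT = """
{"results": [
  ["AK", "Anchorage, AK", "AS", "101"],
  ["AL", "Huntsville, AL", "DL", "1782"]
]}
"""

# Dimension names in the order they appear in the request URL.
DIMENSIONS = ["State", "City", "Carrier", "FlightNo"]


def rows_to_records(response_text, dimension_names):
    """Pair each value in a result row with its dimension name."""
    payload = json.loads(response_text)
    return [dict(zip(dimension_names, row)) for row in payload["results"]]


records = rows_to_records(RESPONSE_TEXT, DIMENSIONS)
for record in records:
    print(record)
```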

(29)

Index Question

Q: How do you drill down to a list of Delta flights that caused delays in Colorado?

A: Create the index below, rerun index creation step, query delay metrics for

<index name="ListFlightNoByCarrierState">
  <periods>
    <period>day</period>
  </periods>
  <indexDimensions>
    <dimension name="State"/>
    <dimension name="Carrier"/>
  </indexDimensions>
  <listDimensions>
    <dimension name="FlightNo"/>
  </listDimensions>
</index>
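An index definition like the ones on these slides can be inspected with Python's standard XML parser, for example to check which dimensions an index covers before querying. This is a minimal sketch: the XML is the ListFlightNoByCarrierState definition from the deck, while `parse_index` and the dictionary layout it returns are illustrative assumptions, not part of the product.

```python
import xml.etree.ElementTree as ET

INDEX_XML = """
<index name="ListFlightNoByCarrierState">
  <periods>
    <period>day</period>
  </periods>
  <indexDimensions>
    <dimension name="State"/>
    <dimension name="Carrier"/>
  </indexDimensions>
  <listDimensions>
    <dimension name="FlightNo"/>
  </listDimensions>
</index>
"""


def parse_index(xml_text):
    """Return the index name, periods and dimension names as a dict."""
    root = ET.fromstring(xml_text.strip())
    return {
        "name": root.get("name"),
        "periods": [p.text.strip() for p in root.findall("./periods/period")],
        "index_dimensions": [
            d.get("name") for d in root.findall("./indexDimensions/dimension")
        ],
        "list_dimensions": [
            d.get("name") for d in root.findall("./listDimensions/dimension")
        ],
    }


index = parse_index(INDEX_XML)
print(index)
```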

(30)


(31)

DATA ANALYTICS

DATA ENGINEERS

DATA SOLUTIONS

We are hiring!!!
