(1)

Copyright © 2015, SAS Institute Inc. All rights reserved.

QUEST meeting: Big Data Analytics

Peter Hughes

Business Solutions Consultant

SAS Australia/New Zealand

(2)

Big Data Analytics

WHERE WE ARE NOW

[Figure: timeline (2005, 2007, 2009, 2011, 2013) with series labeled ANALYTICS, BIG DATA and HADOOP]

Lots of data

Processing power

Accurate decisions

(3)

"Big data is what happened when the cost of storing information became less than the cost of making the decision to throw it away."

- George Dyson, Science Historian and TED Speaker

(4)

Discovery-centric

Everything is permitted unless it is forbidden

Focus on value

Technology empowered

(5)

WHAT IS HADOOP?

An Apache Software Foundation project

Open-source

Origins in the early 2000s, with contributions from Google, Yahoo! and Facebook

Framework of tools for processing Big Data

1. Base: Common; Hadoop Distributed File System (HDFS); MapReduce and YARN

2. Additional projects, including Pig, Hive, HBase, ZooKeeper et al.

Designed for clusters using commodity server hardware, typically Intel/Linux

Distributed storage

Distributed processing

Fault-tolerant topology

Commercial Hadoop distributions based on Apache code

Extensions; additional tooling; support
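
To make "distributed storage" and "fault-tolerant topology" concrete, here is a minimal Python sketch of the idea: a file is cut into fixed-size blocks and each block is copied to several different nodes, so losing one node does not lose data. The function names, the node names, the sizes and the round-robin placement are all assumptions made for illustration; real HDFS uses its own placement policy, which this sketch does not reproduce.

```python
def split_into_blocks(file_size, block_size):
    """Return the sizes of the blocks a file of file_size units is cut into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes (simple round-robin,
    an illustrative stand-in for a real placement policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        block: [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


def readable_after_failure(placement, failed_node):
    """True if every block still has at least one replica on a live node."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )


# Hypothetical cluster and sizes, chosen only for the example.
blocks = split_into_blocks(file_size=300, block_size=128)
layout = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"], replication=3)
print(blocks)
print(layout)
print(readable_after_failure(layout, "node3"))
```

Because every block lives on three different nodes in this sketch, any single node can fail and the whole file remains readable from the surviving replicas.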

(6)

COMMERCIAL HADOOP VENDORS

Cloudera: Intel recently invested $740 million to buy 18%, which puts Cloudera's value at around the $4 billion mark!

Hortonworks: HP recently invested $50 million in Hortonworks to get a place on the board. Total investment is now about $300 million. Big Teradata and SAP partners!

MapR: Google Capital recently invested $80 million in MapR, which gathered $110 million of investment in its last round!

IBM InfoSphere BigInsights

Pivotal HD: GE invested $105 million in Pivotal

(7)

SAS and Hadoop

INTEGRATION WITH OPEN SOURCE HADOOP

HDFS, MapReduce, YARN, Pig, Hive, Impala, Sqoop, Parquet, HCatalog, ORC, Oozie, Spark

(8)

SAS® WITHIN THE HADOOP ECOSYSTEM

[Diagram: SAS® components positioned within the Hadoop stack]

Layers: User Interface; Metadata; Data Access; Data Processing; File System

Users: SAS® User; Next-Gen SAS® User

SAS components: SAS® Visual Analytics; SAS® Enterprise Miner™; SAS® Data Integration; SAS® Data Loader for Hadoop; SAS® In-Memory Statistics for Hadoop; SAS® High-Performance Analytic Procedures; SAS® LASR™ Analytic Server; Base SAS & SAS/ACCESS® to Hadoop™; SAS Embedded Process Accelerators; SAS Metadata

Hadoop components: HDFS; Pig; Hive; MapReduce/YARN

Other labels: MPI Based; In-Memory Data Access

(9)

DATA TO DECISION LIFECYCLE ON HADOOP

MANAGE DATA

• SAS/ACCESS (Hadoop/Impala)

• SAS Data Management

• SAS Federation Server

• SAS Data Quality Accelerator for Hadoop

• SAS Code Accelerator for Hadoop

• SAS Data Loader for Hadoop

EXPLORE DATA

• SAS Visual Analytics

• SAS In-memory Statistics for Hadoop

DEVELOP MODELS

• SAS HPA Products

• SAS Visual Statistics

• SAS In-memory Statistics for Hadoop

DEPLOY & MONITOR

• Model Manager

• SAS Scoring Accelerator for Hadoop

(10)

MANAGE DATA

READ/WRITE TO HDFS

/* Create directory on HDFS */
filename cfg "C:\Sample_Data\hadoop_config.xml";

proc hadoop options=cfg username="hadoop" password="hadoop";
   hdfs mkdir="/user/hadoop/testfolder";
run;

/* Copy file from local SAS to HDFS */
filename cfg "C:\Sample_Data\hadoop_config.xml";

proc hadoop options=cfg username="hadoop" password="hadoop";
   hdfs copyfromlocal="C:\Sample_data\dept.txt"
        out="/user/hadoop/testfolder/";
run;

/* Copy file from HDFS to local SAS */
filename cfg "C:\Sample_Data\hadoop_config.xml";

proc hadoop options=cfg username="hadoop" password="hadoop";
   hdfs copytolocal="/user/hadoop/testfolder"
        out="C:\Sample_data\";
run;

The Hadoop configuration file (file:///C:/Sample_data/hadoop_config.xml) is used for all PROC HADOOP PIG|MAPREDUCE|HDFS calls.

(11)

MANAGE DATA

SAS/ACCESS

Base SAS Procedures executed in-database for Hadoop

FREQ, REPORT, SORT, SUMMARY/MEANS, TABULATE

Supported Hadoop distributions & combinations*

Cloudera CDH 5.0 running Hive/Hive2

Hortonworks HDP 2.0 running HiveServer2

IBM InfoSphere BigInsights 2.1 running Hive

MapR M5 2.0.1 running Hive

Pivotal/Greenplum HD running Hive

Pivotal/Greenplum MR 2.0.1 running Hive

* If a provider assures upward compatibility, SAS/ACCESS supports newer combinations. For example, Cloudera assures upward compatibility within major releases, so Cloudera CDH4.2 running Hive or HiveServer2 is supported.
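
The point of running procedures such as FREQ in-database is that the aggregation happens where the data lives and only the summary travels back. The Python sketch below illustrates that idea under loud assumptions: `sqlite3` stands in for Hive, the table and its rows are made up, and the SQL shown is not the SQL that SAS/ACCESS actually generates (the source does not show it).

```python
import sqlite3

# sqlite3 is only a stand-in for a Hive connection; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (make TEXT, model TEXT, msrp REAL)")
conn.executemany(
    "INSERT INTO cars VALUES (?, ?, ?)",
    [("Audi", "A4", 30000.0), ("Audi", "A6", 40000.0), ("BMW", "X5", 55000.0)],
)


def freq_client_side(connection):
    """Pull every row to the client, then count there."""
    rows = connection.execute("SELECT make FROM cars").fetchall()
    counts = {}
    for (make,) in rows:
        counts[make] = counts.get(make, 0) + 1
    return counts, len(rows)


def freq_in_database(connection):
    """Let the database aggregate; only one row per distinct value comes back."""
    rows = connection.execute(
        "SELECT make, COUNT(*) FROM cars GROUP BY make"
    ).fetchall()
    return dict(rows), len(rows)


client_counts, client_rows_moved = freq_client_side(conn)
db_counts, db_rows_moved = freq_in_database(conn)
print(client_counts, client_rows_moved)
print(db_counts, db_rows_moved)
```

Both functions produce the same frequency table, but the in-database version transfers one row per distinct value instead of one row per record, which is the saving that matters on large Hadoop tables.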

(12)

MANAGE DATA

HIVE

LIBNAME cdh_hdp HADOOP PORT=10000 SERVER=sascldserv02 USER=hadoop PASSWORD=hadoop;

/* Create new table */
proc sql;
   connect to hadoop (PORT=10000 SERVER=sascldserv02 USER=hadoop PASSWORD="hadoop");
   exec( create table cars_prc (make string, model string, msrp double) ) by hadoop;
quit;

/* Copy from another table */
proc sql;
   insert into cdh_hdp.cars_prc
   select make, model, msrp from sashelp.cars;
quit;

/* List contents */
proc sql;
   select * from cdh_hdp.cars_prc;
quit;

(13)

MANAGE DATA

MAPREDUCE

/* Invoke MapReduce Word Count program */
filename cfg "C:\Sample_Data\hadoop_config.xml";

proc hadoop options=cfg username="hadoop" password="hadoop" verbose;
   /* Remove output left over from a previous run */
   hdfs delete="/user/hadoop/output_MR1";
   mapreduce
      input="/user/hadoop/gutenberg"
      output="/user/hadoop/output_MR1"
      jar="C:\Sample_data\hadoop-examples-2.0.0-mr1-cdh4.1.2.jar"
      outputkey="org.apache.hadoop.io.Text"
      outputvalue="org.apache.hadoop.io.IntWritable"
      reduce="org.apache.hadoop.examples.WordCount$IntSumReducer"
      combine="org.apache.hadoop.examples.WordCount$IntSumReducer"
      map="org.apache.hadoop.examples.WordCount$TokenizerMapper"
      reducetasks=0;
run;
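
For readers unfamiliar with what a word-count job does, the following Python sketch mimics the three roles named in the call above (a tokenizing mapper, a summing combiner and a summing reducer) on a single machine. It is an illustration of the pattern only: the function names and the sample input are assumptions, and it does not reproduce the Java classes in the Hadoop examples jar.

```python
from collections import defaultdict


def tokenizer_map(line):
    """Map step: emit a (word, 1) pair for every whitespace-separated token."""
    return [(word, 1) for word in line.split()]


def int_sum_reduce(word, counts):
    """Reduce (and combine) step: sum the counts collected for one word."""
    return word, sum(counts)


def shuffle(pairs):
    """Group values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def word_count(splits):
    """Map and combine each input split locally, then reduce across splits."""
    combined = []
    for split in splits:
        mapped = [pair for line in split for pair in tokenizer_map(line)]
        combined.extend(int_sum_reduce(w, c) for w, c in shuffle(mapped).items())
    return dict(int_sum_reduce(w, c) for w, c in shuffle(combined).items())


# Two hypothetical input splits, each a list of text lines.
splits = [["the quick fox", "the lazy dog"], ["the end"]]
counts = word_count(splits)
print(counts)
```

The combiner uses the same summing logic as the reducer, which works here because addition is associative: partial sums per split can be added again in the final reduce without changing the result.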

(14)

MANAGE DATA

SAS DATA INTEGRATION STUDIO

Seamless access to Hadoop data (HDFS/HIVE/IMPALA) by analyst/traditional SAS users

Reading & writing to/from HDFS

Transfer to/from Hadoop operators

Support for Pig, Hive & MapReduce

(15)

SAS® LASR ANALYTIC SERVER AND HADOOP

[Diagram: SAS® LASR Analytic Server, shown as a row of SAS® In-Memory nodes, alongside Hadoop, web clients and applications]

Data sources: ERP; SCM; CRM; Images; Audio and Video; Machine Logs; Text; Web and Social

In-memory processing; use Hadoop for storage persistence and commodity computing

SAS® IN-MEMORY ANALYTICS

SAS Visual Analytics

SAS Visual Statistics

SAS In-Memory Statistics for Hadoop

(16)

DEPLOY & MONITOR

SAS SCORING ACCELERATOR FOR HADOOP

Publish SAS® Enterprise Miner™ models or SAS/STAT linear models inside Hadoop

Fully integrated with SAS® Model Manager to streamline registration, validation and performance monitoring

Reduce data movement and improve data governance by streamlining model deployment processes within Hadoop

(19)

[email protected]

Peter Hughes

Thank You!
