Copyright © SAS Institute Inc. All rights reserved.
What Is Hadoop?
The Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
Core Hadoop Modules
Core Hadoop modules include the following:
• HDFS (Hadoop Distributed File System): A file system that distributes large files across the Hadoop cluster of computers
• Hadoop YARN: A framework for job scheduling and cluster resource management
• Hadoop MapReduce: A YARN-based system for parallel processing of large data sets
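The MapReduce programming model named above can be illustrated with a small word count. This is a minimal single-machine sketch in plain Python, not Hadoop code: the function names are hypothetical, and on a real cluster the map and reduce steps would run as YARN-scheduled tasks on different nodes.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    # On a cluster, each line (or file block) would be mapped on a different node.
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

if __name__ == "__main__":
    sample = ["Hadoop stores data", "Hadoop processes data"]
    print(word_count(sample))
```

The programmer writes only the map and reduce functions; grouping by key and distributing the work are what the framework supplies.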
HDFS: Hadoop Distributed File System
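As a rough illustration of how HDFS distributes a large file, the sketch below splits a file into fixed-size blocks and assigns each block to several data nodes. It assumes the Hadoop defaults of 128 MB blocks and a replication factor of 3. The round-robin placement and the node names are simplifying assumptions; real HDFS placement is rack-aware and decided by the NameNode.

```python
import math

BLOCK_SIZE_MB = 128   # default dfs.blocksize in Hadoop 2 and later
REPLICATION = 3       # default dfs.replication

def place_blocks(file_size_mb, data_nodes,
                 block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return {block_index: [node, ...]} using naive round-robin placement."""
    if replication > len(data_nodes):
        raise ValueError("replication factor exceeds number of data nodes")
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    layout = {}
    for block in range(n_blocks):
        # Each replica of a block goes to a different node.
        layout[block] = [data_nodes[(block + r) % len(data_nodes)]
                         for r in range(replication)]
    return layout

if __name__ == "__main__":
    nodes = ["datanode1", "datanode2", "datanode3", "datanode4"]  # hypothetical
    for block, replicas in place_blocks(300, nodes).items():
        print(block, replicas)
```

A 300 MB file needs three blocks under these assumptions, and each block is held by three different nodes, so the loss of one node does not lose any data.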
Using HDFS Commands and Files
This HDFS command moves a local file into the HDFS cluster:
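A minimal sketch of such a command, assuming the standard Hadoop file system shell: `hdfs dfs -put` copies a local file into HDFS, and `hdfs dfs -moveFromLocal` does the same but deletes the local copy afterward. The file name and target directory below are hypothetical. The Python helper only assembles the command line and is not necessarily the exact command used in the course.

```python
def build_put_command(local_path, hdfs_dir, move=False):
    """Build the argument list for copying (or moving) a local file into HDFS."""
    # -put keeps the local file; -moveFromLocal removes it after the copy.
    action = "-moveFromLocal" if move else "-put"
    return ["hdfs", "dfs", action, local_path, hdfs_dir]

if __name__ == "__main__":
    # Hypothetical file and directory names, for illustration only.
    print(" ".join(build_put_command("sales.csv", "/user/student/data")))
    print(" ".join(build_put_command("sales.csv", "/user/student/data", move=True)))
```

On a machine with a configured Hadoop client, the resulting list could be passed to `subprocess.run(command, check=True)` to execute it.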
Base SAS Interfaces for Hadoop
Tool and purpose:
FILENAME statement: Enables the DATA step to read and write HDFS data files.
PROC HADOOP:
• Copy or move files between SAS and Hadoop.
• Execute Hadoop file system commands to manage files and directories.
SAS/ACCESS Interface to Hadoop
Tool and purpose:
SQL pass-through:
• Submit HiveQL queries and other HiveQL statements from SAS directly to Hive for Hive processing.
• Query results are returned to SAS.
LIBNAME statement for Hadoop:
• Use the SAS programming language to access Hive tables as SAS data sets.
Additional SAS Technologies for Hadoop
Hadoop is also one of the file storage systems that SAS uses for SAS In-Memory Analytics product solutions.
• SAS High-Performance Analytics products
• SAS Visual Analytics
• SAS Visual Statistics
Base SAS: FILENAME for Hadoop and PROC HADOOP
(Diagram components: SAS metadata server, SAS workspace server)
SAS/ACCESS: SQL Pass-Through and LIBNAME Statement
SAS In-Memory Analytics Architecture for Hadoop
SAS In-Memory Interfaces for Hadoop

Interface: High-Performance Analytics procedures
Purpose: Perform complex analytical computations on Hadoop tables within the data nodes of the Hadoop distribution via the SAS procedure language. HPDS2 allows for manipulation of the data structure (column derivation).
Product: SAS High-Performance Analytics solutions

Interface: Web browser
Purpose: Web interfaces to generate graphical visualizations of data distributions, relationships, and analytical reports on Hadoop tables that are pre-loaded into memory within the data nodes of the Hadoop distribution.
Product: SAS Visual Analytics and SAS Visual Statistics

Interface: PROC IMSTAT, PROC LASR, and several other procedures and global statements
Purpose: A programming interface to perform complex analytical calculations on Hadoop tables that are pre-loaded into memory within the data nodes of the Hadoop distribution.
Product: SAS In-Memory Statistics
In-Memory Analytics
(Diagram components: SAS Client, SAS Metadata Server, SAS Workspace Server, Hadoop NameNode, Hive, SAS In-Memory Analytics Root Node, three SAS In-Memory Analytics Worker Nodes, Hadoop DataNode 1, Hadoop DataNode 2, Hadoop DataNode 3)
A SAS process runs in the root node, and SAS processes in each HDFS data node execute in parallel.
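The root-and-worker pattern can be sketched as follows. This is an analogy in plain Python under stated assumptions, not SAS code and not how SAS implements it: the data values and function names are hypothetical, and threads stand in for the worker processes on each data node. Each worker summarizes only its local rows, and the root combines the partial results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rows held in memory on each of three data nodes.
NODE_DATA = {
    "DataNode 1": [12.0, 15.5, 9.0],
    "DataNode 2": [20.0, 18.5],
    "DataNode 3": [7.5, 11.0, 14.0, 16.5],
}

def worker_summary(rows):
    """Worker node: summarize only the rows stored locally."""
    return sum(rows), len(rows)

def root_mean(node_data):
    """Root node: run one worker per data node in parallel, then combine."""
    with ThreadPoolExecutor(max_workers=len(node_data)) as pool:
        partials = list(pool.map(worker_summary, node_data.values()))
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    return total / count

if __name__ == "__main__":
    print(root_mean(NODE_DATA))
```

Only the small partial summaries travel to the root; the rows themselves stay on the node that stores them, which is the point of running the computation inside the data nodes.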
SAS Technologies That Use In-Memory Analytics Grids
These SAS High-Performance Analytics products use a SAS High-Performance grid:
• Statistics
• Data Mining
• Text Mining
• Econometrics
• Forecasting
• Optimization
These products use a SAS LASR Analytic Server grid:
• Visual Analytics
• Visual Statistics
• In-Memory Statistics