Enhancing UNICORE Storage Management using Hadoop
Di t ib t d Fil S t
Distributed File System
Wasim Bari2, Ahmed Shiraz Memon1, Dr. Bernd Schuller1
1. Jülich Supercomputing Centre, Forschungszentrum Jülich &
2 i f S i ifi C i h
2. Institute of Scientific Computing, RWTH Aachen
Contents
Contents
Contents – Motivation – Hadoop – UNICORE – UniHadoop - Summary - Q&A
•
Motivation
•
Hadoop
p
•
UNICORE
•
UniHadoop
•
Demonstration
•
Demonstration
•
Summary
Motivation
Motivation
Contents – Motivation – Hadoop – UNICORE – UniHadoop - Summary - Q&A
• Computation power has increased exponentially • More Computation with advanced technologiesp g
• Grid Computing, Parallel computing, High Performance Computing
• Environment Modeling
• High Energy and Nuclear Physics • And many more in “Feasible” time • And many more in Feasible time • Applications are producing HUGE data
• Big Bang, Large Hadron Collider at CERN – 15
Motivation
Motivation
Contents – Motivation – Hadoop – UNICORE – UniHadoop - Summary - Q&A
• Traditional Data Storage Systems cannot store such amount of data
• Limited Storage • Disaster Recovery • Disaster Recovery • Number of Files • Parallel Access
• Distributed Data Storage System
• Elastic expandable Storage • Elastic expandable Storage • Disaster Recovery
• Fast and Parallel Access • Durable
• Replication • Number of Files
Motivation
Motivation
Contents – Motivation – Hadoop – UNICORE – UniHadoop - Summary - Q&A
• UNICORE supports simple Storage mechanisms with the Filesystem located at TargetSystem Siteg y
• Becomes complicated while using complex workflows
• Goal: Allows UNICORE to support Distributed Storage System as big shared storage UNICORE UNICORE Distributed Storage System Client Storage System Simple Storage
Hadoop
Hadoop
Contents – Motivation – Hadoop –UNICORE – UniHadoop - Summary - Q&A
• In 2003 Proposed a highly scalable, fault tolerant, dynamic Distributed File System called Google File System (GFS).
I 2004 i t d d M R d i d l A
• In 2004, introduced MapReduce programming model, A mechanism to achieve parallelism.
• Apache Hadoop is an Open Source implementation of GFS andp p p p MapReduce
• Main contributor is
• Hadoop Distributed File System (HDFS)
• Scalable • Economical • Economical • Efficient and • Reliable
Hadoop
Hadoop Distributed
Distributed Filesystem
Filesystem (HDFS)
(HDFS)
Contents – Motivation – Hadoop –UNICORE – UniHadoop - Summary - Q&A
• Very Large Distributed File System
– 10K nodes, 100 million files, 10 PB
• Master/Worker Architecture
– Single Master (NameNode)/Many Workers (DataNode)g ( )/ y ( )
• Single Namespace for entire cluster
• Blocks: Files are broken up into smaller chunks
Typically 128 MB block size – Typically 128 MB block size
– Each block replicated on multiple DataNodes
• Client
Fi d l ti f bl k – Finds location of blocks
– Accesses data directly from DataNode – Security
P i i d l i il t POSIX – Permission model similar to POSIX
HDFS: Reading a file
HDFS: Reading a file
Contents – Motivation – Hadoop –UNICORE – UniHadoop - Summary - Q&A
Hadoop
Hadoop
Contents – Motivation – Hadoop –UNICORE – UniHadoop - Summary - Q&A
• Abstract Filesystem concept
Supported File and Storagessystems • Supported File- and Storagessystems
– Hadoop Distributed File System (hdfs://)
– Cloudera ( formerly Kosmos File System) (kfs://)
A S3 Fil S t – Amazon S3 File System
• S3 Native File System (s3n://)
• S3 Block File System (s3://)
– Local disk (file://)
– Local disk (file://)
– FTP File System (ftp://)
R ti l ti f f th t d St t
UNICORE
UNICORE
Contents – Motivation – Hadoop –UNICORE – UniHadoop - Summary - Q&A
• A Ready-to-Run, Open Source Grid Middleware that enables seamless and secure access to the Computing resources
• UNICORE Atomic Services (UAS)
– Storage Management Service (SMS)Sto age a age e t Se ce (S S)
– File Transfer Service (FTS) – Job Management Service
• Implementation of standards OASIS, OGSA, WS-I etc…
UNICORE Architecture
UNICORE Architecture
UniHadoop
UniHadoop: Overview
: Overview
Contents – Motivation – Hadoop –UNICORE – UniHadoop - Summary - Q&A
UniHadoop is an integration of UNICORE with Apache Hadoop • UniHadoop is an integration of UNICORE with Apache Hadoop • Inspired by Hadoop design
• Using an Abstract notion of Filesystem • Supports all storage/filesystems
• Specialisation of UNICORE Storage Management Service (SMS).
UniHadoop
UniHadoop: Architecture
: Architecture
Contents – Motivation – Hadoop –UNICORE – UniHadoop - Summary - Q&A
command -line client Eclipse -based client Portal client, e.g. GridSphere Programming API command -line client Eclipse -based client Portal client, e.g. GridSphere Programming API UNICORE UNICORE Gateway Gateway Hadoop Hadoop UNICORE UNICORE Gateway Gateway Hadoop Hadoop UNICORE XNJS UNICORE Atomic Services OGSA - * UNICORE XNJS UNICORE Atomic Services OGSA - * UNICORE WS - RF Service Registry CIS Information Service SMS FTS SMS FTS UNICORE XNJS UNICORE Atomic Services OGSA - * UNICORE XNJS UNICORE Atomic Services OGSA - * UNICORE WS - RF Service Registry CIS Information Service SMS SMS UNICORE WS - RF hosting env . IDB XACML entity UNICORE WS - RF hosting env . IDB XUUDB UVOS XACML entity UNICORE WS - RF hosting env . IDB XACML entity UNICORE WS - RF hosting env . IDB XUUDB UVOS XACML entity XUUDB UVOS
Target System Interface Target System Interface
XUUDB UVOS
Target System Interface Target System Interface
Local RMS (e.g. Torque, LL, LSF, etc.) Local RMS (e.g. Torque, LL, LSF, etc.)
External Storage
USpace USpace
DSS
Local RMS (e.g. Torque, LL, LSF, etc.)
Local RMS (e.g. Torque, LL, LSF, etc.) Local RMS (e.g. Torque, LL, LSF, etc.)Local RMS (e.g. Torque, LL, LSF, etc.)
External Storage
USpace USpace
DSS DSS
UniHadoop
UniHadoop: Design
: Design
Contents – Motivation – Hadoop –UNICORE – UniHadoop - Summary - Q&A
• StorageManagement: Base Interface • List directory
• create directory
• change permissions etc...
• SMSBaseImpl: Basic implementation
• FixedStorageImpl: A storage service files for fixed path, such as „/work“
• SMSHadoopImpl: SMS Adaptation for SMSHadoopImpl: SMS Adaptation for Hadoop
• IStorageAdapter: Interface for accessing l l l hi hi l t t
low level hierarchical storage systems • HadoopStorageAdapter: HDFS specific
UniHadoop
UniHadoop: Usage Scenario 1
: Usage Scenario 1
Contents – Motivation – Hadoop –UNICORE – UniHadoop - Summary - Q&A
• Default Shared Storage “H ” kfl
• “Huge” workflows
Target System 1 Target System 2 Target System 3 Target System N
UniHadoop
UniHadoop: Usage Scenario 2
: Usage Scenario 2
Contents – Motivation – Hadoop –UNICORE – UniHadoop - Summary - Q&A
A shared storage for every Target System • A shared storage for every Target System
Target System 1 Target System 2 Target System 3 Target System N
Hadoop Storage
Hadoop Storage
Hadoop Storage
Demonstration
Demonstration
Contents – Motivation – Hadoop –UNICORE – UniHadoop - Summary - Q&A
• Listing directories/files on HDFS • Creating files
Submitting jobs using data from HDFS • Submitting jobs using data from HDFS
Feature Matrix
Feature Matrix
Contents – Motivation – Hadoop –UNICORE – UniHadoop - Summary - Q&A
Features
Default SMS
Hadoop SMS
p
Unlimited Storage
Χ
Durability
Χ
Durability
Χ
Unlimited File Size
Χ
Disaster Recovery
Χ
U li it d N Of fil
Unlimited No. Of files
Χ
Trash RecoveryyΧ
Χ
Summary
Summary
Contents – Motivation – Hadoop –UNICORE – UniHadoop - Extensions – Summary - Q&A
• Requirements
Architecture and Design • Architecture and Design • Usage scenarios
• Hadoop integration provide support for • MapReduce
• MapReduce
• Data analysis and intelligence using Pig, Hive and HBase • A number of available File- and Storagesystems
Q & A
Q & A
Contents – Motivation – Hadoop –UNICORE – UniHadoop - Summary - Q&A