
2012 Storage Developer Conference. © EMC Corporation. All Rights Reserved.

Implementing the Hadoop Distributed File System Protocol on OneFS

Jeff Hughes, EMC Isilon


Outline

Hadoop Overview

OneFS Overview

MapReduce + OneFS

Details of isi_hdfs_d


Apache Hadoop Project

Two main components

MapReduce

Distributed File System (DFS)

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model.


Hadoop: MapReduce

MapReduce

Distributed computation framework

Optimized for batch processing

Typical I/O profile:

[Figure: typical I/O profile, labeled DFS]


Hadoop: HDFS Semantics

Metadata/data server cluster architecture

Cluster coherent namespace and data

Write-once-read-many access

Single writer only

Can append to existing files

Data mirrored 3x for resiliency

Client exposed to data topology

Block locations as part of file metadata

http://www.snia.org/sites/default/files2/sdc_archives/2010_presentations/wednesday/DhrubaBorthakur-Hadoop_File_Systems.pdf


Hadoop: Why HDFS?

Portability – All user space and OS independent

Purpose-built – Primary workflow is MapReduce

Limited set of operations to implement

Single software package

Fluid client/server protocol development

Exposure of data topology


OneFS: Architecture

[Figure: servers in the client/application layer connect over an Ethernet layer to the Isilon IQ storage layer; intracluster communication runs over InfiniBand]


OneFS: OS

Built from the ground up on FreeBSD

File system is a loadable kernel module with VFS interface

Supports POSIX syscalls locally

Protocol servers access /ifs paths

FS built for mixed namespace access

Supports SMB, NFS, HTTP, etc.


OneFS: Semantics

Symmetric cluster architecture

Metadata distributed across all nodes

Tightly coupled group semantics

Globally coherent file system access

Distributed lock manager

Two-phase commit for all write operations


MapReduce + OneFS: Architecture

[Figure: a Hadoop node running DFSClient talks to four OneFS nodes, each providing NameNode and DataNode services]

1) Request(“/file”)

2) Response (block locations)

3) GetBlock(block)

OneFS runs a daemon that speaks NameNode and DataNode natively


MapReduce + OneFS: Benefits

Easier integration into existing workflows

First-class multi-protocol access

Reduce ETL stages

Increased disk efficiency

HDFS: 30% usable, OneFS: 80% usable

Reduced data center footprint

More data management options

Snapshots, site replication, etc.
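
The disk-efficiency figures translate directly into raw capacity. A rough sketch of that arithmetic, using only the 30% and 80% usable fractions quoted on this slide; the 100 TB dataset size is an arbitrary value chosen for illustration.

```python
def raw_capacity_needed(usable_tb: float, usable_fraction: float) -> float:
    """Raw disk (TB) required to expose `usable_tb` of usable space."""
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    return usable_tb / usable_fraction


dataset_tb = 100.0  # hypothetical dataset size, for illustration only
hdfs_raw = raw_capacity_needed(dataset_tb, 0.30)   # slide: HDFS ~30% usable
onefs_raw = raw_capacity_needed(dataset_tb, 0.80)  # slide: OneFS ~80% usable
print(f"HDFS raw: {hdfs_raw:.1f} TB, OneFS raw: {onefs_raw:.1f} TB")
```

Under these assumptions the same dataset needs roughly 2.7x more raw disk on HDFS, which is where the reduced data center footprint comes from.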


MapReduce + OneFS: Challenges

Typical data path locality changes

MapReduce ➝ HDFS acts like DAS

MapReduce ➝ OneFS goes over the network

Client/server compatibility and maintenance

OneFS and MapReduce clusters run different software versions


MapReduce + OneFS: Mitigations

1GbE < SATA controller < 10GbE

Hadoop designed for 1GbE

10GbE prices dropping

Denser storage == fewer nodes, less networking

“Rack” locality limits cross-switch contention

[Figure: Rack A, Rack B, Rack C]


MapReduce + OneFS: Performance

[Figure: DFS ➝ Read ➝ Map Task ➝ Map Output ➝ Shuffle ➝ Reduce Task ➝ Write ➝ DFS]

Typically ~100Mbit per task from HDFS

I/Os against temp vary considerably per job

More variable, still ~100Mbit per task to HDFS

Performance bottleneck likely to be temp space

Terasort example: 75% of I/Os against temp

Latency not much impact over HDFS large
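
The ~100Mbit per task figure gives a quick way to estimate how many concurrent tasks a client link can feed. A back-of-the-envelope sketch, assuming nominal line rates and ignoring protocol overhead and temp-space traffic:

```python
PER_TASK_MBIT = 100  # slide: ~100 Mbit per task to/from HDFS


def tasks_to_saturate(link_mbit: int, per_task_mbit: int = PER_TASK_MBIT) -> int:
    """Number of concurrent tasks whose combined DFS traffic fills a link."""
    return link_mbit // per_task_mbit


for name, mbit in (("1GbE", 1_000), ("10GbE", 10_000)):
    print(f"{name}: ~{tasks_to_saturate(mbit)} concurrent tasks")
```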


HDFS Protocols

Two TCP-based protocols

NameNode – metadata operations

DataNode – data transfers

About 26 NameNode RPCs

Mostly use fully qualified paths

POSIX-like file attributes (mode bits, user/group)

Only 2 DataNode client operations


NameNode Request Example

Example getFileInfo(“/testfile”) request (think stat):

[Figure: request bytes annotated with the method name, the parameter type (string), and the parameter value “/testfile”]
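
A request of this shape (method name, parameter type, parameter value) can be sketched as length-prefixed strings. The layout below is an assumption for illustration only: 2-byte string lengths, a 4-byte parameter count, and the type name `java.lang.String` are not taken from the slide and should not be read as the exact wire format of any Hadoop release.

```python
import struct


def write_string(s: str) -> bytes:
    """2-byte big-endian length followed by UTF-8 bytes (assumed layout)."""
    data = s.encode("utf-8")
    return struct.pack(">H", len(data)) + data


def read_string(buf: bytes, pos: int):
    """Return (string, next position) for a string written by write_string."""
    (n,) = struct.unpack_from(">H", buf, pos)
    start = pos + 2
    return buf[start:start + n].decode("utf-8"), start + n


def encode_call(method: str, params) -> bytes:
    """params: list of (type_name, value) pairs; only string values handled."""
    body = write_string(method) + struct.pack(">i", len(params))
    for type_name, value in params:
        body += write_string(type_name) + write_string(value)
    return body


def decode_call(buf: bytes):
    """Inverse of encode_call: recover the method name and parameter list."""
    method, pos = read_string(buf, 0)
    (count,) = struct.unpack_from(">i", buf, pos)
    pos += 4
    params = []
    for _ in range(count):
        type_name, pos = read_string(buf, pos)
        value, pos = read_string(buf, pos)
        params.append((type_name, value))
    return method, params


request = encode_call("getFileInfo", [("java.lang.String", "/testfile")])
print(decode_call(request))
```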


NameNode Response Example


isi_hdfs_d

Multi-threaded daemon runs on all nodes

Services both NN and DN protocols

Translates RPCs to POSIX system calls

Stateless, underlying FS handles coherency

[Figure: on a OneFS node, an isi_hdfs_d thread takes a request, issues a syscall through VFS to OneFS, and returns the response]


Example NameNode RPCs

Most NameNode RPCs are straightforward

setPermission() → chmod(…)

setTimes() → utimes(…)

create() → open(…, O_CREAT, …)

Other RPCs need some creative interpretation

recoverLease()/renewLease()

abandonBlock()
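
The straightforward mappings can be sketched as a thin layer over POSIX calls. This is a minimal illustration, not the isi_hdfs_d implementation: the class name `PosixBackend`, its method names, and the temporary directory standing in for an /ifs path are all hypothetical, and times are taken in seconds for simplicity.

```python
import os
import tempfile


class PosixBackend:
    """Maps a few NameNode-style calls onto POSIX calls under a root directory."""

    def __init__(self, root: str):
        self.root = root  # on OneFS this would be a path under /ifs

    def _local(self, hdfs_path: str) -> str:
        # HDFS paths are fully qualified; re-root them under self.root.
        # Note: ".." components are not sanitized in this sketch.
        return os.path.join(self.root, hdfs_path.lstrip("/"))

    def create(self, path: str, mode: int = 0o644) -> None:
        fd = os.open(self._local(path), os.O_CREAT | os.O_WRONLY, mode)
        os.close(fd)

    def set_permission(self, path: str, mode: int) -> None:
        os.chmod(self._local(path), mode)

    def set_times(self, path: str, mtime: float, atime: float) -> None:
        os.utime(self._local(path), (atime, mtime))

    def get_file_info(self, path: str) -> os.stat_result:
        return os.stat(self._local(path))


with tempfile.TemporaryDirectory() as root:  # stand-in for an /ifs directory
    backend = PosixBackend(root)
    backend.create("/testfile")
    backend.set_permission("/testfile", 0o600)
    backend.set_times("/testfile", mtime=1_000_000_000, atime=1_000_000_000)
    info = backend.get_file_info("/testfile")
    print(oct(info.st_mode & 0o777), int(info.st_mtime))
```

Because each call goes straight to the file system, the layer itself holds no state, which matches the stateless design described for the daemon.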


HDFS Data Path

NN RPC: getBlockLocations(file)/addBlock(file)

Returns list of LocatedBlocks

LocatedBlock
  long offset
  Block
    long blkid
    long genStamp
    long numBytes
  DatanodeInfo[] ...

DFSClient connects to DN

Chooses which DatanodeInfo based on locality

Only Block structure passed to


LocatedBlocks Translation

LocatedBlock
  long offset         Logical byte offset into the file
  Block               Opaque to client, used by DN
    long blkid        Inode number
    long genStamp     Absolute byte offset
    long numBytes     Size of extent
  DatanodeInfo[] ...  <IP:port> and rack info for

Specific to OneFS and isi_hdfs_d
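
One way to picture the translation is to split a file into fixed-size extents and emit one LocatedBlock per extent. The sketch below assumes the annotations pair up as blkid = inode number, genStamp = absolute byte offset, numBytes = size of extent; that pairing is inferred from the slide layout. The function name, the inode value, the block size, and the address are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    blkid: int      # assumed: inode number of the file
    gen_stamp: int  # assumed: absolute byte offset of the extent
    num_bytes: int  # assumed: size of the extent


@dataclass
class LocatedBlock:
    offset: int           # logical byte offset into the file
    block: Block
    datanodes: List[str]  # "<IP:port>" strings; rack info omitted


def located_blocks(inode: int, file_size: int, block_size: int,
                   datanodes: List[str]) -> List[LocatedBlock]:
    """Split [0, file_size) into block_size extents, one LocatedBlock each."""
    result = []
    for start in range(0, file_size, block_size):
        length = min(block_size, file_size - start)
        result.append(
            LocatedBlock(start, Block(inode, start, length), list(datanodes)))
    return result


blocks = located_blocks(inode=4242, file_size=150, block_size=64,
                        datanodes=["10.0.0.1:8021"])
for lb in blocks:
    print(lb.offset, lb.block.num_bytes)
```

Since the client treats Block as opaque and simply hands it back to a DataNode, the server is free to pack whatever it needs to locate the data into those three fields.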


Read Path Example

getBlockLocations()

LocatedBlocks

DN_OP_READ(Block)

Data stream


NameNode Connection Routing

NameNode is configured as a single URL

Easy configuration:

hdfs://log-server.isilon.com:8020/

DNS round-robin to distribute across nodes

Metadata IOPs get spread out

OneFS maintains cross-node consistency

IP Failover plus client retries for resiliency
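
The combination of several addresses behind one name and client retries can be sketched as a loop over resolved addresses. This is a simplified model, not how DFSClient is implemented: the function name, the injected `connect` callable, and the example addresses are hypothetical. A real client would obtain the address list from a resolver such as `socket.getaddrinfo`.

```python
from typing import Callable, Iterable, List


def connect_with_retries(addresses: Iterable[str],
                         connect: Callable[[str], object],
                         attempts: int = 3):
    """Try each resolved address in turn, cycling up to `attempts` passes."""
    addrs: List[str] = list(addresses)
    last_error = None
    for _ in range(attempts):
        for addr in addrs:
            try:
                return connect(addr)
            except OSError as exc:
                last_error = exc  # remember the failure, try the next address
    raise ConnectionError(f"no NameNode reachable: {last_error}")


down = {"10.0.0.1"}  # pretend this node is offline


def fake_connect(addr: str) -> str:
    if addr in down:
        raise OSError(f"{addr} unreachable")
    return f"connected to {addr}"


result = connect_with_retries(["10.0.0.1", "10.0.0.2"], fake_connect)
print(result)
```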


DataNode References

DataNode references returned by NameNode

All OneFS DataNodes can access same data

Each LocatedBlocks for reads has 3 DN refs

Round-robin across available nodes

Multiple refs let the client try other nodes before coming back to the NameNode again

Write path: only 1 reference per logical block

No need for client to replicate writes
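
Handing out references round-robin can be sketched with a rotating iterator: three references per block on the read path, one on the write path. The function and node names are hypothetical, and the sketch ignores rack information and node availability.

```python
from itertools import cycle, islice
from typing import Iterator, List


def datanode_refs(nodes: List[str], blocks: int,
                  refs_per_block: int) -> List[List[str]]:
    """Hand out `refs_per_block` node refs per block, rotating through `nodes`."""
    rotation: Iterator[str] = cycle(nodes)
    return [list(islice(rotation, refs_per_block)) for _ in range(blocks)]


nodes = ["node1", "node2", "node3", "node4"]
read_refs = datanode_refs(nodes, blocks=2, refs_per_block=3)   # reads: 3 refs
write_refs = datanode_refs(nodes, blocks=2, refs_per_block=1)  # writes: 1 ref
print(read_refs)
print(write_refs)
```

Because every node can serve the same data, the choice of references is purely a load-spreading decision rather than a statement about where replicas live.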


Few Quirks

User/group identities are strings

OneFS natively stores UIDs or SIDs only

Requires name resolution on access

Locking!

HDFS uses “leases” to restrict to single writer

Implemented but no cross-protocol contention

Hadoop apps don’t expect files to move
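
The identity quirk amounts to a lookup on every access: the stored numeric ID has to become the string HDFS clients expect. A minimal sketch with an injected lookup function; the fallback to the numeric string and the example directory contents are assumptions for illustration, not documented OneFS behavior.

```python
from typing import Callable


def owner_name(uid: int, lookup: Callable[[int], str]) -> str:
    """Resolve a numeric UID to a user name string; fall back to the number."""
    try:
        return lookup(uid)
    except KeyError:
        return str(uid)


directory = {0: "root", 1001: "hadoop"}  # hypothetical directory contents
print(owner_name(1001, directory.__getitem__))
print(owner_name(2002, directory.__getitem__))
```

On a Unix host the lookup could be `lambda uid: pwd.getpwuid(uid).pw_name`, which also raises `KeyError` for unknown IDs.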


It Works!


You see this across NFS:

And the same directory on HDFS:


Conclusions

HDFS protocol can map to POSIX pretty easily

Not all traditional shared storage is bad for Hadoop workflows

Locality features worth preserving even without node locality as a possibility


Questions?

