ATLAS Tier 3

(1)

Introduction Exerience HDFS (Backup Slides)

Cerberus Hadoop

Hadoop@LaTech

ATLAS Tier 3

David Palma

DOSAR Louisiana Tech University

January 23, 2013

(2)

Outline

1 Introduction Cerberus Hadoop

2 Exerience Features Issues Conclusions

3 HDFS (Backup Slides) Architecture

Replication Accessibility

(3)

Cerberus, ATLAS Tier 3

Hadoop Capacity: 38.44 TB Hadoop Replication: 2 Hadoop effective: 19.22 TB

8 Hadoop DataNodes (Compute Nodes) 3-6 TB each

(4)

Hadoop

Apache Hadoop is a distributed computing framework It consists of two components:

A distributed file system (what we care about) A map/reduce implementation

Open source, Apache project, highly active Derived from Google’s proprietary implementation

Used at many large companies including Yahoo!, Facebook, and Twitter for analysis of large datasets (Big Data)

(5)

Features Issues Conclusions

Features

Manageability:

Can easily decommission nodes Hadoop fsck utility works very well Easy to setup, few services to manage Reliability:

Replication in HDFS works great

Multiple HDD failures with no data loss yet Scalability:

Yahoo uses it for > PB storage

CNs double as storage vs. separate storage cluster Block level decomposition, less network bottlenecks

(6)

Issues and Management

Decommission Nodes

vi /etc/hadoop/conf/hosts exclude && hadoop dfsadmin -refreshNodes

HDFS Health reports

Nightly cron: ”hadoop fsck /”

Lots of logs!

cron: rotate and bzip logs

HDFS-FUSE mount crashes with DQ2 Requires remount

Possible ROOT I/O performance issues (?)

(7)

Conclusions

Past

We’ve had a good experience with the replication features and manageability of HDFS.

We’ve had some issues with fuse-dfs+DQ2 Future

We’d like benchmark to disk and network performance

We’d like to improve our understanding of how HDFS’s random I/O permance affects ROOT-based I/O.

Next Steps: More monitoring, OSG SE: GridFTP/SRM

(8)

Architecture Replication Accessibility

Design Goals

Fault Tolerance: All files in HDFS are replicated at the block level across many nodes.

Write-Once-Read-Many: Simplifies data coherency issues and enables high throughput data access.

”Moving Computation is Cheaper than Moving Data”:

Computation requested is much more efficient if it is executed near the data it operates on. HDFS provides interfaces for applications to move themselves closer to where the data is located.

Big Data: HDFS is tuned to support large files. A typical file in HDFS is assumed to be gigabytes to terabytes in size.

(9)

HDFS Architecture

(10)

Node Types

NameNode (NN): Master

Acts as the arbitrator and repository for all HDFS metadata Manages the file system namespace and regulates access to files by clients.

Executes FS namespace operations like opening, closing, and renaming files and dirs.

Determines the mapping of blocks to DNs.

DataNodes (DN): Slaves

Manage storage attached to the nodes that they run on.

Serve read and write requests from the file system’s clients.

Perform block creation, deletion, and replication upon instruction from the NN.

Secondary NameNode (SNN):

Supports the NN merging namespace changes.

(11)

File System Internals

HDFS exposes a file system namespace and allows user data to be stored in files.

HDFS uses a traditional hierarchical file organization.

Files are split into blocks and these blocks are stored in a set of DNs.

If possible, each chunk will reside on a different DN.

Any change to the file system namespace or its properties is recorded by the NN.

An application can specify the number of replicas of a file that should be maintained.

The number of copies of a file is called the replication factor of

(12)

Replication

2

(13)

Block Replication

Each file is stored as a sequence of blocks (default: 64MB) block size and replication factor are configurable per file Replication factor can be changed (default: 3)

Files are write-once and have strictly one writer at any time The NN constantly tracks which blocks need to be replicated using:

Heartbeats: A periodic message from a DN Blockreports: A list of all blocks on a DN

(14)

Re-Replication

The primary objective of HDFS is tostore data reliably even in the presence of failures:

Each DN sends a heartbeat message to the NN periodically.

The NN marks DNs without recent heartbeats asdead and does not forward any new IO requests to them.

If DN death causes the replication factor of some blocks to fall below their specified value, the NN automatically initiates replication.

The necessity for re-replication may arise due to many reasons:

A DN may become unavailable A replica may become corrupted A hard disk on a DN may fail

The replication factor of a file may be increased.

(15)

FS Shell

HDFS allows user data to be organized in files and directories and provides a CLI:

$ hadoop fs -mkdir /foo

$ hadoop fs -rmr /foo

$ hadoop fs -ls /bar

$ hadoop fs -cat /bar/file.txt

(16)

FUSE clients

This project (contrib) allows HDFS to be mounted as a standard file system using the mount command.

Once mounted, the user can operate on an instance of hdfs using standard Unix utilities:

’ls’, ’cd’, ’cp’, ’mkdir’, ’find’, ’grep’, etc.

Use standard Posix libraries like open, write, read, close from C, C++, Python, Ruby, Perl, Java, bash, etc.

Writes are approximately 33% slower than the standard client Reads are 20-30% slower even with the read buffering.

HDFS does not support non-sequential writes (TFile can’t write directly to HDFS)

Random reads are okay.

(17)

References

Apache Hadoop Documentation.

http://hadoop.apache.org/docs/r0.20.2/index.html