Introduction Exerience HDFS (Backup Slides)
Cerberus Hadoop
Hadoop@LaTech
ATLAS Tier 3
David Palma
DOSAR Louisiana Tech University
January 23, 2013
Introduction Exerience HDFS (Backup Slides)
Cerberus Hadoop
Outline
1 Introduction Cerberus Hadoop
2 Exerience Features Issues Conclusions
3 HDFS (Backup Slides) Architecture
Replication Accessibility
Introduction Exerience HDFS (Backup Slides)
Cerberus Hadoop
Cerberus, ATLAS Tier 3
Hadoop Capacity: 38.44 TB Hadoop Replication: 2 Hadoop effective: 19.22 TB
8 Hadoop DataNodes (Compute Nodes) 3-6 TB each
Introduction Exerience HDFS (Backup Slides)
Cerberus Hadoop
Hadoop
Apache Hadoop is a distributed computing framework It consists of two components:
A distributed file system (what we care about) A map/reduce implementation
Open source, Apache project, highly active Derived from Google’s proprietary implementation
Used at many large companies including Yahoo!, Facebook, and Twitter for analysis of large datasets (Big Data)
Introduction Exerience HDFS (Backup Slides)
Features Issues Conclusions
Features
Manageability:
Can easily decommission nodes Hadoop fsck utility works very well Easy to setup, few services to manage Reliability:
Replication in HDFS works great
Multiple HDD failures with no data loss yet Scalability:
Yahoo uses it for > PB storage
CNs double as storage vs. separate storage cluster Block level decomposition, less network bottlenecks
Introduction Exerience HDFS (Backup Slides)
Features Issues Conclusions
Issues and Management
Decommission Nodes
vi /etc/hadoop/conf/hosts exclude && hadoop dfsadmin -refreshNodes
HDFS Health reports
Nightly cron: ”hadoop fsck /”
Lots of logs!
cron: rotate and bzip logs
HDFS-FUSE mount crashes with DQ2 Requires remount
Possible ROOT I/O performance issues (?)
Introduction Exerience HDFS (Backup Slides)
Features Issues Conclusions
Conclusions
Past
We’ve had a good experience with the replication features and manageability of HDFS.
We’ve had some issues with fuse-dfs+DQ2 Future
We’d like benchmark to disk and network performance
We’d like to improve our understanding of how HDFS’s random I/O permance affects ROOT-based I/O.
Next Steps: More monitoring, OSG SE: GridFTP/SRM
Introduction Exerience HDFS (Backup Slides)
Architecture Replication Accessibility
Design Goals
Fault Tolerance: All files in HDFS are replicated at the block level across many nodes.
Write-Once-Read-Many: Simplifies data coherency issues and enables high throughput data access.
”Moving Computation is Cheaper than Moving Data”:
Computation requested is much more efficient if it is executed near the data it operates on. HDFS provides interfaces for applications to move themselves closer to where the data is located.
Big Data: HDFS is tuned to support large files. A typical file in HDFS is assumed to be gigabytes to terabytes in size.
Introduction Exerience HDFS (Backup Slides)
Architecture Replication Accessibility
HDFS Architecture
Introduction Exerience HDFS (Backup Slides)
Architecture Replication Accessibility
Node Types
NameNode (NN): Master
Acts as the arbitrator and repository for all HDFS metadata Manages the file system namespace and regulates access to files by clients.
Executes FS namespace operations like opening, closing, and renaming files and dirs.
Determines the mapping of blocks to DNs.
DataNodes (DN): Slaves
Manage storage attached to the nodes that they run on.
Serve read and write requests from the file system’s clients.
Perform block creation, deletion, and replication upon instruction from the NN.
Secondary NameNode (SNN):
Supports the NN merging namespace changes.
Introduction Exerience HDFS (Backup Slides)
Architecture Replication Accessibility
File System Internals
HDFS exposes a file system namespace and allows user data to be stored in files.
HDFS uses a traditional hierarchical file organization.
Files are split into blocks and these blocks are stored in a set of DNs.
If possible, each chunk will reside on a different DN.
Any change to the file system namespace or its properties is recorded by the NN.
An application can specify the number of replicas of a file that should be maintained.
The number of copies of a file is called the replication factor of
Introduction Exerience HDFS (Backup Slides)
Architecture Replication Accessibility
Replication
2
Introduction Exerience HDFS (Backup Slides)
Architecture Replication Accessibility
Block Replication
Each file is stored as a sequence of blocks (default: 64MB) block size and replication factor are configurable per file Replication factor can be changed (default: 3)
Files are write-once and have strictly one writer at any time The NN constantly tracks which blocks need to be replicated using:
Heartbeats: A periodic message from a DN Blockreports: A list of all blocks on a DN
Introduction Exerience HDFS (Backup Slides)
Architecture Replication Accessibility
Re-Replication
The primary objective of HDFS is tostore data reliably even in the presence of failures:
Each DN sends a heartbeat message to the NN periodically.
The NN marks DNs without recent heartbeats asdead and does not forward any new IO requests to them.
If DN death causes the replication factor of some blocks to fall below their specified value, the NN automatically initiates replication.
The necessity for re-replication may arise due to many reasons:
A DN may become unavailable A replica may become corrupted A hard disk on a DN may fail
The replication factor of a file may be increased.
Introduction Exerience HDFS (Backup Slides)
Architecture Replication Accessibility
FS Shell
HDFS allows user data to be organized in files and directories and provides a CLI:
$ hadoop fs -mkdir /foo
$ hadoop fs -rmr /foo
$ hadoop fs -ls /bar
$ hadoop fs -cat /bar/file.txt
Introduction Exerience HDFS (Backup Slides)
Architecture Replication Accessibility
FUSE clients
This project (contrib) allows HDFS to be mounted as a standard file system using the mount command.
Once mounted, the user can operate on an instance of hdfs using standard Unix utilities:
’ls’, ’cd’, ’cp’, ’mkdir’, ’find’, ’grep’, etc.
Use standard Posix libraries like open, write, read, close from C, C++, Python, Ruby, Perl, Java, bash, etc.
Writes are approximately 33% slower than the standard client Reads are 20-30% slower even with the read buffering.
HDFS does not support non-sequential writes (TFile can’t write directly to HDFS)
Random reads are okay.
Introduction Exerience HDFS (Backup Slides)
Architecture Replication Accessibility
References
Apache Hadoop Documentation.
http://hadoop.apache.org/docs/r0.20.2/index.html