Distributed Filesystems

Amir H. Payberah

Swedish Institute of Computer Science

April 8, 2014

What is a Filesystem?

- Controls how data is stored in and retrieved from disk.

Amir H. Payberah (SICS) Distributed Filesystems April 8, 2014 2 / 32

Distributed Filesystems

- When data outgrows the storage capacity of a single machine: partition it across a number of separate machines.
- Distributed filesystems: manage the storage across a network of machines.

HDFS

- Hadoop Distributed File System
- Appears as a single disk
- Runs on top of a native filesystem, e.g., ext3
- Fault tolerant: can handle disk crashes, machine crashes, ...
- Based on Google's filesystem, GFS

HDFS is Good for ...

- Storing large files
  - Terabytes, petabytes, etc.
  - 100MB or more per file.
- Streaming data access
  - Data is written once and read many times.
  - Optimized for batch reads rather than random reads.
- Cheap commodity hardware
  - No need for supercomputers; use less reliable commodity hardware.

HDFS is Not Good for ...

- Low-latency reads
  - Designed for high throughput rather than low latency for small chunks of data.
  - HBase addresses this issue.
- Large numbers of small files
  - Better for millions of large files instead of billions of small files.
- Multiple writers
  - Single writer per file.
  - Writes only at the end of a file; no support for arbitrary offsets.

HDFS Daemons (1/2)

- An HDFS cluster is managed by three types of processes.
- Namenode
  - Manages the filesystem, e.g., namespace, metadata, and file blocks.
  - Metadata is stored in memory.
- Datanode
  - Stores and retrieves data blocks.
  - Reports to the Namenode.
  - Runs on many machines.
- Secondary Namenode
  - Only for checkpointing.
  - Not a backup for the Namenode.

HDFS Daemons (2/2)

[Figure: HDFS daemons]

Files and Blocks (1/3)

- Files are split into blocks.
- Blocks
  - Single unit of storage: a contiguous piece of information on a disk.
  - Transparent to the user.
  - Managed by the Namenode, stored by the Datanodes.
  - Blocks are traditionally either 64MB or 128MB: default is 64MB (128MB from Hadoop 2 onward).

Files and Blocks (2/3)

- Why is a block in HDFS so large?
  - To minimize the cost of seeks.
- Time to read a block = seek time + transfer time
- Keeping the ratio (seek time) / (transfer time) small: we are reading data from the disk almost as fast as the physical limit imposed by the disk.
- Example: if the seek time is 10ms and the transfer rate is 100MB/s, then to make the seek time 1% of the transfer time, the transfer must take 1s, so we need a block size of around 100MB.
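The arithmetic in the example can be checked with a short sketch. The function name and its parameters are illustrative only and are not part of HDFS.

```python
def block_size_mb(seek_time_s: float, transfer_rate_mb_s: float, seek_fraction: float) -> float:
    """Block size (MB) at which seek time is `seek_fraction` of transfer time."""
    # seek_time = seek_fraction * transfer_time  =>  transfer_time = seek_time / seek_fraction
    transfer_time_s = seek_time_s / seek_fraction
    # block size = transfer rate * transfer time
    return transfer_rate_mb_s * transfer_time_s


# Values from the slide: 10 ms seek, 100 MB/s transfer, seek time = 1% of transfer time.
size = block_size_mb(seek_time_s=0.010, transfer_rate_mb_s=100.0, seek_fraction=0.01)
print(f"block size ~ {size:.0f} MB")
```

With a smaller block, the same 10ms seek would be a larger share of each read, which is the cost the large block size is meant to avoid.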

Files and Blocks (3/3)

- The same block is replicated on multiple machines: default is 3
  - Replica placements are rack aware.
  - 1st replica on the local rack.
  - 2nd replica on the local rack but a different machine.
  - 3rd replica on a different rack.
- The Namenode determines replica placement.
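A minimal sketch of the placement rule as stated on this slide. The `topology` dictionary, the node names, and the function itself are hypothetical; the real Namenode uses its own topology classes, and the default policy differs between Hadoop versions, so this only illustrates the rule above.

```python
import random
from typing import Dict, List, Optional


def place_replicas(topology: Dict[str, List[str]], writer_node: str,
                   rng: Optional[random.Random] = None) -> List[str]:
    """Pick three Datanodes following the placement rule stated on the slide.

    `topology` maps a rack name to the Datanodes on that rack (hypothetical structure).
    """
    rng = rng or random.Random()
    local_rack = next(rack for rack, nodes in topology.items() if writer_node in nodes)

    # 1st replica: a node on the local rack (here, the writer's own node).
    first = writer_node
    # 2nd replica: same rack, different machine.
    second = rng.choice([n for n in topology[local_rack] if n != first])
    # 3rd replica: any node on a different rack.
    remote_nodes = [n for rack, nodes in topology.items() if rack != local_rack for n in nodes]
    third = rng.choice(remote_nodes)
    return [first, second, third]


topology = {"rack1": ["dn1", "dn2", "dn3"], "rack2": ["dn4", "dn5"]}
replicas = place_replicas(topology, writer_node="dn2", rng=random.Random(0))
print(replicas)
```

Two replicas on one rack keep write traffic mostly local, while the third protects against the loss of a whole rack.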

HDFS Client

- The client interacts with the Namenode
  - To update the Namenode namespace.
  - To retrieve block locations for writing and reading.
- The client interacts directly with the Datanodes
  - To read and write data.
- The Namenode does not directly write or read data.

HDFS Write

- 1. Create a new file in the Namenode's namespace; calculate block topology.
- 2, 3, 4. Stream data to the first, second and third node.
- 5, 6, 7. Success/failure acknowledgment.

HDFS Read

- 1. Retrieve block locations.
- 2, 3. Read blocks to re-assemble the file.
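The division of labour in the write and read steps can be sketched with an in-memory toy model. All class and function names here are hypothetical, placement is simple round-robin instead of rack-aware, and acknowledgments are implicit; it only shows that the Namenode holds metadata while the data itself moves between client and Datanodes.

```python
from typing import Dict, List


class Datanode:
    """Stores and serves raw block data."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: Dict[str, bytes] = {}


class Namenode:
    """Holds only metadata: which blocks make up a file and where they live."""
    def __init__(self, datanodes: List[Datanode], replication: int = 3) -> None:
        self.datanodes = datanodes
        self.replication = replication
        self.files: Dict[str, List[str]] = {}            # path -> block ids
        self.locations: Dict[str, List[Datanode]] = {}   # block id -> replicas

    def create(self, path: str) -> None:
        self.files[path] = []

    def allocate_block(self, path: str) -> str:
        block_id = f"{path}#blk{len(self.files[path])}"
        # Round-robin placement; a stand-in for the rack-aware policy.
        start = len(self.locations) % len(self.datanodes)
        targets = [self.datanodes[(start + i) % len(self.datanodes)]
                   for i in range(self.replication)]
        self.files[path].append(block_id)
        self.locations[block_id] = targets
        return block_id


def write_file(nn: Namenode, path: str, data: bytes, block_size: int) -> None:
    nn.create(path)                                      # step 1: namespace entry
    for offset in range(0, len(data), block_size):
        block_id = nn.allocate_block(path)
        chunk = data[offset:offset + block_size]
        for dn in nn.locations[block_id]:                # steps 2-4: stream to each replica
            dn.blocks[block_id] = chunk                  # steps 5-7 (acks) are implicit here


def read_file(nn: Namenode, path: str) -> bytes:
    parts = []
    for block_id in nn.files[path]:                      # step 1: block locations
        replica = nn.locations[block_id][0]              # steps 2-3: read from a Datanode
        parts.append(replica.blocks[block_id])
    return b"".join(parts)


datanodes = [Datanode(f"dn{i}") for i in range(4)]
namenode = Namenode(datanodes)
payload = b"hello distributed filesystems"
write_file(namenode, "/dir/file.txt", payload, block_size=8)
print(read_file(namenode, "/dir/file.txt"))
```

Note that `Namenode` never touches `chunk`: block contents only ever pass through `Datanode` objects.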

Namenode Memory Concerns

- For fast access, the Namenode keeps all block metadata in memory.
  - Works well for clusters of around 100 machines.
- Changing the block size affects how much space a cluster can host.
  - Going from 64MB to 128MB reduces the number of blocks and increases the space that the Namenode can support.
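A rough sketch of why the block size matters for Namenode memory. The 150 bytes per block entry is an assumed rule-of-thumb figure that does not come from the slides, and the 1 PB data volume is chosen only for illustration.

```python
def block_count(total_bytes: int, block_size_bytes: int) -> int:
    """Number of blocks needed, assuming files fill their blocks completely."""
    return (total_bytes + block_size_bytes - 1) // block_size_bytes


MB = 1024 ** 2
PB = 1024 ** 5
BYTES_PER_BLOCK_ENTRY = 150  # assumed rule-of-thumb figure, not from the slides

for block_size in (64 * MB, 128 * MB):
    blocks = block_count(1 * PB, block_size)
    heap_gb = blocks * BYTES_PER_BLOCK_ENTRY / 1024 ** 3
    print(f"{block_size // MB:>3} MB blocks: {blocks:,} blocks, ~{heap_gb:.2f} GB of metadata")
```

Doubling the block size halves the number of block entries, so the same Namenode heap can describe roughly twice as much stored data.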

HDFS Federation

- Hadoop 2+
- Each Namenode will host part of the blocks.
- A Block Pool is a set of blocks that belong to a single namespace.
- Support for 1000+ machine clusters.

Namenode Fault-Tolerance (1/2)

- The Namenode is a single point of failure.
- If the Namenode crashes, the cluster is down.
- The Secondary Namenode periodically merges the namespace image with the log and writes a persistent record of the result to disk (checkpointing).
- But the state of the Secondary Namenode lags that of the primary: it does not provide high availability of the filesystem.
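Checkpointing can be pictured as replaying a log of changes on top of the last saved image. The sketch below is a toy model: the operation names, the tuple layout, and the dictionary-based image are all hypothetical, and the real on-disk formats are different.

```python
from typing import Dict, List, Tuple

Namespace = Dict[str, List[str]]          # path -> block ids
EditOp = Tuple[str, str, List[str]]       # (operation, path, block ids)


def checkpoint(image: Namespace, edit_log: List[EditOp]) -> Namespace:
    """Replay the edit log on top of the last image and return the merged image."""
    merged = {path: list(blocks) for path, blocks in image.items()}
    for op, path, blocks in edit_log:
        if op == "create":
            merged[path] = list(blocks)
        elif op == "append":
            merged[path].extend(blocks)
        elif op == "delete":
            merged.pop(path, None)
        else:
            raise ValueError(f"unknown edit operation: {op}")
    return merged


image = {"/a.txt": ["blk1"], "/b.txt": ["blk2", "blk3"]}
edits = [("create", "/c.txt", ["blk4"]),
         ("append", "/a.txt", ["blk5"]),
         ("delete", "/b.txt", [])]

new_image = checkpoint(image, edits)
edits.clear()   # after a checkpoint the log can start again from empty
print(new_image)
```

Any edit made after the merge exists only in the primary's log, which is why the checkpointed state lags and cannot serve as a live replacement.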

Namenode Fault-Tolerance (2/2)

- High availability Namenode.
  - Hadoop 2+
  - An active standby is always running and takes over in case the main Namenode fails.

Summary

- Good for large files.
- Streaming access rather than random access.
- Daemons: Namenode, Secondary Namenode, and Datanode

HDFS Installation and Shell

HDFS Installation

- Three options
  - Local (Standalone) Mode
  - Pseudo-Distributed Mode
  - Fully-Distributed Mode

Installation - Local

- Default configuration after the download.
- Executes as a single Java process.
- Works directly with the local filesystem.
- Useful for debugging.

Installation - Pseudo-Distributed (1/6)

- Still runs on a single node.
- Each daemon runs in its own Java process.
  - Namenode
  - Secondary Namenode
  - Datanode
- Configuration files:
  - hadoop-env.sh
  - core-site.xml
  - hdfs-site.xml

Installation - Pseudo-Distributed (2/6)

- Specify environment variables in hadoop-env.sh

    export JAVA_HOME=/opt/jdk1.7.0_51

Installation - Pseudo-Distributed (3/6)

- Specify the location of the Namenode in core-site.xml

    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://localhost:8020</value>
      <description>NameNode URI</description>
    </property>

Installation - Pseudo-Distributed (4/6)

- Configuration of the Namenode in hdfs-site.xml
- Path on the local filesystem where the Namenode stores the namespace and transaction logs persistently.

    <property>
      <name>dfs.namenode.name.dir</name>
      <value>/opt/hadoop-2.2.0/hdfs/namenode</value>
      <description>description...</description>
    </property>

Installation - Pseudo-Distributed (5/6)

- Configuration of the Secondary Namenode in hdfs-site.xml
- Path on the local filesystem where the Secondary Namenode stores the temporary images to merge.

    <property>
      <name>dfs.namenode.checkpoint.dir</name>
      <value>/opt/hadoop-2.2.0/hdfs/secondary</value>
      <description>description...</description>
    </property>

Installation - Pseudo-Distributed (6/6)

- Configuration of the Datanode in hdfs-site.xml
- Comma-separated list of paths on the local filesystem of a Datanode where it should store its blocks.

    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/opt/hadoop-2.2.0/hdfs/datanode</value>
      <description>description...</description>
    </property>
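In the actual file, the property fragments sit together inside a single `<configuration>` root element. As a sketch, the three HDFS properties from the slides can be assembled with Python's standard library; writing the result to a file is left out, and the paths are simply the ones used on the slides.

```python
import xml.etree.ElementTree as ET

# Property names and paths are the ones shown on the slides.
properties = {
    "dfs.namenode.name.dir": "/opt/hadoop-2.2.0/hdfs/namenode",
    "dfs.namenode.checkpoint.dir": "/opt/hadoop-2.2.0/hdfs/secondary",
    "dfs.datanode.data.dir": "/opt/hadoop-2.2.0/hdfs/datanode",
}

# Hadoop configuration files wrap all properties in one <configuration> element.
configuration = ET.Element("configuration")
for name, value in properties.items():
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value

xml_text = ET.tostring(configuration, encoding="unicode")
print(xml_text)
```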

Start HDFS and Test

- Format the Namenode directory (do this only once, the first time).

    hdfs namenode -format

- Start the Namenode, Secondary Namenode and Datanode daemons.

    hadoop-daemon.sh start namenode
    hadoop-daemon.sh start secondarynamenode
    hadoop-daemon.sh start datanode
    jps

- Verify the daemons are running:
  - Namenode: http://localhost:50070
  - Secondary Namenode: http://localhost:50090
  - Datanode: http://localhost:50075
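The `jps` check can also be scripted. The sketch below parses `jps`-style output and reports which of the three daemons are missing; the function name is hypothetical, and the sample text with its process ids is made up for illustration.

```python
from typing import Set

EXPECTED = {"NameNode", "SecondaryNameNode", "DataNode"}


def missing_daemons(jps_output: str) -> Set[str]:
    """Return the expected daemons that do not appear in `jps` output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:          # each jps line is "<pid> <main class>"
            running.add(parts[1])
    return EXPECTED - running


# Hypothetical sample; real process ids will differ.
sample = "4101 NameNode\n4215 DataNode\n4390 Jps\n"
print(missing_daemons(sample))
```

On a real machine, the string passed to `missing_daemons` would come from running `jps` itself.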

HDFS Shell

    hdfs dfs -<command> -<option> <path>

    hdfs dfs -ls /
    hdfs dfs -ls file:///home/big
    hdfs dfs -ls hdfs://localhost/
    hdfs dfs -cat /dir/file.txt
    hdfs dfs -cp /dir/file1 /otherDir/file2
    hdfs dfs -mv /dir/file1 /dir2/file2
    hdfs dfs -mkdir /newDir
    hdfs dfs -put file.txt /dir/file.txt    # can also use copyFromLocal
    hdfs dfs -get /dir/file.txt file.txt    # can also use copyToLocal
    hdfs dfs -rm /dir/fileToDelete
    hdfs dfs -help
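When these commands need to be driven from a script, one option is a thin wrapper around the `hdfs dfs` command line. This is a sketch under the assumption that the `hdfs` executable is on the PATH; both helper functions are hypothetical names, and the example only prints the command lines so that it runs without a cluster.

```python
import shlex
import subprocess
from typing import List


def hdfs_dfs_args(command: str, *args: str) -> List[str]:
    """Build the argument list for `hdfs dfs -<command> <args...>`."""
    return ["hdfs", "dfs", f"-{command}", *args]


def run_hdfs_dfs(command: str, *args: str) -> str:
    """Run the command and return its standard output (needs `hdfs` on PATH)."""
    result = subprocess.run(hdfs_dfs_args(command, *args),
                            check=True, capture_output=True, text=True)
    return result.stdout


# Only print the command lines here; call run_hdfs_dfs(...) on a machine with HDFS running.
for cmd in (hdfs_dfs_args("mkdir", "/newDir"),
            hdfs_dfs_args("put", "file.txt", "/dir/file.txt"),
            hdfs_dfs_args("ls", "/")):
    print(shlex.join(cmd))
```

Passing the arguments as a list rather than one string avoids shell quoting problems with paths that contain spaces.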

Questions?
