PERFORMANCE ANALYSIS OF A DISTRIBUTED FILE SYSTEM

(1)

P

ERFORMANCE

A

NALYSIS OF

A

D

ISTRIBUTED

F

ILE

S

YSTEM

SUBMITTED

BY

DIBYENDU KARMAKAR

EXAMINATION ROLL NUMBER: M4SWE13-07

REGISTRATION NUMBER: 117079 of 2011-2012

A THESIS SUBMITTED TO

THE FACULTY OF ENGINEERING & TECHNOLOGY OF JADAVPUR UNIVERSITY

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF ENGINEERING

IN

SOFTWARE ENGINEERING

UNDER THE SUPERVISION

OF

MR. UTPAL KUMAR RAY

VISITING FACULTY

DEPARTMENT OF INFORMATION TECHNOLOGY

JADAVPUR UNIVERSITY

(2)

_______________________

(M

R

.

U

TPAL

K

UMAR

R

AY

)

V

ISITING

F

ACULTY

DEPARTMENT OF INFORMATION TECHNOLOGY

JADAVPUR UNIVERSITY

C

OUNTERSIGNED BY

:

_______________________

(HEAD OF THE DEPARTMENT)

INFORMATION TECHNOLOGY

J

ADAVPUR

U

NIVERSITY

D

EPARTMENT OF

I

NFORMATION

T

ECHNOLOGY

F

ACULTY OF

E

NGINEERING

&

T

ECHNOLOGY

J

ADAVPUR

U

NIVERSITY

C

ERTIFICATE OF

S

UBMISSION

I hereby recommend that the thesis, entitled “Performance Analysis of a

Distributed File System”, prepared by Dibyendu Karmakar (Registration. No.

117079 of 2011-2012) under my supervision, be accepted in partial fulfillment of

the requirement for the degree of Master of Engineering in Software Engineering

from the Department of Information Technology under Jadavpur University.

(3)

_________________________

(MR.UTPAL KUMAR RAY)

VISITING FACULTY

DEPARTMENT OF INFORMATION TECHNOLOGY JADAVPUR UNIVERSITY

_______________________

(DR.SASWAT CHAKRABARTI)

PROFESSOR AND HEAD

GSSANYAL SCHOOL OF TELECOMMUNICATION IIT-KHARAGPUR

D

EPARTMENT OF

I

NFORMATION

T

ECHNOLOGY

FACULTY

OF

ENGINEERING

&

TECHNOLOGY

JADAVPUR UNIVERSITY

C

ERTIFICATE OF

A

PPROVAL

The thesis at instance is hereby approved as a creditable study of an Engineering subject carried out and presented in a manner satisfactory to warrant its acceptance as a prerequisite to the degree for which it has been submitted. It is understood that by this approval the undersigned does not necessarily endorse or approve any statement made, opinion expressed or conclusion drawn therein, but approve this thesis for the purpose for which it is submitted.

(4)

D

ECLARATION OF

O

RIGINALITY AND

C

OMPLIANCE OF

A

CADEMIC

E

THICS

I hereby declare that this thesis contains literature survey and original research work by me, as a part of my Master of Engineering in Software Engineering studies.

All information in this document have been obtained and presented in accordance with academic rules and ethical conduct.

I also declare that, as required by these rules and conduct, I have fully cited and referenced all material and results that are not original to this work.

______________________

(S

IGNATURE WITH

D

ATE

)

N

AME

:

D

IBYENDU

K

ARMAKAR

E

XAM

R

OLL

N

O

:

M4SWE13-07

(5)

THE SUCCESS AND FINAL OUTCOME OF THIS PROJECT REQUIRED A LOT OF

GUIDANCE AND ASSISTANCE FROM MANY PEOPLE AND

I AM EXTREMELY

FORTUNATE TO HAVE GOT THIS ALL ALONG THE COMPLETION OF MY PROJECT WORK.

WHATEVER

I HAVE DONE IS ONLY DUE TO SUCH GUIDANCE AND

ASSISTANCE AND I WOULD NOT FORGET TO THANK THEM.

I OWE MY PROFOUND GRATITUDE TO MY PROJECT GUIDE

PROF.

UTPAL

KUMAR RAY, WHO TOOK KEEN INTEREST ON MY PROJECT WORK AND GUIDED ME

ALL ALONG, TILL THE COMPLETION OF MY PROJECT WORK BY PROVIDING ALL THE NECESSARY INFORMATION.

I WOULD NOT FORGET TO REMEMBER THE HADOOP COMMUNITY MEMBERS

FOR THEIR TIMELY SUPPORT AND ASSISTANCE. I WOULD ALSO LIKE TO THANK ALL OF MY CLASSMATES FOR THE CONSTANT SUPPORT AND HELP THEY PROVIDED ALL THE TIME.

R

EGARDS

,

_______________________

(D

IBYENDU

K

ARMAKAR

)

N

O

:

117079

OF

2011-12

A

CKNOWLEDGEMENT

LOCATION:

DATE:

(6)

D

EDICATED TO

(7)

A

BSTRACT

The need for a large storage of data has gradually been increased in recent years. Individual companies are storing petabytes of data. The need for reduction in access time of data is also evident. As the size and value of the stored data increases, the importance of fault tolerant system, reliability increases. Distributed file system has gradually become popular in this regard.

This thesis represents an approach to improve the performance of a distributed file system by analyzing and tuning few configuration parameters of a distributed file system. These parameters follow a particular curve or graph and are tunable for better performance of a distributed file system.Hadoop distributed file system (HDFS) has been taken as a standard distributed file system highlighting the basic working principles of hadoop along with the setup configuration. The experimental results and a suitable graph for each performance tuning parameters have been shown explaining how the distributed file system behaves with respect to these parameters. A useful conclusion has been discussed specifying which parameter plays a key role in increasing the performance of the distributed file systems along with tunable values for those parameters.

(8)

Page No.

Chapter 1 : Introduction………

1 1.1. Motivation………

2 1.2. Focus………

2 1.3. Organization………

2 Chapter 2 : Introduction to Distributed File System……….

4 2.1. What is a File System?...

5 2.2. Definition of Distributed File System……….

5 2.3. Why Distributed File System………..

5 Chapter 3 : Hadoop Concepts………..

7 3.1. Architecture………

8

3.1.1. Cluster………..

8

3.1.2. Namenode………

8

3.1.3. Datanode………..

9

3.1.4. HDFS Client………

10

3.1.5. Image and Journal………...

11

3.1.6. Checkpoint Node……….

12

3.1.7. Backup Node………

12

3.1.8. File System Snapshots………

13 3.2. IO Operations and Replica Management………

14

3.2.1. File Read and Write……….

14

3.2.2. Heartbeat and Block Report……….

15

3.2.3. Staging……….

15

3.2.4. Replication Pipelining……….

16

3.2.5. Data Block Placement……….

17

3.2.6. Replica Management………..

17

3.2.7. Balancer………..

18

3.2.8. Block Scanner……….

18

3.2.9. Decommissioning………...

19 3.2.10. Inter Cluster Data Copy………..

19 Chapter 4 : Setting up the Hadoop Environment………..

20 4.1. Hadoop Configuration………..

21

(9)

4.3. System configuration……….

22 Chapter 5 : Hadoop Performance Tuning Parameters……….

24 5.1. Cluster-Level Tunable Parameters……….

25

5.1.1. Server-Level Tunable Parameters………

25

5.1.2. HDFS Tunable Parameters………..

26 Chapter 6 : Performance Results & Analysis……….

30 6.1. Scenario 1: Effect of Multiple Clients………

31

6.1.1. Performance Analysis………..

31 6.2. Scenario 2: Effect of Replication Factor (Replication Factor < No. of

Available Datanodes)……….

32

6.2.1. Performance Analysis……….

33 6.3. Scenario 3: Effect of Replication Factor (Replication Factor > No. of

Available Datanodes)……….

34

6.3.1. Performance Analysis……….

35 6.4. Scenario 4: Effect of Block Size (dfs.block.size)………..

37

6.4.1. Performance Results………

37

6.4.2. Performance Analysis………..

38 6.5. Scenario 5: Effect of IO Buffer Size (io.file.buffer.size)………...

39

6.5.1. Performance Results………

39

6.5.2. Performance Analysis………..

40 6.6. Scenario 6: Effect of dfs.access.time.precision……….

41

6.6.1. Performance Results………

41

6.6.2. Performance Analysis……….

42 6.7. Scenario 7: Effect of dfs.replication.interval……….

43

6.7.1. Performance Results………

43

6.7.2. Performance Analysis……….

44 6.8. Scenario 8: Effect of Heartbeat and Blockreport Intervals……….

45

6.8.1. Performance Results………

45

6.8.2. Performance Analysis……….

46 6.9. Scenario 9: Effect of Server and Block Level Threads…………..

48

6.9.1. Performance Results………

48

6.9.2. Performance Analysis……….

49 Chapter 7 : Conclusion……….

52 7.1. Conclusion……….

53 7.2. Further Work………..

54 References………..

55

(10)

APPENDIX A: Hadoop Installation……….

58 APPENDIX B: Hadoop Shell Commands………

70 APPENDIX C: Dealing with Installation Errors………..

78

(11)

C

HAPTER

1

(12)

2 | P a g e

1.1 M

OTIVATION

In recent years the amount of data stored worldwide has been increased by a factor of nine. Individual companies are often storing petabytes of data containing their business information for getting some useful business strategy that lead to their continued growth and success. However, the amount of data needed to store is often too large to store and analyse the data in traditional relational databases or the time required to analyse the data is too big. Further, the useful business information that can be gained from these large amounts of data may be valuable, but this useful analysed business information is effectively inaccessible if the IT costs to reach them are yet greater.

Distributed software platform has grown to be popular for managing and storing large amount of data in a cost-effective way satisfying the above needs. Distributed file system gives the developers the opportunity to focus on the high-level algorithms by providing high reliability, instant backup facility, fault tolerance etc. It is designed to run on a large cluster scaling to hundreds or thousands of nodes. Hence the need for a distributed file system to overcome performance issue is evident.

1.2. F

OCUS

The main focus of this thesis is to improve the performance of distributed file systems. Now, the performance of a file system can be increased by having less computational time to perform the required operations. An obvious approach to have less computational time is to upgrade the processors of each individual machine which is evidently not a cost effective approach. This thesis represents an approach to gain better performance by analyzing and tuning few configuration parameters of a distributed file system. These parameters follow a particular curve or graph and are tunable for better performance of a distributed file system. Hadoop Distributed File System (HDFS) has been taken as a standard distributed file system in this work. So, all experiments are performed in HDFS.

1.3. O

It is followed by

R

EFERENCES used to fulfill the project.

A

PPENDICES provide information about Hadoop Installation, Dealing with errors while installing hadoop and List of all Hadoop Commands.

(14)

C

HAPTER

2 I

NTRODUCTION TO

D

ISTRIBUTED

F

ILE

S

YSTEM

Chapter Gist:

This chapter defines file system and distributed file system and

thereafter describes the need for a distributed file system.

(15)

5 | P a g e

1.1. W

HAT IS

F

ILE

S

YSTEM

?

A file system[1] is a subsystem of an operating system that performs file management activities such as organization, storing, retrieval, naming, sharing and protection of files. File systems are used on data storage devices, such as hard disk drives, floppy disks, optical discs, or flash memory storage devices, to maintain the physical locations of the computer files and directories.

Example: -

I. FAT (File Allocation Table) File System

II. NTFS (New Technology File System) used in Microsoft's Windows 7, Windows Vista, Windows XP and Windows 2000.

1.2. D

EFINITION OF

D

ISTRIBUTED

F

ILE

S

YSTEM

A distributed file system[1][2][9] is a client/server based application that allows clients to access and process data stored on the server as if it were on their own computer.

A distributed file system organizes file and directory services of individual servers into a global directory in such a way that remote data access is not location-specific but is identical from any client. All files are accessible to all users of the global file system and organization is hierarchical and directory-based.

As a whole, a distributed file system is any file system that allows access to files from multiple hosts/clients via a computer network.

Example:-

I. Hadoop Distributed File System

II. Mobile Agent Based Distributed File System III. Parallel Virtual File System

IV. Fraunhofer File System etc.

1.3. W

HY

D

ISTRIBUTED

F

ILE

S

YSTEM

Distributed File System has been introduced due to several advantages[2] over centralized file system such as:-



U

SER

M

OBILITY

Flexibility to work on different nodes at different times without the necessity of physically relocating the Secondary Storage devices

(16)

6 | P a g e



R

EMOTE

I

NFORMATION

S

HARING

Transparent access of files by processes of any node (host/client) irrespective of file’s location



A

VAILABILITY

File availability for use in the event of temporary failure of one or more nodes using replicas (copies of original files)



P

ERFORMANCE

High performance can be achieved by executing sub-processes of a particular process in parallel on multiple remote nodes.

(17)

C

HAPTER

3 H

ADOOP

C

ONCEPTS

Chapter Gist:

This chapter highlights the basic concepts of Hadoop Distributed

File System (HDFS) focusing on its architecture and operational aspects.

(18)

8 | P a g e

Hadoop is an Apache project. All components are available via the Apache open source license. Yahoo has developed and contributed to 80% of the core of Hadoop.

Hadoop provides a distributed file system and a framework for the analysis and transformation of very large data sets using the Map-Reduce paradigm. An important characteristic of Hadoop is the partitioning of data and computation across many (thousands) of hosts, and executing application computations in parallel close to their data.

HDFS is the file system component of Hadoop. While the interface to HDFS is patterned after the UNIX file system, faithfulness to standards was sacrificed in favor of improved performance for the applications at hand. HDFS stores file system metadata and application data separately. As in other distributed file systems, like PVFS, Lustre and GFS, HDFS stores metadata on a dedicated server, called the NameNode. Application data are stored on other servers called DataNodes. All servers are fully connected and communicate with each other using TCP-based protocols.

3.1. A

RCHITECTURE

3.1.1. C

LUSTER

Hadoop Distributed File System is composed of two types of node – DataNode[3][4][5][6] and NameNode[3][4][5]. All nodes in this distributed file system are grouped into clusters. Each Cluster contains one NameNode and multiple DataNode.

3.1.2. N

AME

N

ODE

The HDFS namespace is a hierarchy of files and directories. Files and directories are represented on the NameNode by inodes, which record attributes like permissions, modification and access times, namespace and disk space quotas. The file content is split into large blocks (typically 128 megabytes, but user selectable file-by-file) and each block of the file is independently replicated at multiple DataNodes (typically three, but user selectable file-by-file). The NameNode maintains the namespace tree and the mapping of file blocks to DataNodes.

An HDFS client wanting to read a file first contacts the NameNode for the locations of data blocks comprising the file and then reads block contents from the DataNode closest to the client. When writing data, the client requests the NameNode to nominate a suite of three DataNodes to host the block replicas. The client then writes data to the DataNodes in a pipeline fashion. The current design has a single NameNode for each cluster. The cluster can have thousands of DataNodes and tens of thousands of HDFS clients per cluster, as each DataNode may execute multiple application tasks concurrently.

HDFS keeps the entire namespace in RAM. The inode data and the list of blocks belonging to each file comprise the meta-data of the name system called the image. The persistent record of

(19)

9 | P a g e

the image stored in the local host’s native files system is called a checkpoint. The NameNode also stores the modification log of the image called the journal in the local host’s native file system. For improved durability, redundant copies of the checkpoint and journal can be made at other servers. During restarts the NameNode restores the namespace by reading the namespace and replaying the journal. The locations of block replicas may change over time and are not part of the persistent checkpoint.

3.1.3. D

ATA

N

ODE

Figure 3.1 HDFS Architecture

Each block replica on a DataNode is represented by two files in the local host’s native file system. The first file contains the data itself and the second file is block’s metadata including checksums for the block data and the block’s generation stamp. The size of the data file equals the actual length of the block and does not require extra space to round it up to the nominal block size as in traditional file systems. Thus, if a block is half full it needs only half of the space of the full block on the local drive.

During startup each DataNode connects to the NameNode and performs a handshake. The purpose of the handshake is to verify the namespace ID and the software version of the DataNode. If either does not match that of the NameNode the DataNode automatically shuts down.

(20)

10 | P a g e

The namespace ID is assigned to the file system instance when it is formatted. The namespace ID is persistently stored on all nodes of the cluster. Nodes with a different namespace ID will not be able to join the cluster, thus preserving the integrity of the file system.

The consistency of software versions is important because incompatible version may cause data corruption or loss, and on large clusters of thousands of machines it is easy to overlook nodes that did not shut down properly prior to the software upgrade or were not available during the upgrade.

A DataNode that is newly initialized and without any namespace ID is permitted to join the cluster and receive the cluster’s namespace ID. After the handshake the DataNode registers with the NameNode. DataNodes persistently store their unique storage IDs. The storage ID is an internal identifier of the DataNode, which makes it recognizable even if it is restarted with a different IP address or port. The storage ID is assigned to the DataNode when it registers with the NameNode for the first time and never changes after that.

A DataNode recognizes block replicas in its possession by sending a block report. Datanodes also sends heartbeats to the Namenode indicating the presence or the proper functioning of the datanode. The NameNode does not directly call DataNodes. It uses replies to heartbeats to send instructions to the DataNodes. The instructions include commands to:

 Replicate blocks to other nodes  Remove local block replicas

 Reregister or to shut down the node  Send an immediate block report

These commands are important for maintaining the overall system integrity and therefore it is critical to keep heartbeats frequent even on big clusters. The NameNode can process thousands of heartbeats per second without affecting other NameNode operations.

3.1.4. HDFS

C

LIENT

User applications[4][5][6][10][11] access the file system using the HDFS client, a code library that exports the HDFS file system interface. Similar to most conventional file systems, HDFS supports operations to read, write and delete files, and operations to create and delete directories. The user references files and directories by paths in the namespace. The user application generally does not need to know that file system metadata and storage are on different servers, or that blocks have multiple replicas.

When an application reads a file, the HDFS client first asks the NameNode for the list of DataNodes that host replicas of the blocks of the file. It then contacts a DataNode directly and requests the transfer of the desired block. When a client writes, it first asks the NameNode to choose DataNodes to host replicas of the first block of the file. The client organizes a pipeline from

(21)

11 | P a g e

node-to-node and sends the data. When the first block is filled, the client requests new DataNodes to be chosen to host replicas of the next block and so on.

Unlike conventional file systems, HDFS provides an API that exposes the locations of a file blocks. This allows applications like the MapReduce framework to schedule a task to where the data are located, thus improving the read performance. It also allows an application to set the replication factor of a file. By default a file’s replication factor is three.

3.1.5. I

MAGE

A

ND

J

OURNAL

The namespace image[6][7] is the file system metadata that describes the organization of application data as directories and files. A persistent record of the image written to disk is called a checkpoint. The journal is a write-ahead commit log for changes to the file system that must be persistent. For each client-initiated transaction, the change is recorded in the journal, and the journal file is flushed and synched before the change is committed to the HDFS client. The checkpoint file is never changed by the NameNode. It is replaced in its entirety when a new checkpoint is created during restart, when requested by the administrator, or by the CheckpointNode described in the next section. During startup the NameNode initializes the namespace image from the checkpoint, and then replays changes from the journal until the image is up-to-date with the last state of the file system. A new checkpoint and empty journal are written back to the storage directories before the NameNode starts serving clients.

If either the checkpoint or the journal is missing, or becomes corrupt, the namespace information will be lost partly or entirely. In order to preserve this critical information HDFS can be configured to store the checkpoint and journal in multiple storage directories. Recommended practice is to place the directories on different volumes, and for one storage directory to be on a remote NFS server. The first choice prevents loss from single volume failures, and the second choice protects against failure of the entire node. If the NameNode encounters an error writing the journal to one of the storage directories it automatically excludes that directory from the list of storage directories. The NameNode automatically shuts itself down if no storage directory is available.

The NameNode is a multithreaded system and processes requests simultaneously from multiple clients. Saving a transaction to disk becomes a bottleneck since all other threads need to wait until the synchronous flush-and-sync procedure initiated by one of them is complete. In order to optimize this process the NameNode batches multiple transactions initiated by different clients. When one of the NameNode’s threads initiates a flush-and-sync operation, all transactions batched at that time are committed together. Remaining threads only need to check that their transactions have been saved and do not need to initiate a flush-and-sync operation.

(22)

12 | P a g e

3.1.6. C

HECK

P

OINT

N

ODE

The NameNode in HDFS, in addition to its primary role serving client requests, can alternatively execute either of two other roles, either a CheckpointNode[3][5][6][7] or a BackupNode[6][7][8]. The role is specified at the node startup.

The CheckpointNode periodically combines the existing checkpoint and journal to create a new checkpoint and an empty journal. The CheckpointNode usually runs on a different host from the NameNode since it has the same memory requirements as the NameNode. It downloads the current checkpoint and journal files from the NameNode, merges them locally, and returns the new checkpoint back to the NameNode. Creating periodic checkpoints is one way to protect the file system metadata. The system can start from the most recent checkpoint if all other persistent copies of the namespace image or journal are unavailable.

Creating a checkpoint lets the NameNode truncate the tail of the journal when the new checkpoint is uploaded to the NameNode. HDFS clusters run for prolonged periods of time without restarts during which the journal constantly grows. If the journal grows very large, the probability of loss or corruption of the journal file increases. Also, a very large journal extends the time required to restart the NameNode. For a large cluster, it takes an hour to process a week-long journal. Good practice is to create a daily checkpoint.

3.1.7. B

ACKUP

N

ODE

A recently introduced feature of HDFS is the BackupNode. Like a CheckpointNode, the BackupNode is capable of creating periodic checkpoints, but in addition it maintains an in-memory, up-to-date image of the file system namespace that is always synchronized with the state of the NameNode.

The BackupNode accepts the journal stream of namespace transactions from the active NameNode, saves them to its own storage directories, and applies these transactions to its own namespace image in memory. The NameNode treats the BackupNode as a journal store the same as it treats journal files in its storage directories. If the NameNode fails, the BackupNode’s image in memory and the checkpoint on disk is a record of the latest namespace state.

The BackupNode can create a checkpoint without downloading checkpoint and journal files from the active NameNode, since it already has an up-to-date namespace image in its memory. This makes the checkpoint process on the BackupNode more efficient as it only needs to save the namespace into its local storage directories.

The BackupNode can be viewed as a read-only NameNode. It contains all file system metadata information except for block locations. It can perform all operations of the regular NameNode that do not involve modification of the namespace or knowledge of block locations. Use of a BackupNode provides the option of running the NameNode without persistent storage, delegating responsibility for the namespace state persisting to the BackupNode.

(23)

13 | P a g e

3.1.8. F

ILE

S

YSTEM

S

NAPSHOT

During software upgrades the possibility of corrupting the system due to software bugs or human mistakes increases. The purpose of creating snapshots in HDFS is to minimize potential damage to the data stored in the system during upgrades.

The snapshot mechanism[8][7][6] lets administrators persistently save the current state of the file system, so that if the upgrade results in data loss or corruption it is possible to rollback the upgrade and return HDFS to the namespace and storage state as they were at the time of the snapshot. The snapshot (only one can exist) is created at the cluster administrator’s option whenever the system is started. If a snapshot is requested, the NameNode first reads the checkpoint and journal files and merges them in memory. Then it writes the new checkpoint and the empty journal to a new location, so that the old checkpoint and journal remain unchanged.

During handshake the NameNode instructs DataNodes whether to create a local snapshot. The local snapshot on the DataNode cannot be created by replicating the data files directories as this will require doubling the storage capacity of every DataNode on the cluster. Instead each DataNode creates a copy of the storage directory and hard links existing block files into it. When the DataNode removes a block it removes only the hard link, and block modifications during appends use the copy-on-write technique. Thus old block replicas remain untouched in their old directories.

The cluster administrator can choose to roll back HDFS to the snapshot state when restarting the system. The NameNode recovers the checkpoint saved when the snapshot was created. DataNodes restore the previously renamed directories and initiate a background process to delete block replicas created after the snapshot was made. Having chosen to roll back, there is no provision to roll forward. The cluster administrator can recover the storage occupied by the snapshot by commanding the system to abandon the snapshot, thus finalizing the software upgrade.

System evolution may lead to a change in the format of the NameNode’s checkpoint and journal files, or in the data representation of block replica files on DataNodes. The layout version identifies the data representation formats, and is persistently stored in the NameNode’s and the DataNodes’ storage directories. During startup each node compares the layout version of the current software with the version stored in its storage directories and automatically converts data from older formats to the newer ones. The conversion requires the mandatory creation of a snapshot when the system restarts with the new software layout version.

HDFS does not separate layout versions for the NameNode and DataNodes because snapshot creation must be an all-cluster effort rather than a node-selective event. If an upgraded NameNode due to a software bug purges its image then backing up only the namespace state still results in total data loss, as the NameNode will not recognize the blocks reported by DataNodes,

(24)

14 | P a g e

and will order their deletion. Rolling back in this case will recover the metadata, but the data itself will be lost. A coordinated snapshot is required to avoid a cataclysmic destruction.

3.2. F

ILE

I/O

O

PERATIONS

A

ND

R

EPLICA

M

ANAGEMENT

3.2.1. F

ILE

R

EAD

A

ND

W

RITE

An application adds data to HDFS by creating a new file and writing the data to it. After the file is closed, the bytes written cannot be altered or removed except that new data can be added to the file by reopening the file for append. HDFS implements a single-writer, multiple-reader model.

The HDFS client that opens a file for writing is granted a lease for the file. No other client can write to the file. The writing client periodically renews the lease by sending a heartbeat to the NameNode. When the file is closed, the lease is revoked. The lease duration is bound by a soft limit and a hard limit. Until the soft limit expires, the writer is certain of exclusive access to the file. If the soft limit expires and the client fails to close the file or renew the lease, another client can preempt the lease. If after the hard limit expires (one hour) and the client has failed to renew the lease, HDFS assumes that the client has quit and will automatically close the file on behalf of the writer, and recover the lease. The writer's lease does not prevent other clients from reading the file. A file may have many concurrent readers.

An HDFS file consists of blocks. When there is a need for a new block, the NameNode allocates a block with a unique block ID and determines a list of DataNodes to host replicas of the block. The DataNodes form a pipeline, the order of which minimizes the total network distance from the client to the last DataNode. Bytes are pushed to the pipeline as a sequence of packets. The bytes that an application writes first buffer at the client side. After a packet buffer is filled (typically 64 KB), the data are pushed to the pipeline. The next packet can be pushed to the pipeline before receiving the acknowledgement for the previous packets. The number of outstanding packets is limited by the outstanding packets window size of the client.

In a cluster of thousands of nodes, failures of a node (most commonly storage faults) are daily occurrences. A replica stored on a DataNode may become corrupted because of faults in memory, disk, or network. HDFS generates and stores checksums for each data block of an HDFS file. Checksums are verified by the HDFS client while reading to help detect any corruption caused either by client, DataNodes, or network. When a client creates an HDFS file, it computes the checksum sequence for each block and sends it to a DataNode along with the data. A DataNode stores checksum in a metadata file separate from the block’s data file. When HDFS reads a file, each block’s data and checksums are shipped to the client. The client computes the checksum for the received data and verifies that the newly computed checksums matches the checksums it received. If not, the client notifies the NameNode of the corrupt replica and then fetches a different replica of the block from another DataNode.

(25)

15 | P a g e

When a client opens a file to read, it fetches the list of blocks and the locations of each block replica from the NameNode. The locations of each block are ordered by their distance from the reader. When reading the content of a block, the client tries the closest replica first. If the read attempt fails, the client tries the next replica in sequence. A read may fail if the target DataNode is unavailable, the node no longer hosts a replica of the block, or the replica is found to be corrupt when checksums are tested.

HDFS permits a client to read a file that is open for writing. When reading a file open for writing, the length of the last block still being written is unknown to the NameNode. In this case, the client asks one of the replicas for the latest length before starting to read its content.

The design of HDFS I/O is particularly optimized for batch processing systems, like MapReduce, which require high throughput for sequential reads and writes. However, many efforts have been put to improve its read/write response time in order to support applications like Scribe that provide real-time data streaming to HDFS, or HBase that provides random, real-time access to large tables.

3.2.2. H

EARTBEAT

A

ND

B

LOCK

R

EPORT

A DataNode identifies block replicas in its possession to the NameNode by sending a block report. A block report contains the block id, the generation stamp and the length for each block replica the server hosts. The first block report is sent immediately after the DataNode registration. Subsequent block reports are sent every hour and provide the NameNode with an up-todate view of where block replicas are located on the cluster.

During normal operation DataNodes send heartbeats to the NameNode to confirm that the DataNode is operating and the block replicas it hosts are available. The default heartbeat interval is three seconds. If the NameNode does not receive a heartbeat from a DataNode in ten minutes the NameNode considers the DataNode to be out of service and the block replicas hosted by that DataNode to be unavailable. The NameNode then schedules creation of new replicas of those blocks on other DataNodes.

Heartbeats[11][12] from a DataNode also carry information about total storage capacity, fraction of storage in use, and the number of data transfers currently in progress[1][2].

3.2.3. S

TAGING

A client request to create a file does not reach the NameNode immediately. In fact, initially the HDFS client caches the file data into a temporary local file. Application writes are transparently redirected to this temporary local file. When the local file accumulates data worth over one HDFS block size, the client contacts the NameNode. The NameNode inserts the file name into the file system hierarchy and allocates a data block for it. The NameNode responds to the client request with the identity of the DataNode and the destination data block. Then the client flushes the block of data from the local temporary file to the specified DataNode. When a file is closed, the

(26)

16 | P a g e

remaining un-flushed data in the temporary local file is transferred to the DataNode. The client then tells the NameNode that the file is closed. At this point, the NameNode commits the file creation operation into a persistent store. If the NameNode dies before the file is closed, the file is lost.

3.2.4. R

EPLICATION

P

IPELINEING

When a client is writing data to an HDFS file, its data is first written to a local file as explained in the previous section. Suppose the HDFS file has a replication factor of three. When the local file accumulates a full block of user data, the client retrieves a list of DataNodes from the NameNode. This list contains the DataNodes that will host a replica of that block. The client then flushes the data block to the first DataNode. The first DataNode starts receiving the data in small portions (4 KB), writes each portion to its local repository and transfers that portion to the second DataNode in the list. The second DataNode, in turn starts receiving each portion of the data block, writes that portion to its repository and then flushes that portion to the third DataNode. Finally, the third DataNode writes the data to its local repository. Thus, a DataNode can be receiving data from the previous one in the pipeline and at the same time forwarding data to the next one in the pipeline. Thus, the data is pipelined from one DataNode to the next [3][5][7].

(27)

17 | P a g e

3.2.5. D

ATA

B

LOCK

P

LACEMENT

HDFS nodes are spread across multiple racks. Nodes of a rack share a switch, and rack switches are connected by one or more core switches. Communication between two nodes in different racks has to go through multiple switches. In most cases, network bandwidth between nodes in the same rack is greater than network bandwidth between nodes in different racks.

Figure 3.3 HDFS Cluster

When a new block is created, HDFS places the first replica on the node where the writer is located, the second and the third replicas on two different nodes in a different rack, and the rest are placed on random nodes with restrictions that n more than one replica is placed at one node and no more than two replicas are placed in the same rack when the number of replicas is less than twice the number of racks.

3.2.6. R

EPLICA

M

ANAGEMENT

The NameNode endeavors to ensure that each block always has the intended number of replicas. The NameNode detects that a block has become under- or over-replicated when a block report from a DataNode arrives. When a block becomes over replicated, the NameNode chooses a replica to remove. The NameNode will prefer not to reduce the number of racks that host replicas, and secondly prefer to remove a replica from the DataNode with the least amount of available disk space. The goal is to balance storage utilization across DataNodes without reducing the block’s availability.

When a block becomes under-replicated, it is put in the replication priority queue. A block with only one replica has the highest priority, while a block with a number of replicas that is greater than two thirds of its replication factor has the lowest priority. A background thread periodically scans the head of the replication queue to decide where to place new replicas. Block replication follows a similar policy as that of the new block placement. If the number of existing replicas is one, HDFS places the next replica on a different rack. In case that the block has two existing replicas, if the two existing replicas are on the same rack, the third replica is placed on a different

(28)

18 | P a g e

rack; otherwise, the third replica is placed on a different node in the same rack as an existing replica. Here the goal is to reduce the cost of creating new replicas.

3.2.7. B

ALANCER

HDFS block placement strategy does not take into account DataNode disk space utilization. This is to avoid placing new—more likely to be referenced—data at a small subset of the DataNodes. Therefore data might not always be placed uniformly across DataNodes. Imbalance also occurs when new nodes are added to the cluster.

The balancer is a tool that balances disk space usage on an HDFS cluster. It takes a threshold value as an input parameter, which is a fraction in the range of (0, 1). A cluster is balanced if for each DataNode, the utilization of the node (ratio of used space at the node to total capacity of the node) differs from the utilization of the whole cluster (ratio of used space in the cluster to total capacity of the cluster) by no more than the threshold value.

The tool is deployed as an application program that can be run by the cluster administrator. It iteratively moves replicas from DataNodes with higher utilization to DataNodes with lower utilization. One key requirement for the balancer is to maintain data availability. When choosing a replica to move and deciding its destination, the balancer guarantees that the decision does not reduce either the number of replicas or the number of racks.

The balancer optimizes the balancing process by minimizing the inter-rack data copying. If the balancer decides that a replica A needs to be moved to a different rack and the destination rack happens to have a replica B of the same block, the data will be copied from replica B instead of replica A.

A second configuration parameter limits the bandwidth consumed by rebalancing operations. The higher the allowed bandwidth, the faster a cluster can reach the balanced state, but with greater competition with application processes.

3.2.8. B

LOCK

S

CANNER

Each DataNode runs a block scanner that periodically scans its block replicas and verifies that stored checksums match the block data. If a client reads a complete block and checksum verification succeeds, it informs the DataNode. The DataNode treats it as a verification of the replica.

Whenever a read client or a block scanner detects a corrupt block, it notifies the NameNode. The NameNode marks the replica as corrupt, but does not schedule deletion of the replica immediately. Instead, it starts to replicate a good copy of the block. Only when the good replica count reaches the replication factor of the block the corrupt replica is scheduled to be removed [4][5][8].

(29)

19 | P a g e

3.2.9. D

ECOMMISSIONING

The cluster administrator specifies which nodes can join the cluster by listing the host addresses of nodes that are permitted to register and the host addresses of nodes that are not permitted to register. The administrator can command the system to re-evaluate these include and exclude lists. A present member of the cluster that becomes excluded is marked for decommissioning.

Once a DataNode is marked as decommissioning, it will not be selected as the target of replica placement, but it will continue to serve read requests. The NameNode starts to schedule replication of its blocks to other DataNodes. Once the NameNode detects that all blocks on the decommissioning DataNode are replicated, the node enters the decommissioned state. Then it can be safely removed from the cluster without jeopardizing any data availability.

3.2.10. I

NTER

-C

LUSTER

D

ATA

C

OPY

When working with large datasets, copying data into and out of a HDFS cluster is daunting. HDFS provides a tool called DistCp for large inter/intra-cluster parallel copying. It is a MapReduce job; each of the map tasks copies a portion of the source data into the destination file system. The MapReduce framework automatically handles parallel task scheduling, error detection and

(30)

C

HAPTER

4 S

ETTING

U

P

T

HE

H

ADOOP

E

NVIRONMENT

Chapter Gist:

This chapter shows the hadoop environment used in this

experiment i.e. the cluster, number and type of nodes in the cluster, hadoop

(31)

21 | P a g e

4.1. Hadoop Configuration

Hadoop consists of two components: a distributed file system and a Map-Reduce Framework. In first component, there exist two types of nodes, namely a single namenode and several datanodes. Data is stored in datanodes and metadata i.e. file system namespace is stored in namenode. The namenode also manages the replication factor of data blocks. There is a secondary namenode which keeps a copy of the namenode data and is used to restart the namenode in the event of failure. In the second component, there are two processes, namely a Jobtracker and a separate Tasktracker for each datanode. The Jobtracker can be run on a dedicated node or on the namenode. The Tasktracker runs on each datanode. The Jobtracker schedules all jobs in the cluster. A job is split into several tasks which run on datanodes. The Tasktracker is responsible for starting the scheduled tasks on the working nodes (i.e. datanodes) and reporting progress to the Jobtracker.

The Hadoop Distributed File System used in this work has the following configuration.

Hadoop Distributed File System Version 1.0.3 Non-default Parameters hdfs-site.xml dfs.replication slaves Node03-08 masters Node01 Java Version 1.6.0 Description

Java(TM) SE Runtime Environment (build 1.6.0-b105)

Java HotSpot(TM) Client VM (build 1.6.0-b105, mixed mode, sharing)

Table 4.1 Hadoop Configuration

4.2. Node Configuration

The Hadoop Distributed File System was setup in 13 nodes. The nodes include a dedicated Namenode, a dedicated Jobtracker, a maximum of 6 datanodes and 5 clients. The following table shows the Node configuration of Hadoop Environment.

(32)

22 | P a g e

Node Configuration

Namenode Single dedicated node

Datanode No of datanodes varies; Max 6 datanodes Jobtracker Single dedicated node

Tasktracker Runs on datanodes

Client No of Clients varies; Max 5 clients

Table 4.2 Node Configuration

(33)

23 | P a g e

4.3. System Configuration

The configuration details specifying the nodes, their hardware, operating system and network configuration has been listed in Table 4.3.

Hardware Configuration

Processor Intel ® Pentium ® Dual Core 2 CPU 3.00 GHz, 2.99 GHz

RAM 504 MB (Minimum)

Hard Disk 40 GB (Minimum)

Operating System

OS Type Linux

OS Version

Linux itl128 2.6.9-5.ELsmp #1 SMP Wed Jan 5 19:30:39 EST 2005 i686 i686 i386

GNU/Linux

Distribution Version

LSB Version 1.3

Distributor ID RedHatEnterpriseAS Description Red Hat Enterprise Linux

AS release 4 (Nahant)

Release 4

Codename Nahant

GCC Version 3.4.3 20041212 (Red Hat 3.4.3-9.EL4) Network

Configuration

LAN Ethernet

Bandwidth 100 MBPS

(34)

C

HAPTER

5 H

ADOOP

P

ERFORMANCE

T

UNING

P

ARAMETERS

Chapter Gist:

This chapter discusses about the performance tuning parameters

(a subset of hadoop configuration parameters).

(35)

25 | P a g e

Hadoop Core is designed for running jobs that have large input data sets and medium to large outputs, running on large sets of dissimilar machines. The framework has been heavily optimized for this use case. Hadoop Core is optimized for clusters of heterogeneous machines that are not highly reliable. The HDFS file system is optimized for small numbers of very large files that are accessed sequentially. The Hadoop file system provides several parameters that are tunable i.e. the performance of the HDFS along with its map-reduce framework can be improved by optimizing these parameters. There are different tunable parameters that affect different components of hadoop distributed file systems. Few of them have been discussed below.

5.1. C

LUSTER

-L

EVEL

T

UNABLE

P

ARAMETERS

The cluster-level tunable parameters[3][4] require a cluster restart to take effect. Some of them may require a restart of the HDFS portion of the cluster; others may require a restart of the MapReduce portion of the cluster. These parameters take effect only when the relevant server starts.

5.1.1. S

ERVER

-L

EVEL

P

ARAMETERS

The server-level parameters, shown affect basic behavior of the servers. In general, these affect the number of worker threads, which may improve general responsiveness of the servers with an increase in CPU and memory use.

The variables are generally configured by setting the values in the conf/hadoop-site.xml file. It is possible to set them via command-line options for the servers, either in the conf/hadoop-env.sh file or by setting environment variables (as is done in conf/hadoop-conf/hadoop-env.sh).

The nofile parameter is not a Hadoop configuration parameter. It is an operating system parameter. For users of the bash shell, it may be set or examined via the command ulimit –n [value to set]. Quite often, the operating system-imposed limit is too low, and the administrator must increase that value. The value 64000 is considered a safe minimum for medium-size busy clusters.

Parameters

Description

Default Value

dfs.datanode.handler.count The number of threads servicing

DataNode block requests 3

dfs.namenode.handler.count The number of threads servicing _{Namenode requests} 10 tasktracker.http.threads The number of threads for servicing

map output files to reduce tasks 40 ipc.server.listen.queue.size

The number of network incoming connections that may queue for a

server

(36)

26 | P a g e

nofile

The on the number of file descriptors a process can open (alter /etc/security/limits.con for

Linux machines)

1024

Table 5.1 Server-Level Tuning Parameters

5.1.2. HDFS

T

UNABLE

P

ARAMETERS

The most commonly tuned parameter for HDFS is the file system block size. The default block size is 64MB, specified as 67108864 bytes in dfs.block.size. The larger this value, the fewer individual blocks will be stored on the DataNodes, and the larger the input splits will be.

The DataNodes through at least Hadoop 0.19.0 have a limit to the number of blocks that can be stored. This limit appears to be roughly 500,000 blocks. After this size, the DataNode will start to drop in and out of the cluster. If enough DataNodes are having this problem, the HDFS performance will tend toward full stop.

When computing the number of tasks for a job, a task is created per input split, and input splits are created one per block of each input file by default. There is a maximum rate at which the JobTracker can start tasks, at least through Hadoop 0.19.0. The more tasks to execute, the longer it will take the JobTracker to schedule them, and the longer it will take the TaskTrackers to set up and tear down the tasks.

The other reason for increasing the block size is that on modern machines, an I/O-bound task will read 64MB of data in a small number of seconds, resulting in the ratio of task over-head to task runtime being very large. A downside to increasing this value is that it sets the minimum amount of I/O that must be done to access a single record. If your access patterns are not linearly reading large chunks of data from the file, having a large block size will greatly increase the disk and network loading required to service your I/O.

The DataNode and NameNode parameters are presented in the following Table.

Parameters

Description

Default Value

fs.default.value

The URI of the shared file system. This should be

dfs://NameNodeHostName:PORT

(37)

27 | P a g e

Parameters

Description

Default Value

fs.trash.interval

The interval between trash checkpoints. If 0, the trash feature is disabled. The trash is used only for deletions done via the hadoop dfs -rm series of commands.

0

dfs.hosts

The full path to a file containing the list of hostnames that are allowed to connect to the NameNode. If specified, only the hosts in this file are permitted to connect to the NameNode.

-

dfs.hosts.exclude

A path to file containing a list of hosts to

blacklist from the NameNode. If the file does not exist, no hosts are blacklisted. If a set of

DataNode hostnames are added to this file while the Namenode is running and the command hadoop dfsadmin –refreshNodes is executed, the DataNodes listed will be decommissioned. Any blocks stored on them will be redistributed to other nodes on the cluster such that the default replication for the blocks is satisfied. It is best to have this point to an empty file that exits, so that Datanodes may be decommissioned as needed

-

dfs.namenode.decommissi on.interval

The interval in seconds that the Namenode checks to see if a DataNode decommission has finished.

300

dfs.replication.interval The period in seconds that the NameNode

computes the list of blocks needing replication. 3

dfs.access.time.precision

The precision in msec that access times are maintained. If this value is 0, no access times are maintained. Setting this to 0 may increase performance on busy clusters where the

bottleneck is the namenode edit log write speed.

3600000

dfs.max.objects The maximum number of files, directories and

(38)

28 | P a g e

Parameters

Description

Default Value

dfs.replication

The number of replicas of each block stored in the cluster. Larger values allow more DataNodes to fail before blocks are unavailable but increase the amount of network I/O required to store data and the disk space

requirements. Large values also increase the likelihood that a map task will have a local replica of the input split.

3

dfs.block.size

The basic block size for the file system. This may be too small or too large for your cluster, depending on your job data access patterns.

67108864

dfs.datanode.handler.count

The number of threads handling block requests. Increasing this may increase DataNode throughput, particularly if the DataNode uses multiple separate physical devices for block storage.

3

dfs.replication.considerLoad Consider the DataNode Loading when _{picking replication locations.} True

dfs.datanode.du.reserved

The amount of space that must be kept free in each location used for block storage.

0.0 dfs.permissions Permission checking is enabled for _{file access} True

dfs.df.interval The interval between disk usage

statistics collection in msec. 60000

dfs.blockreport.intervalMsec

The amount of time between block reports. The block report does a scan of every block that is stored on the Datanode and reports this information to the Namenode This reports as of Hadoop 0.19.0 blocks the Datanode from servicing block reports and is the cause of the congestion collapse of HDFS when more 500,000 blocks are stored on a Datanode.

3600000

dfs.heartbeat.interval The heartbeat interval with the

(39)

29 | P a g e

Parameters

Description

Default Value

Dfs.namenode.handler.co unt

The number of server threads for the Namenode. This is commonly greatly increased in busy and large clusters.

10

Dfs.name.dir

The location where the namenode metadata storage is kept. This may be a comma-separated list of directories. A copy will be kept in each location. Writes to the locations are synchronous. If this data is lost, your entire HDFS data set is lost. Keep multiple copies on multiple machines.

${hadoop.tmp.dir}/dfs/ name, in /tmp by

default

Dfs.name.edits.dir

The location where metadata edits are synchronously written. This may be comma-separated list of directories. Ideally, this should hold multiple locations on separate physical devices. If this is lost, your last few minutes of changes will be lost.

${dfs.name.dir}

Dfs.data.dir

The comma-separated list of directories to use for block storage. This list will be used in a round-robin fashion for storing new data blocks. The locations should be on separate physical devices. Using multiple physical devices yields roughly 50% better performance than RAID 0 striping.

${hadoop.tmp.dir}/dfs/ data

Dfs.safemode.threshold.p ct

The percentage of blocks that must be minimally replicated before the HDFS will start accepting write requests. This

condition is examined only on HDFS startup.

0.999f

Dfs.balance.bandwidthPe rSec

The amount of bandwidth that may be used torebalance block storage among

DataNodes. This value is in bytes per second.

1048576

(40)

C

HAPTER

6 P

ERFORMANCE

R

ESULTS AND

A

NALYSIS

Chapter Gist:

This chapter analyzes the parameters (in Chapter 5) with respect

to performance showing a curve or graph that each parameter follows along

(41)

31 | P a g e

6.1. S

CENARIO

1:

E

FFECT OF

M

ULTIPLE

C

LIENTS

No of Clients No of Datanodes Replication Factor Size of the Transferred File (MB) Operation Performed Estimated time 1 6 3 500 hadoop fs – put 56 sec 2 6 3 500 hadoop fs –

put 1 min 33 sec

3 6 3 500 hadoop fs –

put 2 min 14 sec

4 6 3 500 hadoop fs –

put 2 min 54 sec

5 6 3 500 hadoop fs –

put 3 min 36 sec

Table 6.1 Performance Results in scenario 1

6.1.1. P

ERFORMANCE

A

NALYSIS

The above the test result highlights the fact that if the number of clients increases, the time required to write the data will increase. Here, the clients are performing the write operation (i.e. copying 500 MB file from local file system to hadoop distributed file system) concurrently. It tends to produce a straight line as the number of clients increases and is shown below in Figure 6.1.

One can imagine how long it will take to perform a simple write operation when the number of clients is near about 100 or even 1000. So, in order to maintain an acceptable access time for data, we have to maintain the availability of the data blocks in such a way that the data blocks are spread over the network uniformly ( i.e. having almost equal distance in respect to time from each data block replica ).

It can also be shown that the increment in access time doesn’t depend upon the operation performed (i.e. the read time of a file in hadoop distributed file system has also the effect as the write time). There are several options to increase the uniform availability of data over the cluster such as increasing the replication factor, reducing the block size, increasing the server and datablock level threads etc.

(42)

32 | P a g e

Figure 6.1 Effects of Multiple Clients

6.2. S

CENARIO

2:

E

FFECT OF

R

EPLICATION

F

ACTOR

(R

EPLICATION

F

ACTOR

<

N

O

.

OF

A

VAILABLE

D

ATANODES

)

No of Clients No of Datanodes Replication Factor

Size of the Transferred File (MB)

Estimated time

3 1 1 500 2 min 7 sec

3 3 3 500 2 min 33 sec

Table 6.2 Effects of Replication Factor for Write Operation in Scenario 2 No of Clients No of Datanodes Replication Factor

Estimated time

3 1 1 500 3 min 7 sec

3 3 3 500 2 min 56 sec

Table 6.3 Effects of Replication Factor for Read Operation in Scenario 2 0 50 100 150 200 250 0 1 2 3 4 5 6 Es tim at ed Tim e in Se conds No. of Clients

(43)

33 | P a g e

6.2.1. P

ERFORMANCE

A

NALYSIS

The above performance result shows the effect of replication factor where the replication factor is not beyond the number of available datanodes in the cluster. There are two cases: Table 4 shows the effect for write operation and Table 5 for read operation. In the first case, the estimated time of the write operation increases from 2 min 7 sec to 2 min 33 sec as the replication factor along with the number of datanodes increases from 1 to 3, as shown in Table 6.2. This produces a straight line showing increments in write time as the replication factor increases (shown in Figure 6.2).

Figure 6.2 Effects of increasing the Replication Factor (write operation)

In second case, the estimated time tends to be reduced from 3 min 7 sec to 2 min 56 sec as the replication factor along with the no of available working nodes increases, as shown in Table 6.3. The effect of this reduction is shown in Figure 6.1 with a straight line. Thus it clearly reveals the fact that the read time of data can be reduced by increasing the replication factor i.e. having more number of copies of data that is spread over the cluster. Also the fact that there cannot be any increment in write operation is evident from the above result because an increment in replication factor results in extra time required writing the extra copies of data in datanodes.

0 20 40 60 80 100 120 140 160 180 0 0.5 1 1.5 2 2.5 3 3.5 Es tim at ed Tim e in Se conds Replication Factor

(44)

34 | P a g e

Figure 6.3 Effects of increasing the Replication Factor (read operation)

6.3. S

CENARIO

3:

E

FFECT OF

R

EPLICATION

F

ACTOR

(R

EPLICATION

F

ACTOR

>

N

O

.

OF

A

VAILABLE

D

ATANODES

)

No of Clients No of Datanodes Replication Factor

Estimated time

3 3 3 500 2 min 33 sec

3 3 6 500 2 min 46 sec

Table 6.4 Effects of Replication Factor for Write Operation in Scenario 3 No of Clients No of Datanodes Replication Factor

Estimated time

3 3 3 500 2 min 56 sec

3 3 6 500 2 min 57 sec

Table 6.5 Effects of Replication Factor for Read Operation in Scenario 3 174 176 178 180 182 184 186 188 0 0.5 1 1.5 2 2.5 3 3.5 Es tim at ed Tim e in Sec on ds Replication Factor

(45)

35 | P a g e

6.3.1. P

ERFORMANCE

A

NALYSIS

The above performance results show the effect where the replication factor goes beyond the number of datanodes available in the cluster. There are also two cases to consider. One for write operation shown in Table 6.3 and another for read operation as shown in Table 6.4.

In first case, the estimated time for write operation increases from 2 min 33 sec to 2 min 46 sec as the replication factor increases from 3 to 6 as shown Table 6.3. However, in this scenario the number of available working nodes in the cluster is still 3 when the replication factor becomes 6. Figure 6.4 shows a straight line indicating the effect of write time in this situation. The reason of this increment in write time is as exactly same as it was in scenario 2. If the replication factor increases, the no of replicas/copies that must be kept increases resulting in extra time required writing the replicas/copies. Hence, the conclusion that can be drawn from the above test is that the increment in replication factor results in increment of time required writing a file in the distributed file system and this time is independent of the available datanodes running in the cluster.

Figure 6.4 Replication Factor beyond available datanodes (write operation)

In second case (i.e. for read operation), the read time remains almost constant as shown in Table 6.4. It becomes 2 min 57 sec from 2 min 56 sec as the replication increases from 3 to 6. In this case also, the number of available datanodes running in the cluster is made constant i.e. 3. So, this situation clearly reveals the fact that there is no gain of increasing the replication factor beyond the no. of datanodes available. The effect of read operation is shown in Figure 6.5.

152 154 156 158 160 162 164 166 168 0 1 2 3 4 5 6 7 Es tim at ed Tim e in Se conds Replication Factor