• No results found

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters

N/A
N/A
Protected

Academic year: 2022

Share "COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters"

Copied!
14
0
0

Loading.... (view fulltext now)

Full text

(1)

Edgar Gabriel

COSC 6374

Parallel Computation Parallel I/O (I) –

I/O basics

Edgar Gabriel Spring 2008

Concept of a clusters

Compute node

message passing network administrative network

Memory

Processor 1

Processor 2

Network card 1Network card 2 local disks

(2)

COSC 6374 – Parallel Computation Edgar Gabriel

I/O Problem (I)

• Every node has its own local disk – no “globally visible” file-system

• Most applications require data and executable to be locally available

– e.g. an MPI application using 100 nodes requires the executable to be available on all nodes in the same directory using the same name

I/O problem (II)

Current processor performance: e.g. Pentium 4 – 3 GHz ~ 6GFLOPS

– Memory Bandwidth: 133 MHz * 4 * 64Bit ~ 4.26 GB/s

Current network performance:

– Gigabit Ethernet: latency ~ 40 µs, bandwidth=125MB/s – InfiniBand 4x: latency ~ 5 µs, bandwidth =1GB/s

Disc performance:

– Latency: 7-12 ms

– Bandwidth: ~20MB/sec – 60 MB/sec

(3)

COSC 6374 – Parallel Computation Edgar Gabriel

Basic characteristics of storage devices

• Capacity: amount of data a device can store

• Transfer rate or bandwidth: amount of data at which a device can read/write in a certain amount of time

• Access time or latency: delay before the first byte is moved

Prefix Abbreviation Base ten Base two

kilo, kibi K, Ki 10^3 2^10=1024

Mega, mebi M, Mi 10^6 2^20

Giga, gibi G, Gi 10^9 2^30

Tera, tebi T, Ti 10^12 2^40

Peta, pebi P, Pi 10^15 2^50

UNIX File Access Model (I)

• A File is a sequence of bytes

• When a program opens a file, the file system establishes a file pointer. The file pointer is an integer indicating the position in the file, where the next byte will be

written/read.

• Multiple processes can open a file concurrently. Each process will have its own file pointer.

• No conflicts occur, when multiple processes read the same file.

• If several processes write at the same location, most UNIX file systems guarantee sequential consistency. (The data from one of the processes will be available in the file, but not a mixture of several processes).

(4)

COSC 6374 – Parallel Computation Edgar Gabriel

UNIX File Access Model (II)

• Disk drives read and write data in fixed-sized units (disk sectors)

• File systems allocate space in blocks, which is a fixed number of contiguous disk sectors.

• In UNIX based file systems, the blocks that hold data are listed in an inode. An inode contains the

information needed to find all the blocks that belong to a file.

• If a file is too large and an inode can not hold the whole list of blocks, intermediate nodes (indirect blocks) are introduced.

Write operations

• Write:

– the file systems copies bytes from the user buffer into system buffer.

– If buffer filled up, system sends data to disk

• System buffering

+ allows file systems to collect full blocks of data before sending to disk

+ File system can send several blocks at once to the disk (delayed write or write behind)

- Data not really saved in the case of a system crash

- For very large write operations, the additional copy from user to system buffer could/should be avoided

(5)

COSC 6374 – Parallel Computation Edgar Gabriel

Read operations

• Read:

– File system determines, which blocks contain requested data

– Read blocks from disk into system buffer

– Copy data from system buffer into user memory

• System buffering:

+ file system always reads a full block (file caching)

+ If application reads data sequentially, prefetching (read ahead) can improve performance

- Prefetching harmful to the performance, if application has a random access pattern.

File system operations

• Caching and buffering improve performance – Avoiding repeated access to the same block – Allowing a file system to smooth out I/O behavior

• Non-blocking I/O gives users control over prefetching and delayed writing

– Initiate read/write operations as soon as possible – Wait for the finishing of the read/write operations just

when absolutely necessary.

(6)

COSC 6374 – Parallel Computation Edgar Gabriel

Distributed File Systems vs. Parallel File Systems

• Offer access to a collection of files on remote machines

• Typically client-server based approach

• Transparent for the user

• Concurrent access to the same file from several processes is considered to be an unlikely event

– in contrary to parallel file systems, where it is considered to be a standard operation

• Distributed file systems assume different numbers of processors than parallel file systems

• Distributed file systems have different security requirements than parallel file systems

NFS – Network File System

• Protocol for a remote file service

• Client – server based approach

• Stateless server (v3)

• Communication based on RPC (Remote Procedure Call)

• NFS provides session semantics – changes to an open file are initially only visible to the process that modified the file

• File locking not part of NFS protocol (v3)

• File locking handled by a separate protocol/daemon – Locking of blocks often supported

• Client caching not part of the NFS protocol (v3) – depending on implementation

– E.g. allowing cached data to be stale for 30 seconds

(7)

COSC 6374 – Parallel Computation Edgar Gabriel

NFS in a cluster

Front-end node hosts the file server

NFS in a cluster (II)

• All file operations are remote operations – file server (= NFS server) = bottleneck

• Extensive usage of file locking required to implement sequential consistency of UNIX I/O

• Communication between client and server typically uses the “slow” communication channel on a cluster

• Do we use several disks at all ?

• Some inefficiencies in the specification, e.g. a read operation involves two RPC operations

– Lookup file-handle – Read request

(8)

COSC 6374 – Parallel Computation Edgar Gabriel

Parallel I/O

Basic idea: disk striping

• Stripe factor: number of disks

• Stripe depth: size of each block

Disk striping

• Requirements for improving disk performance:

– Multiple physical disks

– Separate I/O channels to each disk – Data transfer to all disks simultaneously

• Problem of simple disk striping:

– Minimum stripe depth (sector size) required for optimal disk performance

• since file size is limited, the number of disks which can be used in parallel is limited as well

– Loss of a single disk makes entire file useless

• Risk to loose a disk is proportional to the number of disks used

• RAID (Redundant Arrays of Independent Disks – see lecture 2)

(9)

COSC 6374 – Parallel Computation Edgar Gabriel

Parallel File Systems

• Goals

– Several process should be able to access the same file concurrently

– Several process should be able to access the same file efficiently

• Problems

– Unix sequential consistency semantics – Handling of file-pointers

– Caching and buffering

Concurrent file access – logical view

• Number of compute and I/O nodes need not match 1 1 2 2 3 3 4 4 Blocks from compute nodes

1 2 3 4 1 2 3 4 Logical view (“shared file”)

1 3 1 3 2 4 2 4 I/O nodes

Disks

(10)

COSC 6374 – Parallel Computation Edgar Gabriel

Concurrent file access – opening a file

• Each I/O node has a subset of the blocks

• File system needs to look up where the file resides – Each I/O node maintains its own directory information or – Centralized name service

• File system needs to look up striping factor (often fixed)

• Creating a new file

– file systems has to choose different I/O nodes for holding the first block to avoid contention

Concurrent write operations

• How to ensure sequential consistency ? – File locking

• Prevents parallelism even if processes write to different locations in the same file (false sharing) – Better: locking of individual blocks

• Parallel file systems often offer two consistency models – Sequential consistency

– A relaxed consistency model

• application is responsible for preventing overlapping write-operations

(11)

COSC 6374 – Parallel Computation Edgar Gabriel

File pointers

• In UNIX: every process has a separate file pointer (individual file pointers)

• Shared file pointers often useful (e.g. reading the next piece of work, writing a parallel log-file)

– On distributed memory machines: slow, since somebody has to coordinate the file pointer

– Can be fast on shared memory machines – General problems:

• file pointer atomicity

• Non blocking I/O

• Explicit file offset operations: each process tells the file system where to read/write in the file

– no update to file pointers!

Buffering and caching

• Client buffering: buffering at compute nodes

– Consistency problems (e.g. one node writes, another tries to read the same data)

• Server buffering: buffering at I/O nodes

– Prevents concatenating several small requests to a single large one => produces lots of traffic

(12)

COSC 6374 – Parallel Computation Edgar Gabriel

Example for a parallel file system: xFS

• Anderson et all., 1995

• Storage server: storing parts of a file

• Metadata manager: keeps track of data blocks

• Client: processes user requests

Client Manager

Manager Storage

server

Storage server

Storage server

xFS continued

• Communication based on active messages – Uses fast networking infrastructure

• Log-based file system

– Modifications to a file are written to a log-file and collectively written to disk

– To find a data block, a separate table (imap) holds inode references to the position in the log-file

– Log-file is distributed among several processes using RAID techniques

• Storage servers are organized in stripe groups

– I.e. not all storage servers are participating in all operations

– A globally replicated table stores which server is belonging to which stripe group

• Each file has a manager associated to it

• Manager map: identifies manager for a specific file

(13)

COSC 6374 – Parallel Computation Edgar Gabriel

xFS continued again

• Starting point: file, data

• Directory returns file id

• Manager map returns metadata manager

• Metadata manager returns exact location of inode in the log: stripe group id, segment id and offset in segment

• Client computes on which server the block really is

Directory

Manager map

fid File, data

imap

Stripe group map

Client caching in xFS

• xFS maintains a local block cache – Based on block caching

– Request of write permission transfers the ownership – Manager keeps track where a file block is cached

• Collaborative caching

– Manager transfer most recent version of a data block directly from one cache into another cache

(14)

COSC 6374 – Parallel Computation Edgar Gabriel

xFS versus NFS

Issue NFS v3 xFS

Design goals Access transparency Server-less system

Access model Remote Log-based

Communication RPC Active msgs.

Client process Thin Fat

Server groups No Yes

Name space Per client Global

Sharing semantics Session UNIX

Caching unit Implementation dep. Block Fault tolerance Reliable

communication

Striping

Summary

• Parallel I/O is a means to decrease the file I/O access times on parallel applications

• Performance relevant factors – Stripe factor

– Stripe depth

– Buffering and caching – Non-blocking I/O

• Parallel file systems offer support for concurrent access of processes to the same file

– Individual file pointers – Shared file pointers – Explicit offset

• Distributed file systems are a poor replacement for parallel file systems

References

Related documents

Ward & Daniel (2006) belyser at i forkant av dette gjennomføres det gjerne en separat evaluering av selve systemimplementeringen. I prosjektet hos Kunden AS ble

Enrollment from 7/1/11-6/30/12: Campus Number of Students Admitted New Starts since 6/30/12 Re-Enrollments since 6/30/12 Transferred to Program from Another Program since

Whilst these models of the various growth stages of successful small enterprises contain aspects which are indeed descriptive of how businesses develop, they over look the

Apeles, where the party insisting on the presumption of regularity of a notarized deed of sale admitted that the same was notarized without his presence, this

It has been shown, for instance, that many characteristics of the individual multiplex layers, such as their degree distributions, clustering coef fi cients, or the probabilities

What is the most likely explanation for the abnormal pulmonary function

Drawing extensively from archival sources, it argues that the split between the pragmatists from both communities, prepared to compromise their core principles

In the large head category, metal-on-metal total hips have the highest revision rates, while crosslinked (modified) polyethylene has the lowest revision rate, but the results are