
CHAPTER 6: DISTRIBUTED FILE SYSTEMS


Academic year: 2021


Chapter outline

• DFS design and implementation issues: system structure, file access, and sharing semantics

• Transaction and concurrency control: serializability and concurrency control protocols

• Replicated data management: one-copy serializability and coherency control protocols

Why is DFS important and interesting?

• It is one of the two important components (process and file) in any distributed computation.

• It is a good example for illustrating the concept of transparency and client/server model.

• File sharing and data replication present many interesting research problems.


Structure of and access to a file system

directory service            name resolution, add and deletion of files
authorization service        capability and/or access control list
file service: transaction    concurrency and replication management
file service: basic          read/write files and get/set attributes
system service               device, cache, and block management

File Service Interactions

[Figure: clients interacting with the servers that provide the directory service, authorization service, file service, and system service.]

File Structures

• File: sequential, direct, indexed, indexed-sequential

• File System: flat, hierarchical

[Figure: file structures. A directory entry leads to a file control block (FCB) holding the other file attributes; disk blocks are located sequentially, directly, through primary and secondary indirect blocks, or through a file allocation table (the FAT approach).]

Mounting protocols and NFS

• Explicit mounting

• Boot mounting

• Auto mounting

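[Figure: a remote server exports part of its directory tree and a local client mounts it into its own tree. Labels shown: root, chow, book, paper, OS, DSM, DFS, security, export, mount.]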

Stateful and stateless file servers

• Opened files and their clients

• File descriptors and file handles

• Current file position pointers

• Mounting information

• Lock status

• Session keys

• Cache or buffer
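The items above are the kinds of state a stateful server keeps on behalf of its clients. As a rough illustration of the difference, the sketch below contrasts a server that remembers the current file position with one that expects every request to carry it. The class names, method names, and the in-memory dictionary standing in for disk storage are all assumptions made for this example.

```python
class StatefulServer:
    """Keeps an open-file table with a position pointer per descriptor."""

    def __init__(self, files):
        self.files = files            # file name -> bytes (stand-in for disk)
        self.open_table = {}          # descriptor -> [file name, position]
        self.next_fd = 0

    def open(self, name):
        fd = self.next_fd
        self.next_fd += 1
        self.open_table[fd] = [name, 0]
        return fd

    def read(self, fd, count):
        name, pos = self.open_table[fd]
        data = self.files[name][pos:pos + count]
        self.open_table[fd][1] = pos + len(data)   # server advances the pointer
        return data


class StatelessServer:
    """Keeps nothing between requests; the client supplies handle and offset."""

    def __init__(self, files):
        self.files = files

    def read(self, handle, offset, count):
        return self.files[handle][offset:offset + count]


files = {"paper": b"distributed file systems"}

stateful = StatefulServer(files)
fd = stateful.open("paper")
first = stateful.read(fd, 11)
second = stateful.read(fd, 5)

stateless = StatelessServer(files)
same_first = stateless.read("paper", 0, 11)
same_second = stateless.read("paper", 11, 5)
```

If the stateful server crashes, its open-file table is lost and the descriptor becomes meaningless; the stateless requests can simply be retried, at the cost of longer messages.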


Semantics of sharing in DFS

space \ time           simple RW            transaction                  session
remote access          no true sharing      concurrency control          not applicable
cache access           coherency control    coherency and concurrency    not applicable
down/up load access    coherency control    coherency and concurrency    ignore sharing

sharing

• Unix semantics - currentness

• Transaction semantics - consistency

• Session semantics - efficiency

replication

• Write policies: write-through and write-back

• Cache coherence control: write-invalidate and write-update

• Version control (immutable files): ignore conflict, resolve version conflict, resolve serializability conflict


Transaction and concurrency control

[Figure: two sites, each with clients, a transaction manager (TM), a scheduler (SCH), an object manager (OM), and objects. The TMs communicate with each other as coordinator and participants.]

transaction processing system (TPS)

• Transaction manager (TM)

• Scheduler (SCH)

• Object manager (OM)

atomicity

• All or none: TM, two-phase commit

• Indivisible (serializable): SCH, concurrency control protocols

• Atomic update: OM, replica management


Serializability

If the interleaved execution of transactions is to be equivalent to a serial execution in some order, then all conflicting operations in the interleaved serializable schedule must also be executed in the same order at all object sites.

t_0: bt Write A=100, Write B=20 et

t_1: bt Read A, Read B, 1: Write sum in C, 2: Write diff in D et

t_2: bt Read A, Read B, 3: Write diff in C, 4: Write sum in D et

Sched  Interleave  Log in C               Log in D               Result (C,D)             2PL           Timestamp
1      1, 2, 3, 4  W_1 = 120, W_2 = 80    W_1 = 80, W_2 = 120    (80, 120) consistent     feasible      feasible
2      3, 4, 1, 2  W_2 = 80, W_1 = 120    W_2 = 120, W_1 = 80    (120, 80) consistent     feasible      t_1 aborts and restarts
3      1, 3, 2, 4  W_1 = 120, W_2 = 80    W_1 = 80, W_2 = 120    (80, 120) consistent     not feasible  feasible
4      3, 1, 4, 2  W_2 = 80, W_1 = 120    W_2 = 120, W_1 = 80    (120, 80) consistent     not feasible  t_1 aborts and restarts
5      1, 3, 4, 2  W_1 = 120, W_2 = 80    W_2 = 120, W_1 = 80    (80, 80) inconsistent    not feasible  cascade aborts
6      3, 1, 2, 4  W_2 = 80, W_1 = 120    W_1 = 80, W_2 = 120    (120, 120) inconsistent  not feasible  t_1 aborts and restarts
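The Result (C,D) column of the example can be checked mechanically. The sketch below replays each interleaving of the four numbered writes and compares the final (C, D) pair with the two serial outcomes. The function names are made up for this illustration, and the reads of A and B are folded into the constants sum = 120 and diff = 80.

```python
SUM, DIFF = 100 + 20, 100 - 20

# Numbered write operations from the example: op -> (object, value written)
OPS = {
    1: ("C", SUM),   # t_1 writes sum in C
    2: ("D", DIFF),  # t_1 writes diff in D
    3: ("C", DIFF),  # t_2 writes diff in C
    4: ("D", SUM),   # t_2 writes sum in D
}


def run(schedule):
    """Apply the writes in the given order and return the final (C, D)."""
    state = {}
    for op in schedule:
        obj, value = OPS[op]
        state[obj] = value
    return state["C"], state["D"]


# A result is consistent if it equals the outcome of some serial order.
SERIAL_RESULTS = {run([1, 2, 3, 4]), run([3, 4, 1, 2])}


def is_consistent(schedule):
    return run(schedule) in SERIAL_RESULTS


results = {tuple(s): (run(s), is_consistent(s)) for s in (
    [1, 2, 3, 4], [3, 4, 1, 2], [1, 3, 2, 4],
    [3, 1, 4, 2], [1, 3, 4, 2], [3, 1, 2, 4],
)}
```

Schedules 5 and 6 are the ones where the conflicting writes on C and D are ordered differently at the two objects, which is exactly what the serializability condition rules out.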

Banzai Timestamp Protocol

1. Get timestamp TS from the TM at bt.

2. When performing an operation on object O:

if (read and WR(O) < TS) {do read; RD(O) = max(RD(O), TS)}

else if (write and max(RD(O), WR(O)) < TS) {do write; WR(O) = TS}

else abort

3. If any writes were performed before aborting, any other transactions that read the values written by those writes must also be aborted.
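Step 2 of the protocol can be sketched as a small class that keeps RD(O) and WR(O) for one object. This is a minimal illustration, not a full transaction system: the class and exception names are invented here, the write test is read as max(RD(O), WR(O)) < TS, timestamps are assumed to be positive integers, and the cascading abort of step 3 is not modeled.

```python
class TimestampAbort(Exception):
    """Raised when an operation arrives too late in timestamp order."""


class TimestampedObject:
    def __init__(self, value=None):
        self.value = value
        self.rd = 0   # RD(O): largest timestamp that has read O
        self.wr = 0   # WR(O): timestamp of the last write to O

    def read(self, ts):
        if self.wr < ts:
            self.rd = max(self.rd, ts)
            return self.value
        raise TimestampAbort(f"read with TS={ts} after write WR={self.wr}")

    def write(self, ts, value):
        if max(self.rd, self.wr) < ts:
            self.value = value
            self.wr = ts
        else:
            raise TimestampAbort(f"write with TS={ts} is too late")


def replay(schedule, ts_of):
    """Replay numbered writes 1..4 of the example; return the aborting op or None."""
    ops = {1: ("C", 120), 2: ("D", 80), 3: ("C", 80), 4: ("D", 120)}
    objects = {"C": TimestampedObject(), "D": TimestampedObject()}
    for op in schedule:
        name, value = ops[op]
        try:
            objects[name].write(ts_of[op], value)
        except TimestampAbort:
            return op
    return None


# t_1 (ops 1 and 2) gets timestamp 1; t_2 (ops 3 and 4) gets timestamp 2.
TS_OF = {1: 1, 2: 1, 3: 2, 4: 2}
aborted_in_sched_1 = replay([1, 2, 3, 4], TS_OF)
aborted_in_sched_2 = replay([3, 4, 1, 2], TS_OF)
aborted_in_sched_3 = replay([1, 3, 2, 4], TS_OF)
```

With the strict comparison used here, a transaction that reads an object and later writes the same object would fail its own write check; implementations commonly let a transaction pass the test against its own timestamp.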


Concurrency control protocols

Two-phase locking

• locking and shrinking phases of requesting and releasing objects

• concurrency versus serializability

• rolling abort, strict two-phase locking
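The two-phase rule itself is simple to state in code: once a transaction has released any lock, it may not request another. The sketch below enforces only that rule, with a shared dictionary standing in for the lock table. All names are hypothetical, only exclusive locks are modeled, and a blocked request just returns False instead of queueing.

```python
class TwoPhaseViolation(Exception):
    """Raised when a lock is requested after the shrinking phase has begun."""


class TwoPhaseTransaction:
    def __init__(self, name, lock_table):
        self.name = name
        self.lock_table = lock_table   # object name -> owning transaction name
        self.held = set()
        self.shrinking = False

    def lock(self, obj):
        if self.shrinking:
            raise TwoPhaseViolation(f"{self.name} locks {obj} after releasing")
        owner = self.lock_table.get(obj)
        if owner is not None and owner != self.name:
            return False               # held by another transaction: must wait
        self.lock_table[obj] = self.name
        self.held.add(obj)
        return True

    def unlock(self, obj):
        self.shrinking = True          # first release ends the locking phase
        if obj in self.held:
            self.held.remove(obj)
            del self.lock_table[obj]

    def release_all(self):
        """Strict two-phase locking: hold everything until commit or abort."""
        for obj in list(self.held):
            self.unlock(obj)


table = {}
t1 = TwoPhaseTransaction("t1", table)
t2 = TwoPhaseTransaction("t2", table)

got_c = t1.lock("C")
got_d = t1.lock("D")
t2_blocked = not t2.lock("C")
t1.unlock("C")
t2_got_c = t2.lock("C")
```

Calling only release_all at the end of the transaction gives the strict variant, which avoids the rolling abort because no other transaction can see an object before its writer has finished.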

Timestamp ordering

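[Figure: timestamp ordering with tentative writes and waiting reads. For Read, Write, Commit, and Abort operations, the transaction timestamp T is compared with RD, WR, and T_min; the outcomes are OK, wait, do tentative write, or abort. Annotations: abort transactions; cannot commit a waiting read; remove transaction from waiting list.]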

• May delay transaction completion (waiting reads)

• Prevents cascading abort (reads wait until tentative write resolved)


Optimistic concurrency control

• Execution phase

• Validation phase:

1. Validation of transaction t_i is rejected if TV_i < TV_k. All transactions must be serialized with respect to TV.

2. Validation of transaction t_i is accepted if it does not overlap with any t_k. t_i is already serialized with respect to t_k.

3. The execution phase of t_i overlaps with the update phase of t_k, and t_k completes its update phase before TV_i. Validation of t_i is accepted if R_i ∩ W_k = ∅.

4. The execution phase of t_i overlaps with the validation and update phases of t_k, and t_k completes its execution phase before TS_i. Validation of t_i is accepted if R_i ∩ W_k = ∅ and W_i ∩ W_k = ∅.

• Update phase

[Figure: timelines of the execution, validation, and update phases of t_i relative to TV_k, marking TS_i and TV_i for each case. Outcomes: reject; accept; accept if no r-w conflict; accept if no r-w and w-w conflict.]
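The four validation cases reduce to set intersections on read and write sets. The sketch below is one possible reading of them, under stated assumptions: times are plain numbers, each transaction records when its execution started (TS), when validation started (TV), and when its update phase finished, and case 4 is taken to be the remaining situation where t_k validated first but had not finished updating at TV_i. The Txn class and validate function are names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    ts: float                 # TS: start of the execution phase
    tv: float                 # TV: start of the validation phase
    read_set: set             # R: objects read during execution
    write_set: set            # W: objects to be written in the update phase
    update_end: float = float("inf")   # end of update phase, if finished


def validate(ti, tk):
    """Return True if t_i passes validation against t_k."""
    if ti.tv < tk.tv:
        return False                          # case 1: out of TV order
    if tk.update_end <= ti.ts:
        return True                           # case 2: no overlap at all
    no_rw = not (ti.read_set & tk.write_set)  # R_i and W_k are disjoint
    if tk.update_end <= ti.tv:
        return no_rw                          # case 3: t_k done before TV_i
    no_ww = not (ti.write_set & tk.write_set) # W_i and W_k are disjoint
    return no_rw and no_ww                    # case 4: t_k still updating


tk = Txn(ts=0, tv=5, read_set=set(), write_set={"x"}, update_end=8)

after_tk = Txn(ts=9, tv=12, read_set={"x"}, write_set={"x"})
read_conflict = Txn(ts=6, tv=10, read_set={"x"}, write_set=set())
write_only = Txn(ts=6, tv=10, read_set={"y"}, write_set={"x"})
write_conflict = Txn(ts=6, tv=7, read_set={"y"}, write_set={"x"})
```

A transaction would be validated against every t_k that entered validation before it, and only then allowed into its own update phase.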


Data and file replication

Architecture of a replica manager

[Figure: clients send requests to file service agents (FSAs), which forward them to a group of replica managers (RMs).]

RM: Replica Manager; FSA: File Service Agent

read operations

• Read-one-primary

• Read-one

• Read-quorum

write operation

• Write-one-primary

• Write-all

• Write-all-available

• Write-quorum

• Write-gossip


One-copy serializability

The execution of transactions on replicated objects is equivalent to the execution of the same transactions on nonreplicated objects. Some approaches:

• read-one-primary/write-one-primary

• read-one/write-all

• read-one/write-all-available

failures

Failures can cause problems with one-copy serializability. For example:

t_0: bt W(X) W(Y) et

t_1: bt R(X) W(Y) et

t_2: bt R(Y) W(X) et

t_1: bt R(X_a) (Y_d fails) W(Y_c) et

t_2: bt R(Y_d) (X_a fails) W(X_b) et


Quorum voting

From read-one/write-all-available to read-quorum/write-quorum:

1. Write-write conflict: 2 * W(d) > V(d).

2. Read-write conflict: R(d) + W(d) > V(d).

[Figure: nine replica sites, some RMs and some witnesses, holding version numbers 8, 8, 8, 8, 6, 8, 7, 7, 6, with R(d) = 5 and W(d) = 5. 6, 7, 8 are version numbers.]

Question: What happens if the read contacts a witness (dashed box) instead of the RM with the actual version 8 copy of the object?
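The two quorum conditions and the effect of a witness on a read can be tried out in a few lines. The sketch below is only an illustration: the function names are hypothetical, a reply is modeled as a (version, value) pair, and a witness is assumed to hold a version number but no data, so its value is None.

```python
def valid_quorums(r, w, v):
    """Check the write-write and read-write conditions for V(d) votes."""
    return 2 * w > v and r + w > v


def quorum_read(replies):
    """Pick the newest version among the replies of a read quorum.

    Returns (version, value). The value is None when only witnesses in the
    quorum hold the newest version, in which case the reader knows which
    version is current but still has to fetch it from an RM that stores it.
    """
    latest = max(version for version, _ in replies)
    for version, value in replies:
        if version == latest and value is not None:
            return latest, value
    return latest, None


ok = valid_quorums(r=5, w=5, v=9)

# A read quorum of five replies; two of the version-8 replies are witnesses.
mixed = [(8, None), (7, "old"), (8, "new"), (6, "older"), (8, None)]
version, value = quorum_read(mixed)

# Here the only version-8 reply comes from a witness.
witness_only = [(8, None), (7, "old"), (7, "old"), (6, "older"), (6, "older")]
stale_version, stale_value = quorum_read(witness_only)
```

In the second read the witness still does its job: the reader learns that version 7 is out of date even though no reply carried the version 8 contents.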


Gossip update propagation

Lazy update propagation: read-one/write-gossip

• Basic gossip protocol: read/overwrite

• Causal order gossip protocol: read/modify-update

[Figure: gossip update propagation. A client issues read and read-modify-update requests through an FSA to three RMs, which exchange gossip messages. Each RM keeps a value, timestamps V and R, and an update log L. Updates u and reads r carry timestamps; the ordering conditions shown are TS_f < TS_i and TS_i < TS_j.]

[Figure: example with RM_1, RM_2, and RM_3 exchanging gossip, showing the timestamps of updates and reads, the V and R timestamps, and the update logs at each RM, with values such as 000, 100, 110, 101, and 111.]
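To make the causal order gossip idea concrete, the sketch below shows one common way to hold back an update until everything it depends on has been applied, using vector timestamps like the three-digit values in the figure. It is an assumption-laden illustration rather than the protocol from the slides: the class, its fields, and the (dependency, timestamp, value) message layout are invented for this example, and duplicate gossip messages are not filtered out.

```python
def leq(a, b):
    """True if vector timestamp a is componentwise <= b."""
    return all(x <= y for x, y in zip(a, b))


def merge(a, b):
    """Componentwise maximum of two vector timestamps."""
    return tuple(max(x, y) for x, y in zip(a, b))


class ReplicaManager:
    def __init__(self, n):
        self.value = None
        self.value_ts = (0,) * n   # timestamp of the updates applied so far
        self.pending = []          # update log: updates not yet applicable

    def receive(self, dep_ts, update_ts, new_value):
        """Accept an update from a client or from gossip."""
        self.pending.append((dep_ts, update_ts, new_value))
        self._apply_ready()

    def _apply_ready(self):
        progress = True
        while progress:
            progress = False
            for entry in list(self.pending):
                dep_ts, update_ts, new_value = entry
                if leq(dep_ts, self.value_ts):     # dependencies satisfied
                    self.value = new_value
                    self.value_ts = merge(self.value_ts, update_ts)
                    self.pending.remove(entry)
                    progress = True


rm = ReplicaManager(n=3)

# The second update depends on the first but arrives first via gossip.
rm.receive(dep_ts=(1, 0, 0), update_ts=(1, 1, 0), new_value="b")
held_back = (rm.value, len(rm.pending))

rm.receive(dep_ts=(0, 0, 0), update_ts=(1, 0, 0), new_value="a")
```

After the first update arrives, both are applied in causal order, so the replica ends with the value of the later update.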
