Distributed File Systems
Part I
Daniel A. Menascé
Issues in Centralized File
Systems
• File Naming
c:\courses\cs571\procs.ps (MS-DOS)
/usr/menasce/courses/cs571/processes.ps (UNIX)
• File Structure
bitstream or bytestream
record oriented (record = key + data)
indexed (e.g., B*-trees (IBM VSAM))
B*-Tree Files
[Figure: B*-tree file with index nodes and leaf nodes.]
Issues in Centralized File
Systems
• File Types
text (e.g., ASCII)
binary (e.g., executables, images, etc.)
• Directory Structures
flat
hierarchical (tree)
graph
Directories
[Figure: the directory menasce (courses, papers; CS571, INFS601; intro.ps, procs.ps, ..., grcs571.xls, grinfs601.xls) drawn as a hierarchical structure and as a graph structure.]
Directories
[Figure: hierarchical structure, with paths ~menasce/courses/CS571/intro.ps and ~menasce/courses/INFS601/intro.ps.]
Directories
[Figure: graph structure, with paths ~menasce/courses/CS571/intro.ps and ~menasce/courses/INFS601/intro.ps.]
Issues in Centralized File
Systems
• Allocation of File to Disk Blocks
contiguous
linked
indexed
Contiguous Allocation of File to Disk
Blocks
[Figure: file blocks 0, 1, 2, ..., 49 stored in disk blocks 101, 102, 103, ..., 150.]
start address = 101, no. of used blocks = 3, last reserved block = 150
• simple mapping
• bad use of disk space
• hard to expand if maximum allocation is exceeded
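The "simple mapping" of contiguous allocation can be sketched as follows, using the numbers from the figure (start address 101, last reserved block 150). The function name and the error handling are illustrative assumptions, not part of any real file system.

```python
def contiguous_block(start, last_reserved, k):
    """Map logical block k (0-based) of a file to its disk block address."""
    addr = start + k  # simple mapping: one addition, no disk reads needed
    if k < 0 or addr > last_reserved:
        # the file cannot grow past its reserved allocation
        raise IndexError("block outside the reserved allocation")
    return addr


print(contiguous_block(101, 150, 2))  # logical block 2 of the example file
```

The bounds check is where the "hard to expand" drawback shows up: a request beyond the last reserved block has nowhere to go.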
Linked Allocation of File to Disk Blocks
[Figure: file blocks 0, 1, 2 stored in disk blocks 154, 35, 237, each block linked to the next.]
directory info: first block address = 154, last block address = 237, number of blocks = 3
• good use of disk space
• bad performance for direct access (e.g., reading the k-th block requires reading k blocks)
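A minimal sketch of why direct access is slow with linked allocation, using the block addresses from the figure (154, 35, 237). The in-memory `disk` dictionary and the function name are assumptions made for illustration; a real system would read each block from disk to find the next pointer.

```python
# Hypothetical disk: block address -> (data, address of next block or None)
disk = {
    154: ("block 0", 35),
    35: ("block 1", 237),
    237: ("block 2", None),
}


def read_kth_block(disk, first, k):
    """Return (data, blocks_read) for logical block k (0-based)."""
    addr, reads = first, 0
    for _ in range(k + 1):
        if addr is None:
            raise IndexError("file has fewer blocks than requested")
        data, next_addr = disk[addr]  # one disk read per block visited
        reads += 1
        addr = next_addr
    return data, reads


data, reads = read_kth_block(disk, first=154, k=2)
```

Reaching the third block (k = 2 when counting from 0) costs three block reads, since every earlier block must be read to follow its pointer.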
Indexed Allocation of File to Disk
Blocks
[Figure: index in main memory with entries 0, 1, 2, 3, ..., 511; the first entries hold disk block addresses 154, 35, 237 and the remaining entries hold -1.]
• efficient direct access
• good use of disk space
• inadequate for very large files (very large index).
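Indexed allocation can be sketched with a plain list standing in for the in-memory index. The 512 entries and the -1 marker follow the figure; the names `UNUSED` and `indexed_block` are hypothetical.

```python
UNUSED = -1  # marker for an index entry with no disk block, as in the figure

# 512-entry index: logical blocks 0..2 live in disk blocks 154, 35, 237
index = [154, 35, 237] + [UNUSED] * 509


def indexed_block(index, k):
    """Map logical block k to its disk block with a single index lookup."""
    if not 0 <= k < len(index) or index[k] == UNUSED:
        raise IndexError("logical block not allocated")
    return index[k]


print(indexed_block(index, 1))
```

Direct access is a single lookup, but the index itself grows with the file, which is the drawback noted for very large files.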
UNIX I-node
item type (e.g., file, directory)
item size in bytes
time the file’s inode was last modified
time the file’s contents were last modified
time the file was last accessed
reference count: number of file names
file’s owner (a UID)
file’s group (a GID)
file’s mode bits (r,w,x)
UNIX Directories
[Figure: directory with entries foo, bar, notes, doc, ...]
notes and doc are the same file
I-node Allocation of File to Disk Blocks
[Figure: I-node with file attributes and pointer entries 0, 1, 2, 3, ..., 509, 510, 511, including the SIP, DIP, and TIP entries.]
SIP = single indirect pointer
DIP = double indirect pointer
TIP = triple indirect pointer
I-node Allocation of File to Disk Blocks
• Efficient access to data blocks of small (from i-node), medium (from single indirect blocks), large (from double indirect blocks), and huge (from triple indirect blocks) files.
• Maximum file size (assuming 512 byte blocks and 4 bytes per pointer, i.e., 128 pointers per block):
(120+128+128**2+128**3) * 512 ≈ 1 GByte
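The maximum file size can be checked with a few lines of arithmetic. The sketch below takes the slide's values (512-byte blocks, 4-byte pointers, 120 direct pointers); the constant and function names are illustrative.

```python
BLOCK_SIZE = 512       # bytes per block (from the slide)
POINTER_SIZE = 4       # bytes per block pointer (from the slide)
DIRECT_POINTERS = 120  # direct pointers in the i-node (value used on the slide)

# Each indirect block holds this many pointers: 512 / 4 = 128
PTRS_PER_BLOCK = BLOCK_SIZE // POINTER_SIZE


def max_file_size(direct, ptrs, block_size):
    """Bytes addressable through direct, single, double, and triple indirection."""
    blocks = direct + ptrs + ptrs**2 + ptrs**3
    return blocks * block_size


size = max_file_size(DIRECT_POINTERS, PTRS_PER_BLOCK, BLOCK_SIZE)
print(size, "bytes, about", round(size / 2**30, 2), "GiB")
```

The triple indirect term (128**3 blocks) dominates the sum, so almost all of the 1 GByte comes from the last pointer in the i-node.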
Security in Centralized Systems
• What is security?
• Storing protection data.
• UNIX File Protection.
• Authentication methods.
What Is Security?
• Confidentiality: protecting information from being read or copied by unauthorized users.
• Data Integrity: protecting information from being deleted or altered without permission.
• Availability: avoiding denial of service.
• Access Control: controlling who has access to the system.
• Accountability: keeping track of unauthorized accesses on an audit trail.
Storing Protection Data
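• Security
Protection Matrix
Access Control Lists
Capabilities
[Figure: protection matrix of users usr1 ... usr n against files file 1, file 2, ..., file m; entries are access rights such as rw, r, rwx, or -.]
Access Control Lists and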
Capabilities
[Figure: protection matrix of users usr1 ... usr n against files file 1, file 2, ..., file m.]
capabilities: list of objects and access rights per user.
access control list: list of users and access rights per object.
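The two views of the protection matrix can be sketched with a nested dictionary. The user names, file names, and rights below are made-up sample data, and both function names are hypothetical.

```python
# Hypothetical protection matrix: matrix[user][file] = access rights
matrix = {
    "usr1": {"file1": "rw", "file2": "r"},
    "usr2": {"file2": "rwx"},
}


def capabilities(matrix, user):
    """Per-user view: the objects this user can access, with the rights."""
    return dict(matrix.get(user, {}))


def access_control_list(matrix, obj):
    """Per-object view: the users that can access this object, with the rights."""
    return {user: rights[obj] for user, rights in matrix.items() if obj in rights}


print(capabilities(matrix, "usr1"))
print(access_control_list(matrix, "file2"))
```

Both functions read the same matrix; the difference is only whether the information is stored with the user (capabilities) or with the object (access control list).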
UNIX Protection Model
[Figure: protection matrix of users usr1 ... usr n against files file 1, file 2, ..., file m; label: access control list: list of users.]
• UNIX implements a coarse-grained version of ACLs.
• Users are divided into three groups:
- owner
- group
- world
Protection Bits for Files
drwx--S--- 2 menasce 512 Nov 4 13:49 grades/
-rw-rw-r-- 1 menasce 684 Nov 4 13:48 project_ideas
-rw------- 1 menasce 509 Nov 4 13:48 student_mail
-rw-r--r-- 1 menasce 3063 Nov 4 13:49 syllabus
entry type (- file; d directory)
owner rights
group rights
others’ rights
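A minimal sketch of how the 10-character mode string in the listing splits into the fields named above. The function names are hypothetical, and only the two entry types from the slide (- and d) are distinguished.

```python
def parse_mode(mode):
    """Split an ls-style mode string into entry type and three rights fields."""
    if len(mode) != 10:
        raise ValueError("expected a 10-character mode string")
    return {
        "type": "directory" if mode[0] == "d" else "file",
        "owner": mode[1:4],
        "group": mode[4:7],
        "other": mode[7:10],
    }


def may_read(mode, who):
    """True if the class who ('owner', 'group', or 'other') has the r bit set."""
    return parse_mode(mode)[who][0] == "r"


print(parse_mode("-rw-rw-r--"))
```

For example, the syllabus entry (-rw-r--r--) is readable by everyone but writable only by its owner.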
Authentication Methods
• Something that you know: password.
• Something that you have: a card key.
• Something that you are: fingerprint.
• Combination:
– card key and password
– card key and weight
Passwords
• Passwords are stored in password files
(/etc/passwd in UNIX) in an encrypted form (one-way encryption).
• Users should select hard to crack passwords:
– Use combinations of lower and upper case characters, punctuation signs (!$#?;:), and numbers.
– Good password: A$1c;:mE
– Bad password: sunshine
– Easy to remember: base password on a phrase.
– Change passwords regularly.
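The idea of storing passwords in a one-way form can be sketched as below. This is an assumption-laden illustration: it uses PBKDF2 from Python's standard hashlib module, which is a swapped-in technique and not the algorithm of the historical UNIX password file. The iteration count and function names are also illustrative.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a recommendation


def hash_password(password, salt=None):
    """Return (salt, digest); the password cannot be recovered from the digest."""
    if salt is None:
        salt = os.urandom(16)  # random salt so equal passwords hash differently
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, digest):
    """Re-hash the candidate with the stored salt and compare digests."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, digest)


salt, stored = hash_password("A$1c;:mE")
print(verify_password("A$1c;:mE", salt, stored))
print(verify_password("sunshine", salt, stored))
```

Only the salt and the digest are stored; login repeats the one-way computation and compares results, so the system never needs the password in the clear.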
Users, User IDs and the Superuser
• Every user in UNIX has a username and a user identifier (UID) which is a number.
• Common “users” in UNIX systems:
– root: superuser performs accounting and low-level functions.
– daemon: handles network aspects.
– agent: handles e-mail.
– guest: for visitors.
– ftp: for anonymous ftp.
Groups and Group Identifiers
• Every UNIX user belongs to one or more groups.
• Groups have a group name and a group ID (GID).
• Each user belongs to the primary group stored in the /etc/passwd file
• All groups are listed in the /etc/group file in UNIX
Groups and Group Identifiers
[Figure: users root, john, mary, peter, susan, jill, and ftp placed in the users group (gid 104), ftp group (gid 10), admin group (gid 0), and student group (gid 40).]
The Superuser
• Every UNIX system has a special user with UID = 0, usually called root.
• root is used by the OS to accomplish its basic functions
• root has access to all system resources!
• More than one user can be the superuser (they just need to have UID = 0).
• The superuser is the main security weakness in UNIX.
Distributed File Systems
• File Service Interface:
- upload/download model
[Figure: client and server exchanging "get file" and "put file" requests.]
- entire files are retrieved from the server, and accessed at the client.
- once the client is done, the file is stored back at the server.
Distributed File Systems
• File Service Interface:
- remote access model
[Figure: client and server exchanging "read block" and "write block" requests.]
- only the needed blocks of files are retrieved from the server.
- once the client is done with a block, it is written back to the server.
- example: NFS
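The two file service interfaces can be contrasted with a small in-memory sketch. The `FileServer` class and its method names mirror the labels in the figures (get file, put file, read block, write block) but are otherwise hypothetical; no real protocol such as NFS is modeled here.

```python
class FileServer:
    """Hypothetical in-memory server: file name -> list of blocks."""

    def __init__(self):
        self.files = {}

    # upload/download model: whole files cross the network
    def get_file(self, name):
        return list(self.files[name])    # client receives its own copy

    def put_file(self, name, blocks):
        self.files[name] = list(blocks)  # whole file stored back

    # remote access model: only the needed blocks cross the network
    def read_block(self, name, i):
        return self.files[name][i]

    def write_block(self, name, i, data):
        self.files[name][i] = data


server = FileServer()
server.put_file("f", ["a", "b", "c"])

# upload/download: fetch everything, change one block locally, send it all back
local_copy = server.get_file("f")
local_copy[1] = "B"
server.put_file("f", local_copy)

# remote access: touch a single block on the server
server.write_block("f", 2, "C")
```

In the first case three blocks travel in each direction to change one; in the second case only the affected block is transferred.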
Distributed File Systems: directory
service interface
[Figure: file server 1 and file server 2 hold directories A, B, C, D, E, F; the directory trees (under root) seen at client 1 and at client 2.]
Distributed File Systems: naming
• Location transparency: the path name does not reveal the file location.
e.g.: /serverA/dir1/dir2/x does not say where the server is located.
• Location independence: files can be moved and all references to them continue to be valid.
Distributed File Systems: two-level
naming
• Symbolic Names: human readable. e.g.: /courses/slides/files.ps
• Binary names: machine readable names. Easier to manipulate.
e.g.: UNIX i-node, or
server IP address:i-node number
• Symbolic to binary name mapping may be one to many
in a distributed system (file replication).
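A minimal sketch of two-level naming with replication: one symbolic name maps to several binary names of the form (server address, i-node number). The addresses, i-node numbers, and the `is_up` callback are made-up values for illustration.

```python
# Hypothetical name table: symbolic name -> binary names of its replicas
name_table = {
    "/courses/slides/files.ps": [("192.0.2.1", 4711), ("192.0.2.2", 982)],
}


def resolve(symbolic, table, is_up=lambda server: True):
    """Return the binary name of the first replica whose server is reachable."""
    for server, inode in table[symbolic]:
        if is_up(server):
            return server, inode
    raise LookupError("no reachable replica for " + symbolic)


# All servers up: the first replica is chosen
print(resolve("/courses/slides/files.ps", name_table))

# First server down: the same symbolic name resolves to the second replica
print(resolve("/courses/slides/files.ps", name_table,
              is_up=lambda server: server != "192.0.2.1"))
```

The client only ever uses the symbolic name, so a replica can be added or moved by updating the table without changing any path.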
Semantics of File Sharing
• UNIX semantics: used in centralized systems.
- a read that follows a write sees the value written by the write.
[Figure: timeline. At t1, "write x’ to block a" changes the block from x to x’; at t2, "read block a" gets x’.]
Semantics of File Sharing
• UNIX semantics:
- a read that follows two writes in quick succession sees the result of the last write.
[Figure: timeline. At t1, "write x’ to block a" changes the block from x to x’; at t2, "write x’’ to block a" changes it from x’ to x’’; at t3, "read block a" gets x’’.]
Semantics of File Sharing
Issues in Distributed File Systems
• Single File Server
- No client caching
- easy to implement UNIX semantics
• Client File Caching
- improves performance by decreasing demand at the server
- updates to the cached file are not seen by other clients.
Semantics of File Sharing
• Session Semantics: (relaxed semantics)
- changes to an open file are only visible to the process that modified the file.
- when the file is closed, changes are visible to other processes ⇒ closed file is sent back to the server.
Semantics of File Sharing
• Session Semantics:
- what if two or more clients are caching and modifying a file?
• final result depends on who closes last
• use an arbitrary rule to decide who wins.
- file pointer sharing not possible when a process and its children run on different machines
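Session semantics with two clients modifying the same file can be sketched as follows. The `SessionServer` and `Session` classes are hypothetical, and "the last close wins" is just one of the arbitrary rules mentioned above.

```python
class SessionServer:
    """Hypothetical server holding the committed contents of each file."""

    def __init__(self):
        self.files = {}

    def open(self, name):
        # the client works on a private copy for the duration of the session
        return Session(self, name, self.files.get(name, ""))


class Session:
    def __init__(self, server, name, contents):
        self.server, self.name, self.contents = server, name, contents

    def write(self, contents):
        self.contents = contents  # visible only to this client until close

    def close(self):
        # the closed file is sent back to the server, replacing what is there
        self.server.files[self.name] = self.contents


server = SessionServer()
server.files["f"] = "x"

a = server.open("f")
b = server.open("f")
a.write("x from A")
b.write("x from B")
seen_before_close = server.files["f"]  # other processes still see the old contents

a.close()
b.close()  # B closes last, so B's version survives
```

After both closes the server holds B's version, and A's update is lost without any error being reported.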
Semantics of File Sharing
• No File Updates Semantics:
- files are never updated.
- allowed file operations: CREATE and READ.
- files are atomically replaced in the directory.
- Problem: what if two clients want to replace a file at the same time?
• take the last one or use any non-deterministic rule.
Semantics of File Sharing
• Transaction Semantics:
- all file changes are delimited by a Begin and End transaction.
- all file requests within the transaction are carried out in order.
- the complete transaction is either carried out completely or not at all (atomicity).
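The all-or-nothing property can be sketched by applying the operations to a private copy and committing only if every one succeeds. The `TransactionalFile` class is a hypothetical single-process illustration; it does not address concurrency or crash recovery.

```python
class TransactionalFile:
    """Hypothetical file whose updates are applied atomically."""

    def __init__(self, contents=""):
        self.contents = contents

    def transaction(self, operations):
        """Apply all operations in order, or none of them."""
        working = self.contents    # Begin: work on a private copy
        for op in operations:
            working = op(working)  # an exception here aborts the transaction
        self.contents = working    # End: commit with a single assignment


def failing_op(contents):
    raise RuntimeError("simulated failure in the middle of a transaction")


f = TransactionalFile("a")
f.transaction([lambda s: s + "b", lambda s: s + "c"])  # commits "abc"

try:
    f.transaction([lambda s: s + "d", failing_op])     # aborts, nothing applied
except RuntimeError:
    pass
```

After the aborted transaction the file still holds the result of the first one; the partial change made before the failure is never visible.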
Semantics of File Sharing
UNIX Semantics: every operation is instantly visible to others.
Session Semantics: no changes visible until file is closed.
No Updates Semantics: no file updates are allowed.