Network File Systems
André Brinkmann
Agenda
•
Network File Systems
–
Distributed File Systems
–
NFS
–
AFS
•
Network Attached Storage
Distributed / Network File System
•
A distributed file system or network file system is any file system
that allows multiple hosts to access and share files via a
computer network.
–
Possibility for multiple users on multiple machines to share files and
resources
•
Client nodes do not have direct access to the underlying block
storage but interact over the network using a protocol
–
Possibility to restrict access to the file system depending on access lists or
capabilities on both the servers and the clients
•
In contrast, in a shared disk file system all nodes have equal access
to the block storage where the file system is located.
–
Access control must reside on the client
Characteristics
Distributed file systems should meet the following demands:
•
Every user can access their files from every host
-> Access Transparency
•
Multiple users can access the same file at the same time
-> Concurrency Transparency
•
Files of a user can be stored at different hosts
-> Location Transparency and Location Independence
•
Files can be replicated within the system
-> Replication Transparency
Characteristics
•
Files can be migrated within the system while a user is
accessing them
-> Migration Transparency
•
The file system has to continue to work correctly in case
of the failure of a client, server, or the loss of a message
-> Failure Transparency
•
The client can access the file server at sufficient
performance even in case of varying load
-> Performance Transparency
-> Performance should be comparable to a local file system
Additional Demands
•
A distributed file system should support a broad set
of platforms
-> Hardware and Operating System Heterogeneity
•
A distributed file system should support thousands
of clients
-> Scalability
File Server Architecture
Server Operations
Flat file service
Read(FileId, i, n) -> Data
Write(FileId, i, Data)
Create() -> FileId
Delete(FileId)
GetAttributes(FileId) -> Attr
SetAttributes(FileId, Attr)
•
No open/close function
•
Functions are idempotent
(exception: Create)
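The two properties above (no open/close, idempotent functions) can be illustrated with a minimal in-memory sketch. The class name `FlatFileService`, the dictionary-based storage, and the zero-padding of gaps are assumptions made for this illustration; only the operation names and their parameters come from the slide.

```python
from itertools import count


class FlatFileService:
    """In-memory stand-in for the flat file service (illustrative only)."""

    def __init__(self):
        self._data = {}            # FileId -> bytearray
        self._attrs = {}           # FileId -> dict of attributes
        self._next_id = count(1)

    def create(self):
        # Not idempotent: every call returns a fresh FileId.
        file_id = next(self._next_id)
        self._data[file_id] = bytearray()
        self._attrs[file_id] = {}
        return file_id

    def read(self, file_id, i, n):
        # The caller supplies the position i, so the server keeps no cursor.
        return bytes(self._data[file_id][i:i + n])

    def write(self, file_id, i, data):
        buf = self._data[file_id]
        if len(buf) < i:
            buf.extend(bytes(i - len(buf)))   # pad a gap with zero bytes
        buf[i:i + len(data)] = data

    def delete(self, file_id):
        # Deleting twice leaves the same end state as deleting once.
        self._data.pop(file_id, None)
        self._attrs.pop(file_id, None)

    def get_attributes(self, file_id):
        return dict(self._attrs[file_id])

    def set_attributes(self, file_id, attr):
        self._attrs[file_id] = dict(attr)


svc = FlatFileService()
f = svc.create()
svc.write(f, 0, b"hello")
svc.write(f, 0, b"hello")      # a retransmitted write changes nothing
print(svc.read(f, 0, 5))       # b'hello'
```

Because every request names the file and the position explicitly, a client can safely resend a request whose reply was lost. Create is the exception, since a repeated call would allocate a second file.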
Directory services
Lookup(Dir, Name) -> FileId
AddName(Dir, Name, File)
UnName(Dir, Name)
GetNames(Dir, Pattern) -> NameSeq
i: Position of first byte
FileId: Unique identifier for a file within the network.
Lookup of a path name:
Path names, like /usr/bin/tar, can be resolved by iteratively calling Lookup(). Every component of the path requires one call, and the ID of the root directory has to be known in advance.
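The iterative resolution described above can be sketched as follows. The table `DIRECTORY`, the numeric IDs, and the helper names `lookup` and `resolve` are hypothetical and chosen only for this example.

```python
ROOT_ID = 0  # assumption: the client knows the root directory ID in advance

# Hypothetical directory table: (DirId, Name) -> FileId
DIRECTORY = {
    (0, "usr"): 1,
    (1, "bin"): 2,
    (2, "tar"): 3,
}


def lookup(dir_id, name):
    """One directory-service call: resolve a single name in one directory."""
    try:
        return DIRECTORY[(dir_id, name)]
    except KeyError:
        raise FileNotFoundError(name) from None


def resolve(path):
    """Resolve an absolute path; returns (FileId, number of lookup calls)."""
    file_id = ROOT_ID
    calls = 0
    for component in path.split("/"):
        if not component:          # skip the empty parts around slashes
            continue
        file_id = lookup(file_id, component)
        calls += 1
    return file_id, calls


print(resolve("/usr/bin/tar"))     # (3, 3)
```

Three path components cost three calls, which is why real clients tend to cache lookup results.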
File System as Protocol State Machine
•
Access to a file can be seen as a session of operations on files or
directories
•
The distributed file system can be seen as a specialized
communication system with storage-related protocols
–
Opening a file: Building up a session
–
Accessing a file: Data transfer
–
Change of attributes: Management operations
–
Closing a file: Closing a session
•
Protocols can be described by "state machines", where the
transition between states depends on actions
•
Protocols can be symmetric or asymmetric
Oblivious and non-oblivious file systems
•
A file system has to keep status information about its current state
•
State information includes, e.g.:
– Information about open files and their accessing clients
– File descriptors and UFIDs
– Position pointer for the current access
– Mounting information
– Lock status
– Session IDs and their capabilities
– Caches or buffers
•
The status information can be distributed among a client module, directory
server and file server
•
UFIDs, buffers, and session keys should be, from a performance perspective, kept
at the client
•
Status information differs from attributes, which are stored as part of the
file: status information is dynamic and belongs to a session
Oblivious and non-oblivious file systems
•
The distribution of status information significantly influences the properties of the
distributed file system, especially performance, scalability, availability, and
consistency
•
A distributed file system is called "non-oblivious" if the server stores dynamic session
information; it is called "oblivious" if all session information is stored
at the client
•
Tradeoffs
– Flexibility
A distributed file system can better manage and optimize its task if it has more information
– Simplicity
An oblivious file system is easier to implement and handles failures of clients and/or servers more reliably
Problems of oblivious File Systems
•
Handling of repeated operations of a client after a failure:
– Are all operations idempotent?
– Typically, the kernel of a distributed system ensures that no repetition occurs or that
repetitions can be detected based on a numbering scheme. In the second case, the file system has to handle the numbering and loses its obliviousness.
•
Handling of file locking and concurrency control
– Can be handled by a transaction manager or session manager
– The problem is only moved from the file system to another system component
•
Session keys to manage security
– Session keys are typically managed on both sides of a session. Again, security
management can be handled by a higher-level security manager
•
How to handle cache consistency and consistency of replicas?
•
All existing DFSs keep some (minimal) state
Example: SUN NFS
•
NFS (Network File System) is an open protocol to exchange files
•
Developed by Sun Microsystems and published as NFS Version 2 (RFC 1094) in
1989
•
Simple and open interfaces
•
Supports the following properties of distributed file systems
– transparency
– heterogeneity (limited for Windows)
– efficiency
– fault tolerance
•
The following properties are only partially supported
– concurrency
– replication
– consistency
– security
NFS v3 Architecture
Where should NFS be implemented?
•
NFS does not have to be implemented in OS kernel space
–
NFS can also be implemented as a library or user space process
–
Examples: Windows and Mac OS implementations, PocketPC
•
Advantages of a kernel space implementation under UNIX
–
Programs do not have to be recompiled against a different library
for each NFS implementation
• System calls for remote files can be routed directly to the NFS module
–
The OS cache for currently used files can be used
–
A kernel-level server can directly access inodes and data blocks
• Also possible for a privileged user space process
–
Security of the encryption
NFS v3 operations
• read(fh, offset, count) -> attr, data
• write(fh, offset, count, data) -> attr
• create(dirfh, name, attr) -> newfh, attr
• remove(dirfh, name) -> status
• getattr(fh) -> attr
• setattr(fh, attr) -> attr
• lookup(dirfh, name) -> fh, attr
• rename(dirfh, name, todirfh, toname)
• link(newdirfh, newname, dirfh, name)
• readdir(dirfh, cookie, count) -> entries
• symlink(newdirfh, newname, string) -> status
• readlink(fh) -> string
• mkdir(dirfh, name, attr) -> newfh, attr
• rmdir(dirfh, name) -> status
• statfs(fh) -> fsstats
fh = file handle: file system identifier, inode number, inode generation
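The three file handle fields can be modelled as below. This is a sketch under assumptions: the names `FileHandle`, `HandleServer`, and `StaleHandle` are hypothetical, and the way the generation number is incremented is simplified. It shows why the generation number is part of the handle: when an inode number is reused for a new file, old handles no longer match.

```python
from typing import NamedTuple


class FileHandle(NamedTuple):
    fsid: int          # file system identifier
    inode: int         # inode number
    generation: int    # inode generation


class StaleHandle(Exception):
    """Raised when a handle refers to a file that no longer exists."""


class HandleServer:
    """Hypothetical server-side bookkeeping for one exported file system."""

    def __init__(self, fsid):
        self.fsid = fsid
        self.generation = {}   # inode number -> current generation

    def allocate(self, inode):
        # Reusing an inode number for a new file bumps its generation.
        gen = self.generation.get(inode, 0) + 1
        self.generation[inode] = gen
        return FileHandle(self.fsid, inode, gen)

    def check(self, fh):
        if fh.fsid != self.fsid or self.generation.get(fh.inode) != fh.generation:
            raise StaleHandle(fh)
        return True


server = HandleServer(fsid=7)
old = server.allocate(inode=42)    # file created
new = server.allocate(inode=42)    # file deleted, inode 42 reused
print(server.check(new))           # True
```

A client still holding `old` would get `StaleHandle` from `check`, instead of silently reading the new file's data.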
Model flat file service:
Read(FileId, i, n) -> Data
Write(FileId, i, Data)
Create() -> FileId
Delete(FileId)
GetAttributes(FileId) -> Attr
SetAttributes(FileId, Attr)
Model directory services:
Lookup(Dir, Name) -> FileId
AddName(Dir, Name, File)
UnName(Dir, Name)
GetNames(Dir, Pattern) -> NameSeq
NFS v3 Authorization and Authentication
•
Authentication is done on the client
–
Shared authentication files are necessary
•
NFS Version 3 is stateless, so the identity and the access
permissions are checked on each access to the server
–
Local file systems check the permissions only during the open call
•
Each client request carries the user and group
information
–
The information is inserted into the RPC call
•
The server can be attacked if the user and group information is not secured
by an encryption scheme
–
Kerberos can be integrated directly into NFS
Mount Service
•
Mount operation:
– mount(remotehost, remotedirectory, localdirectory)
•
Server holds table with clients that have mounted a file
system of the server
•
Each client maintains a table with all mounted file
systems
– <IP address, port number, file handle>
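A client-side mount table of this shape can be sketched as follows. The local paths, the IP addresses (taken from the documentation range 192.0.2.0/24), and the file handle strings are placeholders invented for this example. Port 2049 is the port commonly used by NFS. The function `find_mount` is a hypothetical helper that picks the longest matching mount point.

```python
from pathlib import PurePosixPath

# Hypothetical table: local mount point -> (IP address, port number, file handle)
MOUNT_TABLE = {
    PurePosixPath("/usr/students"): ("192.0.2.10", 2049, "fh-people"),
    PurePosixPath("/usr/staff"): ("192.0.2.20", 2049, "fh-users"),
}


def find_mount(path):
    """Return (table entry, path relative to the mount point) or None."""
    p = PurePosixPath(path)
    # p.parents runs from the nearest ancestor up to "/", so the first hit
    # is the longest matching mount point.
    for candidate in [p, *p.parents]:
        if candidate in MOUNT_TABLE:
            return MOUNT_TABLE[candidate], p.relative_to(candidate)
    return None        # not under any mount point: a local file


print(find_mount("/usr/students/jon/notes.txt"))
print(find_mount("/etc/passwd"))   # None
```

Paths that match an entry are sent to the listed server together with the stored file handle of the remote directory. All other paths stay in the local file system.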
Local and remote File Systems with NFS clients
•
The directory students on the client is equal to the people directory on server 1
•
The directory staff on the client is equal to the users directory on
Automounter
•
The NFS client tries to find "empty" mount points and delegates them to the
automounter
•
The automounter has a table of mount points and multiple candidate servers
•
The automounter sends requests to the candidates and mounts the file system from the
server that answers first
– Mount tables are kept smaller
– Simple form of replication for read-only file systems
•
Example (simplified):
– Each user has the /home directory
– The automounter installs the directory during the booting of the client
– If a user accesses the directory mustermann, the automounter requests it from
different NFS servers
– If different servers have an identical /usr/lib, each server has a chance of being
accessed
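The "first server that answers wins" rule can be sketched with threads. Everything here is simulated: `probe` is a hypothetical function that sleeps to imitate network latency, and the server names and delays are made up. A real automounter would send an actual request to each candidate instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def probe(server, delay, alive=True):
    """Hypothetical probe: wait 'delay' seconds, then answer or fail."""
    time.sleep(delay)
    if not alive:
        raise ConnectionError(server)
    return server


def pick_server(candidates):
    """candidates: list of (server, delay, alive) tuples.

    Returns the name of the first candidate that answers successfully.
    """
    with ThreadPoolExecutor(max_workers=len(candidates)) as pool:
        futures = [pool.submit(probe, *c) for c in candidates]
        for future in as_completed(futures):
            if future.exception() is None:
                return future.result()
    raise ConnectionError("no candidate answered")


candidates = [
    ("nfs-a", 0.20, True),    # slow server
    ("nfs-b", 0.02, True),    # fast server
    ("nfs-c", 0.00, False),   # unreachable server
]
print(pick_server(candidates))   # nfs-b
```

The unreachable server fails first and is skipped, so the fastest healthy replica is mounted. This is why the scheme works as a simple form of replication for read-only file systems.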
NFS v3 optimization: Server Caching
•
Similar to UNIX file caching of local files
– Pages (blocks) of the hard disk are kept in memory until replaced by newer pages
– Optimizations for read-ahead and delayed-write
•
NFS v3 provides two strategies to write files
– write-through: Dirty pages are written to the storage system immediately when they
arrive at the server. When the write() RPC returns, the NFS client has the guarantee that the data is on disk
– delayed commit: Pages are kept in a cache until a commit() call (default for NFS
v3). A commit() is issued by a client, e.g. when a file is closed.
•
Additional reply cache:
– Logs previous requests
– When a client sends a request multiple times, the cache already stores the correct reply
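A reply cache of this kind can be sketched as a dictionary keyed by client and request number. The class `ReplyCachingServer`, the status strings, and the use of an `xid` (transaction ID) argument are assumptions for this illustration. The example uses remove, because repeating it without a cache would turn a success into an error.

```python
class ReplyCachingServer:
    """Hypothetical server that remembers replies to earlier requests."""

    def __init__(self, names):
        self.names = set(names)    # file names in one directory
        self.reply_cache = {}      # (client, xid) -> reply already sent

    def remove(self, client, xid, name):
        key = (client, xid)
        if key in self.reply_cache:
            # Retransmission: answer from the log, do not execute again.
            return self.reply_cache[key]
        if name in self.names:
            self.names.remove(name)
            reply = "OK"
        else:
            reply = "ENOENT"
        self.reply_cache[key] = reply
        return reply


server = ReplyCachingServer(["report.txt"])
print(server.remove("client-1", 17, "report.txt"))   # OK
print(server.remove("client-1", 17, "report.txt"))   # OK (same request, resent)
print(server.remove("client-1", 18, "report.txt"))   # ENOENT (a new request)
```

A real cache would be bounded in size and would drop old entries. This sketch keeps every reply for simplicity.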
NFS v3 optimization: Client Caching
•
Caching is not explicit in the protocol
•
Server caching does not reduce network traffic between clients and server
– Further optimization is necessary to reduce traffic in large networks
– The NFS client stores results of read, write, getattr, lookup and readdir operations
– Synchronization of file contents is not ensured if two or more clients access the same file
•
Validation based on timestamps
– Reduces inconsistencies, but does not eliminate them
– Validity condition for a cache entry at the client:
• (T - Tc < t) ∨ (Tm_client = Tm_server)
– t can be configured (per file) and is typically set to 3 seconds for files and 30 seconds for directories
– Writing distributed applications on top of NFS
is still difficult
t: Refresh period
Tc: Time of last validation
Tm: Time of last update
T: Current time
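The validity condition translates almost directly into code. In this sketch, `get_Tm_server` is a hypothetical callable standing in for a getattr request to the server. Passing it as a function makes it visible that the server is contacted only when the first half of the condition fails.

```python
def cache_entry_is_valid(T, Tc, t, Tm_client, get_Tm_server):
    """Evaluate (T - Tc < t) or (Tm_client == Tm_server).

    T: current time, Tc: time of last validation, t: refresh period,
    Tm_client: modification time recorded with the cached copy.
    """
    if T - Tc < t:
        return True                        # fresh enough, no network traffic
    return Tm_client == get_Tm_server()    # ask the server for its Tm


server_calls = []


def fake_getattr():
    # Stand-in for the server: the file was last modified at time 100.
    server_calls.append(1)
    return 100


print(cache_entry_is_valid(T=12, Tc=10, t=3, Tm_client=100, get_Tm_server=fake_getattr))  # True
print(len(server_calls))                                                                  # 0
print(cache_entry_is_valid(T=20, Tc=10, t=3, Tm_client=90, get_Tm_server=fake_getattr))   # False
```

Within the refresh period the client trusts its copy even if another client has already changed the file. This is the remaining source of inconsistency noted above.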
NFS v3 Summary I
•
NFS is an example for a simple, robust, and distributed file system
•
Evaluation of properties
–
Access transparency: Very good; the API can use the same UNIX syntax for local and
remote accesses
–
Location transparency: Not ensured; names of file systems are specified
during the mount operation. Good system configuration can achieve
transparency
–
Concurrency: Limited; if multiple clients write to a single file, consistency is
not ensured
–
Replication: Limited to read-only file systems
–
Fault tolerance: Limited; the service is down when the server crashes. After a restart,
the server recovers easily because the protocol is stateless
NFS v3 Summary II
–
Data mobility: Barely possible; transparent file moves are not supported.
Complete file systems can be moved, but this requires a change in the client
configuration
–
Performance: ?
–
Scalability: ?
NFS v3 -> NFS v4
•
NFS v4 standardized under RFC 3530
•
UNIX user and group numbers are replaced by strings, e.g.
username@hostname
•
Introduction of temporary file handles that exist only for a limited time
•
Mount and lock protocols are part of the protocol itself and run on a
specified port. This simplifies the use of NFS connections through firewalls
•
Multiple requests can be combined into a single call. They are
executed by the server and only a single response is sent back.
This makes the protocol more efficient on Wide Area Networks
•
Encryption is part of the spec. SecureRPC was possible before,
but it was seldom used because it was not always available
•
An open call is introduced in addition to lookup, so the protocol becomes
stateful; state about open files is maintained on the server
•
If many clients only read a file, this file can be given to all clients. If a
client wants to write a file, an exclusive lock can be granted
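The combination of multiple requests into a single call can be sketched as a small interpreter that runs a list of operations and returns one list of results. The operation names PUTROOTFH, LOOKUP, and READ are borrowed from RFC 3530. Everything else is a simplified assumption for this example: the nested dictionary `TREE`, the tuple encoding of requests, and the status strings.

```python
# Hypothetical exported tree: directories are dicts, files are bytes.
TREE = {"home": {"notes.txt": b"hello world"}}


def compound(ops, tree=TREE):
    """Run the operations in order; stop at the first one that fails.

    Returns one list with a (name, status, value) tuple per executed
    operation, which stands in for the single response message.
    """
    current = None                 # the "current file handle" of the call
    results = []
    for op, *args in ops:
        try:
            if op == "PUTROOTFH":
                current, value = tree, None
            elif op == "LOOKUP":
                current, value = current[args[0]], None
            elif op == "READ":
                offset, n = args
                value = current[offset:offset + n]
            else:
                raise ValueError(op)
            results.append((op, "OK", value))
        except (KeyError, TypeError, ValueError):
            results.append((op, "ERR", None))
            break                  # later operations are not executed
    return results


reply = compound([
    ("PUTROOTFH",),
    ("LOOKUP", "home"),
    ("LOOKUP", "notes.txt"),
    ("READ", 0, 5),
])
print(reply[-1])                   # ('READ', 'OK', b'hello')
```

Four operations travel in one request and one response instead of four round trips. On a high-latency link this is where the efficiency gain comes from.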
Andrew File System (AFS)
•
A CMU study validated the following assumptions
– Files are typically small
– Read operations dominate write operations by a factor of 6
– Sequential accesses are typical, random accesses are rare
– Files are typically read or written by a single user. If multiple users share a file, it is
changed by only one of them.
– Files are referenced in bursts (high probability that a file is accessed again shortly after its
current usage)
-> File usage is local and caches are valid for a long period of time
-> AFS uses client-side caches to overcome the network bottleneck
•
The following slides give only a brief overview of AFS …
AFS Cache Consistency
•
AFS introduces an object called callback promise
•
Every cached file of a client has a callback promise (CP)
•
The AFS server also holds one callback promise for each cached file
•
The status of a CP is valid or cancelled
•
Operations:
– The server transfers the CP to the client on a remote open. The CP is stored together with the file on
the client side
– The client has to test the status of the CP on open
– The CP has to be renewed after time T, which is a free parameter
– The server sends the status cancelled to all clients holding the CP on a file update
– The client has to reload the file if the CP is in the state cancelled
– The client has to reload CPs from the server in case of a reboot
•
All updates are performed locally
•
The last local update overwrites other updates on the server
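The callback promise mechanism can be sketched with two small classes. The class and method names (`AfsServer`, `AfsClient`, `fetch`, `store`, `break_callback`) are hypothetical. Renewal after time T and recovery after a reboot are left out to keep the sketch short.

```python
class AfsServer:
    def __init__(self):
        self.files = {}      # file name -> contents
        self.holders = {}    # file name -> clients holding a callback promise

    def fetch(self, client, name):
        # Remote open: hand out the data together with a callback promise.
        self.holders.setdefault(name, set()).add(client)
        return self.files[name]

    def store(self, client, name, data):
        # File update: cancel the promises of all other clients.
        self.files[name] = data
        for other in self.holders.get(name, set()) - {client}:
            other.break_callback(name)
        self.holders[name] = {client}


class AfsClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}      # file name -> [contents, "valid" | "cancelled"]
        self.fetches = 0     # number of transfers from the server

    def open(self, name):
        entry = self.cache.get(name)
        if entry is None or entry[1] == "cancelled":
            self.cache[name] = [self.server.fetch(self, name), "valid"]
            self.fetches += 1
        return self.cache[name][0]

    def close(self, name, data):
        # Updates happen on the local copy and are written back on close.
        self.cache[name] = [data, "valid"]
        self.server.store(self, name, data)

    def break_callback(self, name):
        if name in self.cache:
            self.cache[name][1] = "cancelled"


server = AfsServer()
server.files["f"] = b"v1"
a, b = AfsClient(server), AfsClient(server)
a.open("f"); a.open("f")           # second open is served from the cache
b.open("f"); b.close("f", b"v2")   # b's update cancels a's promise
print(a.open("f"), a.fetches)      # b'v2' 2
```

As long as nobody writes, repeated opens cost no network traffic. That is the behaviour the usage assumptions of the CMU study favour.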
AFS Extensions
•
DCE Distributed File System (DFS)
•
DCE is an acronym for Distributed Computing Environment
•
Consistency after updates:
–
A client that wants to update a file has to obtain a write token from the server
–
The write token includes the position and number of bytes that should be updated
–
The server cancels the permissions of other clients for that region after receiving the write
request
–
Similar tokens are introduced to change attributes
–
All tokens expire after their lifetime and have to be renewed
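The token rules above (byte range, revocation of overlapping permissions, expiry) can be sketched as follows. The names `WriteToken` and `TokenServer`, the lifetime of 30 time units, and the explicit `now` argument are assumptions chosen to keep the example deterministic. They do not describe the actual DCE DFS interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteToken:
    client: str
    start: int        # position of the first byte to be updated
    length: int       # number of bytes to be updated
    expires: float    # time at which the token has to be renewed

    def overlaps(self, start, length):
        return self.start < start + length and start < self.start + self.length


class TokenServer:
    def __init__(self, lifetime=30.0):
        self.lifetime = lifetime
        self.tokens = {}   # file name -> list of live tokens

    def request_write(self, client, file, start, length, now):
        """Grant a token; return it plus the clients whose tokens were cancelled."""
        live = [t for t in self.tokens.get(file, []) if t.expires > now]
        revoked = [t for t in live
                   if t.client != client and t.overlaps(start, length)]
        kept = [t for t in live if t not in revoked]
        token = WriteToken(client, start, length, now + self.lifetime)
        self.tokens[file] = kept + [token]
        return token, [t.client for t in revoked]


server = TokenServer()
server.request_write("A", "f", start=0, length=100, now=0)
_, cancelled = server.request_write("B", "f", start=50, length=10, now=1)
print(cancelled)                   # ['A']
```

Tokens on disjoint byte ranges do not conflict, so two clients can update different regions of the same file at the same time.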
Network Attached Storage I
•
File servers offer access to storage
systems over distributed file system
protocols
– NFS, AFS, CIFS
•
Applications on the client access the
distributed file system over the network
•
The client interface is moved from block
level to file system level
•
What is a NAS?
– A pre-configured file server that is
attached to a local area network over Ethernet
– A single hard drive that can be accessed over Ethernet via CIFS / NFS is already a NAS
Network Attached Storage II
•
"A term used to refer to storage elements that connect to a
network and provide file access services to computer systems.
Abbreviated NAS. A NAS Storage Element consists of an engine,
which implements the file services, and one or more devices, on
which data is stored. NAS elements may be attached to any type of
network. When attached to SANs, NAS elements may be
considered to be members of the SAS (SAN attached storage) class
of storage elements."
•
"A class of systems that provide file services to host computers. A
host system that uses network attached storage uses a file system
device driver to access data using file access protocols such as NFS
or CIFS. NAS systems interpret these commands and perform the
internal file and device I/O operations necessary to execute them."
NAS Server
•
A NAS server is a pre-configured file server that offers
access to its storage systems via Ethernet
[Figure: NAS server (3 U, SCSI hard disk / RAID) attached via Fast Ethernet and GBit/s Ethernet; servers and clients access it using CIFS / NFS]
NAS Gateway
•
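A NAS gateway is able to attach external storage
systems via a SAN interface
[Figure: NAS gateway attached via Fast Ethernet and GBit/s Ethernet; servers and clients access it using CIFS / NFS; external storage systems (3 U) are attached to the gateway over a SAN using FC/iSCSI]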