Project Group
High-performance Flexible File System
2010 / 2011
Lecture 1
File Systems
Task
• Use disk drives to store huge amounts of data
Files as logical resources
• A file can contain structured data (e.g. records) or a sequence of ASCII bytes
• We assume access at byte level
• Important: distinction between the logical blocks of a file and the physical blocks on the storage medium
• File systems may support
– Dynamically sized files
– Mutable files
– A variable number of files on a medium
Storage media for files
• Files should be stored
– on non-volatile media
– with low latency
– at low cost and
– with support for read and write access
• Today, magnetic hard disk drives are (still) the most suitable media
– For small amounts of data: floppies, USB flash drives
– To archive huge amounts of data: tape
– To archive for read-only access: CD-ROM, DVD
– In niches (energy consumption, robustness, random-access read performance): SSD
• In the following, we will investigate hard disk drives as the most
On-disk format on an HDD
[Figure: on-disk layout with disk label, allocation map, directory, and files; disk geometry labeled with blocks (sectors), cylinders, and tracks]
Example FAT
• FAT: File Allocation Table
• A FAT file system consists of six parts:
– Boot Sector
– Reserved Sectors
– FAT 1: Table of links between the clusters (see later slide)
– FAT 2: Copy of the FAT
– Root Directory: Table of directory entries
– Data Region
• The boot sector contains executable x86 machine code for starting the operating system and additional information about the FAT file system.
Disk label
• Name of the medium
• Date of commissioning
• Capacity
• Physical structure
• Bad blocks
• Link to the allocation map (or the map itself)
• Link to the root directory (or the root directory itself)
• Stored at a well-defined position (first block) and is
Allocation map (free and used blocks)
• Based on vectors or tables
• Stored densely or scattered
Example:
• Vector (bitmap) of free and used blocks, separated by area (to reduce disk head movements)
11000101 10100000
11000000 00000111
11001111 00011000
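A minimal sketch of how such a bitmap could be scanned for free space. It assumes that a 1 marks a used block, a 0 marks a free block, and that blocks are numbered from 0 within an area; the slide does not state any of these conventions, and the function names are hypothetical.

```python
def parse_bitmap(text):
    """Turn a string such as '11000101 10100000' into a list of 0/1 flags."""
    return [int(ch) for ch in text if ch in "01"]


def free_extents(bits):
    """Return (first_block, length) pairs for runs of free blocks (bit == 0)."""
    extents, start = [], None
    for block, bit in enumerate(bits):
        if bit == 0 and start is None:
            start = block                      # a free run begins here
        elif bit == 1 and start is not None:
            extents.append((start, block - start))
            start = None                       # the free run ended
    if start is not None:                      # free run reaches the end of the area
        extents.append((start, len(bits) - start))
    return extents


# First area of the example bitmap (assumption: 1 = used, 0 = free).
area = parse_bitmap("11000101 10100000")
extents = free_extents(area)
print(extents)
```

Listing free space as (address, length) pairs is the same information a table-based allocation map would hold, only derived from the bit vector.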
Allocation map in a separate table
Address (block number)   Length
3                        16
22                       9
32                       10
44                       9
57                       8
[Figure: disk with blocks numbered 1 to 64]
Root directory (file catalogue, file directory)
• The root directory contains a list of all stored files and their descriptions
• Flat directory structure
– In the simplest case, it consists of a simple (one-dimensional) table
– For huge disks and many files, the flat structure becomes unmanageable (for human users as well as for accessing applications)
Constant or variable length
File directory
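• Structured directories (tree abstraction)
[Figure: directory tree with root entries A, B, E; A contains R and S, E contains A, D, and T; S and D each contain X and Y; resulting files B, A.R, A.S.X, A.S.Y, E.A, E.D.X, E.D.Y, E.T; an entry of the file catalogue may continue in more blocks]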
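The tree abstraction can be sketched with nested dictionaries. The names below follow the figure (A, B, E, and so on); representing a directory as a `dict` and a file as a string, and the helper `lookup`, are assumptions made only for illustration.

```python
# Directories are dicts, files are plain strings (illustrative representation).
tree = {
    "A": {"R": "file A.R", "S": {"X": "file A.S.X", "Y": "file A.S.Y"}},
    "B": "file B",
    "E": {
        "A": "file E.A",
        "D": {"X": "file E.D.X", "Y": "file E.D.Y"},
        "T": "file E.T",
    },
}


def lookup(root, path):
    """Resolve a dotted path such as 'A.S.X' one directory level at a time."""
    node = root
    for name in path.split("."):
        if not isinstance(node, dict) or name not in node:
            raise FileNotFoundError(path)
        node = node[name]
    return node


print(lookup(tree, "A.S.X"))
```

Each path component costs one directory lookup, so the names X and Y can be reused under S and D without conflict, which a flat table cannot offer.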
File description
• The file description contains all metadata:
– File name
– Type of organization
– Date of creation
– Owner
– Access rights
– Time of last access
– Time of last modification
– Position of the file (or of its parts)
– Size
– ...
Access rights
• Access rights are set by the owner (who is usually also the creator of the file)
• If the access rights Read (L, from German "Lesen") and Write (S, from German "Schreiben") are defined, a possible mapping of access rights could be:
                  File 1  File 2  File 3  File 4
User (group) A    L,S     S
User (group) B    L       L,S     L       L,S
User (group) C    L       L
• More possible flags:
– Execute (for executable files)
– Modification of access rights (reserved for the owner)
– Writes split into "update" and "append"
– Delete
– Visible
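An access matrix of this kind can be sketched as a nested mapping from user (group) to file to a set of rights. Only the row of user (group) B is filled in, using the values from the table; the identifiers `rights`, `allowed`, and the file keys are hypothetical.

```python
READ, WRITE = "L", "S"   # flag letters as used on the slide

# Row of user (group) B from the table; other rows are left out of this sketch.
rights = {
    "B": {
        "file1": {READ},
        "file2": {READ, WRITE},
        "file3": {READ},
        "file4": {READ, WRITE},
    },
}


def allowed(user, file, right):
    """Deny by default: unknown users, unknown files, and missing flags all fail."""
    return right in rights.get(user, {}).get(file, set())


print(allowed("B", "file2", WRITE))   # B holds L,S on file 2
print(allowed("B", "file1", WRITE))   # B holds only L on file 1
```

Further flags such as execute or delete would simply be additional members of the per-file sets.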
File organization
• File organization describes the inner structure of a file
• Defines how its blocks are accessed
• Multiple access types
– Sequential
• Blocks are accessed sequentially
– Direct
• Selective access to arbitrary blocks
– Index-sequential
• Both sequential and direct
• Multiple organizational forms can be provided at the same time and mapped to a single internal organization
Sequential File Organization
• The blocks have an internal sequence that determines the access order
– Mandatory form of organization for files on tape
– Can also be used on disk drives
– Uses a pointer that is moved explicitly or implicitly
– An access (e.g. read) refers to the current position of the pointer
• Usually there are explicit commands to move the pointer:
– next: moves the pointer to the next block
– previous: moves the pointer to the previous block (mostly non-existent)
– reset: moves the pointer to the beginning of the file
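The pointer semantics can be sketched with a small class. `SequentialFile` and its in-memory block list are assumptions for illustration; `read` moves the pointer implicitly, while `next` and `reset` move it explicitly, and `previous` is left out because the slide notes it is mostly non-existent.

```python
class SequentialFile:
    """Illustrative sequential file: a list of blocks plus a current position."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.pos = 0

    def read(self):
        """Return the block at the pointer and advance it; None signals EOF."""
        if self.pos >= len(self.blocks):
            return None
        block = self.blocks[self.pos]
        self.pos += 1            # implicit pointer movement
        return block

    def next(self):
        """Explicitly skip one block, never moving past EOF."""
        self.pos = min(self.pos + 1, len(self.blocks))

    def reset(self):
        """Move the pointer back to the beginning of the file."""
        self.pos = 0


f = SequentialFile(["S1", "S2", "S3"])
first = f.read()     # reads S1
f.next()             # skips S2
third = f.read()     # reads S3
at_eof = f.read()    # pointer is at EOF
f.reset()
again = f.read()     # reads S1 again
```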
[Figure: sequential file with blocks S1 to S9 between the beginning of the file and EOF (end of file); an in-place update replaces the old block S4 with the new block S4']
Sequential files on disk drives
• Disk drives allow multiple ways to store sequential files
– Contiguous
• The file spans contiguous blocks on the disk
– Scattered
• The file uses arbitrary blocks on the disk
– Order and position of the blocks can be realized by:
• Chaining
– direct (integrated) block chaining
– external chaining in a table (e.g. FAT in MS-DOS / Windows)
Sequential files on disk drives
• Chaining
• Index block
[Figure: blocks S1 to S9 shown twice, once for chaining and once for an index block]
Example
• MS-DOS uses external chaining
– The chaining is stored in the File Allocation Table (FAT), with one entry for each block
– For performance reasons, the FAT should be held in memory
[Figure: directory entry ("xyz", …, 235) with the fields name and first block; FAT excerpt showing the indices 0, 129, 235, 298 and the entries 129, 567, EOF]
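External chaining can be sketched as a table lookup loop. The chain 235 → 129 → 567 → EOF is only one possible reading of the figure and is used here as illustrative data; the sentinel `EOF = -1`, the dictionary-based FAT, and `file_blocks` are assumptions, not the MS-DOS on-disk encoding.

```python
EOF = -1   # illustrative end-of-chain marker, not the real FAT encoding


def file_blocks(fat, first_block):
    """Follow the chain in the FAT and return the file's blocks in order."""
    chain, block, seen = [], first_block, set()
    while block != EOF:
        if block in seen:                 # guard against a corrupted table
            raise ValueError("cycle in FAT chain")
        seen.add(block)
        chain.append(block)
        block = fat[block]                # one table lookup per block
    return chain


# Assumed example data: each FAT entry names the successor block.
fat = {235: 129, 129: 567, 567: EOF}
directory = {"xyz": 235}                  # name -> first block

chain = file_blocks(fat, directory["xyz"])
print(chain)
```

Reaching the k-th block needs k lookups in the table, which is why keeping the FAT in memory matters for performance.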
Direct File Organization
• Direct access to the blocks of a file via a key
• The address (block or track number) of the block is calculated from the key → hash function
a_i = f(k_i), e.g. a_i = k_i mod n
• The calculated address (block number) need not be the physical block number
• An additional mapping step is possible
– Blocks or tracks may serve as containers for multiple records that map to the same hashed address
– Collisions must be resolved only when a container is full
[Figure: key k_i mapped to block S_i]
Direct File Organization
• Collision resolution, e.g. linear probing with
a_{i+1} = (a_i + d) mod n
[Figure: row of slots labeled V and S; probing starts at a_i = f(k_i)]
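A minimal sketch of hashing with linear collision resolution, using f(k) = k mod n and the step rule a_{i+1} = (a_i + d) mod n. It assumes one record per slot (no multi-record containers), a table of n = 7 slots, and d = 1; the function names are hypothetical.

```python
def insert(table, key, d=1):
    """Place key at f(key) = key mod n, probing with step d on collisions."""
    n = len(table)
    a = key % n
    for _ in range(n):
        if table[a] is None:
            table[a] = key
            return a
        a = (a + d) % n                   # a_{i+1} = (a_i + d) mod n
    raise OverflowError("hash table is full")


def find(table, key, d=1):
    """Retrace the same probe sequence; an empty slot ends the search."""
    n = len(table)
    a = key % n
    for _ in range(n):
        if table[a] is None:
            return None
        if table[a] == key:
            return a
        a = (a + d) % n
    return None


table = [None] * 7
# 10, 17, 24 and 3 all hash to address 3, so each insert probes one slot further.
addresses = [insert(table, k) for k in (10, 17, 24, 3)]
print(addresses)
```

The step d should share no common factor with n; otherwise the probe sequence visits only part of the table.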
Direct File Organization
• The hash table will fill up and an overflow might occur
• Complex reorganization (e.g. by moving data) becomes necessary
• To avoid this, extendible hashing can be used
• Allows incremental extension of the hash table without data movement
• Requires an additional level of indirection: the hash value indexes into a vector of pointers
• The hash function used is a_i = k_i mod 2^g; keys are distinguished by their last g bits
• If an overflow happens, the container's contents are redistributed with the "refined" hash function over the old container and a newly created container
• To maintain correct addressing, g is incremented by 1 (length of pointer vector
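The overflow handling can be sketched as follows. The container capacity of 4 records, the class names, and the use of Python lists for the pointer vector are assumptions; the keys are those of the slide example in which key 43 triggers an extension.

```python
class Bucket:
    """A container with its own (local) depth g."""

    def __init__(self, depth):
        self.depth = depth
        self.keys = []


class ExtendibleHash:
    def __init__(self, capacity=4, global_depth=2):
        self.capacity = capacity          # assumed records per container
        self.g = global_depth             # pointer vector has 2**g entries
        self.directory = [Bucket(global_depth) for _ in range(2 ** global_depth)]

    def insert(self, key):
        bucket = self.directory[key % (2 ** self.g)]   # last g bits of the key
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(key)
            return
        if bucket.depth == self.g:
            # Double the pointer vector; both halves point to the same containers.
            self.directory = self.directory + self.directory
            self.g += 1
        bucket.depth += 1
        sibling = Bucket(bucket.depth)
        high_bit = 1 << (bucket.depth - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and i & high_bit:
                self.directory[i] = sibling
        # Redistribute only the overflowing container with the refined function.
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys + [key]:
            self.insert(k)


h = ExtendibleHash()
for k in (24, 16, 92, 13, 49, 22, 18, 19, 15, 31, 27):
    h.insert(k)
h.insert(43)          # container for "...11" is full, so the vector doubles
print(h.g, len(h.directory), h.directory[3].keys, h.directory[7].keys)
```

Only the overflowing container is touched; all other containers are merely referenced twice from the doubled pointer vector.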
Example
• Before extension (key to insert is 43): vector of pointers with b = 2^gmax = 4 entries (gmax = 2), depths g = 2, 2, 2, 2; data blocks [24, 16, 92], [13, 49], [22, 18], [19, 15, 31, 27]
• After extension: vector of pointers with b = 2^gmax = 8 entries, depths g = 2, 2, 2, 3, 2, 2, 2, 3; data blocks [24, 16, 92], [13, 49], [22, 18], [19, 27, 43], [15, 31]
Index-sequential file organization
• Some files are accessed both sequentially and directly (at different points in time).
• This leads to a mixture of sequential and direct (indexed) organization → index-sequential file organization.
• Although the blocks of the file are stored sequentially on the medium, additional data structures allow direct access.
• In its simplest form, a single level of indexing is required, where the index stores the largest key of each block.
S4.2 S12.3
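A single-level index that stores the largest key of each block can be sketched with a binary search. The block contents below are invented sample data, and `find_block` and `lookup` are hypothetical helper names.

```python
from bisect import bisect_left

# Sample data (assumption): sorted blocks of keys, stored in sequence.
blocks = [[2, 5, 9], [12, 14, 20], [23, 31, 40]]
index = [block[-1] for block in blocks]   # largest key per block: [9, 20, 40]


def find_block(key):
    """Return the number of the first block whose largest key is >= key."""
    i = bisect_left(index, key)
    return i if i < len(blocks) else None


def lookup(key):
    """Direct access: one index search, then a search within a single block."""
    i = find_block(key)
    return i is not None and key in blocks[i]


print(find_block(14), lookup(14), lookup(13))
```

Sequential access still works by reading the blocks in order; the index only adds the direct entry point.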
Index-sequential file organization
• Blocks may become empty or overflow under dynamic access patterns (insertion and deletion of blocks)
– Overflow blocks are created and additional indexes are stored
[Figure: index entries S4, S7, S12, S15, S18 above the block sequence S1 to S18]
B*-Trees
• The additional indexes for overflow blocks may drastically increase access times for some records
• Better: use dynamic data structures
• The B*-Tree is a variant of the B-Tree
– It holds the records in the leaves
– Internal nodes contain keys to accelerate accesses
– Regarding the fill ratio and the maintenance of its shape, the B*-Tree corresponds to the B-Tree
[Figure: B*-Tree with key 41 at the top and keys 19, 31, 71 below]
Properties of B*-Trees
• The nodes correspond to the blocks on the disk
• Each node (block) is at least half full
• Let
c_i be the number of keys in an internal node i
m the minimal fill ratio of internal nodes (min. number of keys)
c_i* the number of records in a leaf node i
m* the minimal fill ratio of leaves (min. number of records)
• Then for all internal nodes i (except the root):
m ≤ c_i ≤ 2m
and for all leaves i:
m* ≤ c_i* ≤ 2m*
Insertion in B*-Tree
• Standard case: space left in the node
• Overflow:
– Neighbor has enough space: compensate with neighbor
– Neighbors are full: split node (create a new block)
• B*-Tree after insertion of the record with key 16 (node split on leaf level, neighbor compensation on the level above)
[Figure: B*-Tree after the insertion; key 31 shown]
Deletion in B*-Tree
• Standard case: node remains at least half full
• Reconfiguration case (the node's fill level falls below half):
– Neighbor more than half full: compensate with neighbor
– Neighbors half full: merge with neighbor (frees a block)
• B*-Tree after deletion of the record with key 71 (node merge on leaf level)
[Figure: B*-Tree after the deletion; key 31 at the top and keys 16, 19, 41 below]
Depth of B*-Trees?
• E.g. social insurance in China with approx. 10^9 records
• 40 bytes per entry (key and pointer) and a block size of 4096 bytes result in a branching factor of t = 4096/40 ≈ 100 (number of keys per node)
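The arithmetic can be checked with a few lines. This is a simplified model that assumes every node is completely full, so a tree of depth d addresses t^d records; `required_depth` is a hypothetical helper name.

```python
def required_depth(records, fanout):
    """Smallest depth d with fanout**d >= records (integer arithmetic only)."""
    depth, capacity = 1, fanout
    while capacity < records:
        capacity *= fanout
        depth += 1
    return depth


fanout = 4096 // 40                       # exact quotient before rounding to 100
depth = required_depth(10**9, 100)
print(fanout, depth)
```

With nodes that are only half full the branching factor drops to about 50, which adds at most one or two levels under the same model.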
[Figure: 10^2, 10^4, 10^6, 10^8, 10^10 records reachable at tree depths 1 to 5]
File operations
Typical file operations
• Create
• Open
• Read
• Write
• Reset
• Lock
• Close
• Get attributes
• Set attributes (access rights)
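Most of these operations map onto calls in Python's standard library, sketched below on a temporary file. The mapping of "reset" to `seek(0)` is an interpretation, and locking is left out because it is platform-specific.

```python
import os
import stat
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.dat")

with open(path, "w+b") as f:          # create and open
    f.write(b"hello blocks")          # write
    f.seek(0)                         # reset: pointer back to the beginning
    data = f.read(5)                  # read
# leaving the with-block closes the file

size = os.stat(path).st_size          # get attributes
os.chmod(path, stat.S_IRUSR)          # set attributes: owner may only read
read_only = not (os.stat(path).st_mode & stat.S_IWUSR)

# Clean up: restore write permission so the file can be removed everywhere.
os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
os.remove(path)
```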