
Project Group High-performance Flexible File System 2010 / 2011


Academic year: 2021


(1)

Project Group

High-performance Flexible File System

2010 / 2011

Lecture 1

File Systems

 

(2)

Task

Use disk drives to store huge amounts of data

Files as logical resources

A file can contain (structured) data (e.g. records) or a sequence of ASCII bytes

We assume that we work at the byte level

Important: Distinction between the logical blocks of a file and the physical blocks on the storage medium

File systems may support

– Dynamically sized files

– Mutable files

– A variable number of files on a medium

(3)

Storage media for files

Files should be stored

on non-volatile media

with low latencies

and low costs and

allow read and write accesses

Today, magnetic hard disk drives are (still) the most suitable media

For small amounts of data: Floppies, USB flash

To archive huge amounts of data: Tape

To archive for read-only accesses: CD-ROM, DVD

In niches (energy consumption, robustness, random-access read performance): SSD

In the following, we will investigate hard disk drives as the most

(4)

On-disk format on an HDD

[Figure: on-disk layout with disk label, allocation map, directory, and files, mapped onto blocks (sectors), tracks, and cylinders]

(5)

Example FAT

• FAT: File Allocation Table

• A FAT file system consists of six parts:

– Boot Sector

– Reserved Sectors

– FAT 1: Table of links of the clusters (see later slide)

– FAT 2: Copy of the FAT

– Root Directory: Table of directory entries

– Data Region

• The boot sector contains executable x86 machine code for starting the operating system and additional information about the FAT file system.

(6)

Disk label

Name of the medium

Date of commissioning

Capacity

Physical structure

Bad blocks

Link to allocation map (or the map itself)

Link to root directory (or the root directory itself)

Stored at a well-defined position (first block) and is

(7)

Allocation map (free and used blocks)

Based on vectors or tables

Stored densely or scattered

Example:

Vector (bitmap) for free and used blocks, separated for each area (to reduce disk head movements)

11000101 10100000

11000000 00000111

11001111 00011000
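A minimal sketch of how such a bitmap could be searched for a free block. The slide does not say which bit value marks a used block or how bits are ordered within a byte; the sketch assumes that 1 means "used" and that the leftmost bit of each byte is the lowest block number. The function name `first_free_block` is hypothetical.

```python
def first_free_block(bitmap, start=0):
    """Return the number of the first free block at or after `start`, or None.

    Assumptions (not stated on the slide): a 1 bit marks a used block,
    and bits are numbered from the most significant bit of each byte.
    """
    for block in range(start, len(bitmap) * 8):
        byte_index, bit_index = divmod(block, 8)
        if not bitmap[byte_index] & (0x80 >> bit_index):
            return block
    return None


# First row of the bitmap shown on the slide: 11000101 10100000
area = bytes([0b11000101, 0b10100000])
first = first_free_block(area)
first_in_second_byte = first_free_block(area, start=8)
```

Keeping one bitmap per disk area, as the slide suggests, means a search can start in the area where the file already lives, so the new block stays close to the existing ones.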

(8)

Allocation map in separate table

Address (block number)   Length
3                        16
22                       9
32                       10
44                       9
57                       8

[Figure: the 64 blocks of the medium, numbered 1 to 64]

(9)

Root directory (file catalogue, file directory)

• The root directory contains a list of all stored files and their descriptions

• Flat directory structure

– In the simplest case, it consists of a simple (one-dimensional) table whose entries have constant or variable length

– For huge disks and many files, the flat structure becomes unmanageable (for human users as well as for accessing applications)

(10)

File directory

Structured directories (tree abstraction)

[Figure: directory tree in which each node is an entry of the file catalogue and entries may continue in more blocks; the files shown are B, E.A, A.R, A.S.X, A.S.Y, E.D.X, E.D.Y, and E.T]

(11)

File description

The file description contains all metadata:

– File name

– Type of organization

– Date of creation

– Owner

– Access rights

– Time of last access

– Time of last modification

– Position of the file (parts of the file)

– Size

– ...

(12)

Access rights

• Access rights are set by the owner (who is most commonly also the creator of the file)

• If the access rights Read (L) and Write (S) are defined, a possible mapping of access rights could be:

File 1 File 2 File 3 File 4

User (group) A L,S S

User (group) B L L,S L L,S

User (group) C L L

• More possible flags:

– Execute (for executable files)

– Modification of access rights (reserved for owner)

– Writes split into "update" or "append"

– Delete

– Visible
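As a rough illustration, an access matrix like the one on this slide could be held as a mapping from (user group, file) to a set of rights. Only the row for user (group) B is reproduced, because it has an entry for every file; the names `RIGHTS` and `is_allowed` are hypothetical.

```python
# Rights per (user group, file); "L" = read, "S" = write, as on the slide.
RIGHTS = {
    ("B", "File 1"): {"L"},
    ("B", "File 2"): {"L", "S"},
    ("B", "File 3"): {"L"},
    ("B", "File 4"): {"L", "S"},
}


def is_allowed(group, file, right):
    """Check one right; a missing entry means no access at all."""
    return right in RIGHTS.get((group, file), set())


can_write_file_2 = is_allowed("B", "File 2", "S")
can_write_file_1 = is_allowed("B", "File 1", "S")
```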

(13)

File organization

File organization describes the inner structure of a file

Defines how its blocks are accessed

Multiple access types

– Sequential

• Blocks are accessed sequentially

– Direct

• Selective access to arbitrary blocks

– Index-sequential

• Both sequential and direct

Multiple organizational forms can be offered at the same time, all mapped to a single internal organization

(14)

Sequential File Organization

• The blocks have an internal sequence that determines the access order

– Mandatory form of organization for files on tape

– Can also be used on disk drives

– Uses a pointer that is moved explicitly or implicitly

– An access (e.g. read) refers to the current position of the pointer

• Most commonly there are explicit commands to move the pointer:

– next: Moves pointer to next block

– previous: Moves pointer to previous block (mostly non-existent)

– reset: Moves pointer to beginning of file

[Figure: blocks S1 to S9 from the beginning of the file to EOF (end of file); block S4 is updated in place, the old S4 being replaced by the new S4']
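The pointer-based access described on this slide can be sketched as a small class. This is an illustration only: the class name `SequentialFile` and the in-memory list standing in for the medium are assumptions, and `previous` is left out because the slide notes that it mostly does not exist.

```python
class SequentialFile:
    """Sequential access: every operation refers to the current pointer."""

    def __init__(self, blocks):
        self.blocks = list(blocks)  # stands in for the blocks on the medium
        self.pointer = 0

    def read(self):
        """Return the block at the pointer and move on implicitly; None at EOF."""
        if self.pointer >= len(self.blocks):
            return None
        block = self.blocks[self.pointer]
        self.pointer += 1
        return block

    def next(self):
        """Explicitly move the pointer to the next block."""
        if self.pointer < len(self.blocks):
            self.pointer += 1

    def reset(self):
        """Move the pointer back to the beginning of the file."""
        self.pointer = 0

    def update(self, block):
        """Overwrite the block at the pointer in place."""
        if self.pointer >= len(self.blocks):
            raise EOFError("pointer is at end of file")
        self.blocks[self.pointer] = block


# Reproduce the figure: skip S1..S3, then replace S4 by S4' in place.
f = SequentialFile([f"S{i}" for i in range(1, 10)])
for _ in range(3):
    f.next()
f.update("S4'")
f.reset()
contents = [f.read() for _ in range(9)]
```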

(15)

Sequential files on disk drives

Disk drives allow multiple ways to store sequential files

– Contiguous

The file spans contiguous blocks on the disk

– Scattered

The file uses arbitrary blocks on the disk

– Order and position of blocks can be realized by:

Chaining

– direct (integrated) block chaining

– external chaining in a table (e.g. FAT in MS-DOS / Windows)

(16)

Sequential files on disk drives

Chaining

Index block

[Figure: blocks S1 to S9 linked by chaining, and the same blocks S1 to S9 referenced from an index block]

(17)

Example

MS-DOS uses external chaining

– Chaining is stored in the File Allocation Table (FAT), with one entry for each block

– For reasons of performance the FAT should be held in memory

[Figure: directory entry (name "xyz", first block 235) pointing into the FAT, whose entries chain the blocks of the file up to an EOF marker]
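A minimal sketch of following an external chain, assuming the FAT is already in memory as a mapping from block number to the next block number. The first block 235 comes from the directory entry on the slide; the links between blocks and the `EOF` sentinel value are assumptions made for illustration, and `block_chain` is a hypothetical name.

```python
EOF = -1  # assumed sentinel for the last block of a file


def block_chain(fat, first_block):
    """Follow the FAT links from the first block and return all block numbers."""
    chain = []
    seen = set()
    block = first_block
    while block != EOF:
        if block in seen:
            raise ValueError("corrupt FAT: chain contains a cycle")
        seen.add(block)
        chain.append(block)
        block = fat[block]
    return chain


# Hypothetical links; only the starting block 235 is taken from the slide.
fat = {235: 129, 129: 567, 567: EOF}
blocks_of_xyz = block_chain(fat, 235)
```

Reading block k of a file costs k table lookups, which is why the slide recommends keeping the FAT in memory.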

(18)
(19)

Direct File Organization

• Direct access to the blocks of a file via a key

• Calculation of the address (block or track number) of the block from the key

→ Hash function $a_i = f(k_i)$, e.g. $a_i = k_i \bmod n$

• The calculated address (block number) need not be the physical block number

• An additional mapping step is possible

– Blocks or tracks may serve as containers for multiple records that are mapped to the same hashed address

– Collisions must be resolved only when a container is full

[Figure: key $k_i$ mapped to block $S_i$]

(20)

Direct File Organization

Collision resolution, e.g. linear with $a_{i+1} = (a_i + d) \bmod n$

[Figure: a row of containers labelled V and S; probing starts at $a_i = f(k_i)$]
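The two steps (hashing with $a_i = k_i \bmod n$, then linear probing with step $d$ once a container is full) can be sketched as follows. The container capacity of 2 and the function name `insert` are assumptions for illustration.

```python
def insert(table, key, capacity=2, d=1):
    """Store `key` and return the address of the container that took it.

    The home address is key mod n; if that container is full, the next
    candidate is (a + d) mod n, and so on for at most n probes.
    """
    n = len(table)
    address = key % n
    for _ in range(n):
        if len(table[address]) < capacity:
            table[address].append(key)
            return address
        address = (address + d) % n
    raise OverflowError("no container with free space was found")


table = [[] for _ in range(5)]   # n = 5 containers
home_a = insert(table, 3)        # 3 mod 5 = 3
home_b = insert(table, 8)        # 8 mod 5 = 3, container 3 is now full
probed = insert(table, 13)       # 13 mod 5 = 3 is full, probe (3 + 1) mod 5
```

Because each container holds several records, the first collision on an address costs nothing extra; probing only starts once the container overflows.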

(21)

Direct File Organization

• The hash table will fill up and an overflow might occur

• Complex reorganization (e.g. by moving data) becomes necessary

• To avoid this, extendible hashing could be used

• Allows incremental extension of the hash table without data movement

• Requires an additional step of indirection: the hashed value points into another vector of pointers

• The hash function used is $a_i = k_i \bmod 2^g$: keys are distinguished by their last g bits

• If an overflow happens, the container's contents are redistributed with the "refined" hash function over the old container and a newly created container

• To maintain correct addressing, g is incremented by 1 (length of pointer vector

(22)

Example

• Before extension (key is 43): $g_{max} = 2$, vector of pointers with $b = 2^{g_{max}} = 4$ entries, g = 2 for every entry

Data blocks: (24, 16, 92), (13, 49), (22, 18), (19, 15, 31, 27)

• After extension: vector of pointers with $b = 2^{g_{max}} = 8$ entries, g = 2, 2, 2, 3, 2, 2, 2, 3

Data blocks: (24, 16, 92), (13, 49), (22, 18), (19, 27, 43), (15, 31)
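The example can be replayed with a small sketch of extendible hashing. The container capacity of 4 is inferred from the fullest data block in the figure, and the order in which the first eleven keys are inserted is an assumption; the class names `Bucket` and `ExtendibleHash` are hypothetical.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth  # g: number of trailing key bits this container uses
        self.keys = []


class ExtendibleHash:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.global_depth = 0            # g_max
        self.directory = [Bucket(0)]     # vector of pointers, length 2**g_max

    def insert(self, key):
        bucket = self.directory[key % (1 << self.global_depth)]
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(key)
            return
        # Overflow: double the pointer vector only if this container
        # already uses all g_max bits.
        if bucket.depth == self.global_depth:
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.depth += 1
        sibling = Bucket(bucket.depth)
        new_bit = 1 << (bucket.depth - 1)
        # Pointers whose newly used bit is 1 now refer to the new container.
        for i, entry in enumerate(self.directory):
            if entry is bucket and i & new_bit:
                self.directory[i] = sibling
        # Redistribute only the overflowing container's keys.
        pending = bucket.keys + [key]
        bucket.keys = []
        for k in pending:
            self.insert(k)


table = ExtendibleHash(capacity=4)
for k in [24, 16, 92, 13, 49, 22, 18, 19, 15, 31, 27]:
    table.insert(k)
depth_before = table.global_depth
table.insert(43)
depth_after = table.global_depth
```

Only the overflowing container is touched when 43 arrives; the other three containers keep their contents and are simply referenced by two pointers each in the longer vector.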

(23)

Index-sequential file organization

• Some files are accessed both sequentially and directly (at different points in time).

• This leads to a mixture of sequential and direct (indexed) organization → index-sequential file organization.

• Although the blocks of the file are stored sequentially on the medium, additional data structures allow direct access.

• In its simplest form a single step of indexing is required where the index stores the largest key of a block.

(24)

Index-sequential file organization

Blocks may become empty or an overflow might occur for dynamic access patterns (insertion and deletion of blocks)

– Overflow blocks are created and additional indexes are stored

[Figure: index with the entries S4, S7, S12, S15, S18 above the records S1 to S18; overflow blocks S4.2 and S12.3]
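A minimal sketch of the single-level index from the figure, ignoring overflow blocks. It assumes the index entries S4, S7, S12, S15, S18 are the largest keys of five consecutive blocks holding the records 1 to 18; the names `index`, `blocks`, and `find_block` are hypothetical.

```python
from bisect import bisect_left

# Largest key of each block, as in the figure's index.
index = [4, 7, 12, 15, 18]
blocks = [
    [1, 2, 3, 4],
    [5, 6, 7],
    [8, 9, 10, 11, 12],
    [13, 14, 15],
    [16, 17, 18],
]


def find_block(key):
    """Return the number of the block holding `key`, or None if absent."""
    i = bisect_left(index, key)  # first block whose largest key is >= key
    if i == len(index) or key not in blocks[i]:
        return None
    return i


block_of_6 = find_block(6)
block_of_12 = find_block(12)
```

Direct access goes through the index, while sequential access simply walks `blocks` in order without consulting it.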

(25)

B*-Trees

• The additional indexes for overflow blocks may drastically increase access times for some records

• Better: Use dynamic data structures

• The B*-Tree is a variant of the B-Tree

– It holds the records in the leaves

– Internal nodes contain keys to accelerate accesses.

– Regarding the fill ratio and maintenance of its form, the B*-Tree corresponds to the B-Tree

[Figure: B*-Tree with root key 41 and the keys 19, 31, 71 on the level below]

(26)

Properties of B*-Trees

• The nodes correspond to the blocks on the disk

• Each node (block) is at least half full

• Let

$c_i$ be the number of keys in an internal node $i$

$m$ the minimal fill ratio of internal nodes (min. number of keys)

$c_i^*$ the number of records in a leaf node $i$

$m^*$ the minimal fill ratio of leaves (min. number of records)

• then it holds for all internal nodes $i$ (except root):

$m \le c_i \le 2m$

and for all leaves $i$:

$m^* \le c_i^* \le 2m^*$

(27)

Insertion in B*-Tree

Standard case: Space left in node

Overflow:

– Neighbor has enough space: Compensate with neighbor

– Neighbors are full: Split node (create a new block)

B*-Tree after insertion of record with key 16 (node split on leaf level, neighbor compensation on level above)

[Figure: resulting B*-Tree with root key 31]

(28)

Deletion in B*-Tree

• Standard case: Node remains at least half full

• Reconfiguration case (node's fill level falls below half):

– Neighbor more than half full: Compensate with neighbor

– Neighbors half full: Merge with neighbor (free block)

• B*-Tree after deletion of record with key 71 (node merge on leaf level)

[Figure: resulting B*-Tree with root key 31 and the keys 16, 19, 41 on the level below]

(29)

Depth of B*-Trees?

e.g. social insurance in China with approx. $10^9$ records

40 bytes per record (key and pointer) and a block size of 4096 bytes result in a branching factor of t = 4096/40 ≈ 100 (number of keys per node)

[Figure: tree levels reaching $10^2$, $10^4$, $10^6$, $10^8$, $10^{10}$ entries]
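The arithmetic behind this slide can be checked with a few lines. This is a simplified estimate that treats every level as completely filled with t entries per node; the function name `levels_needed` is hypothetical.

```python
def levels_needed(records, fanout):
    """Smallest number of levels such that fanout**levels >= records."""
    levels = 1
    reachable = fanout
    while reachable < records:
        reachable *= fanout
        levels += 1
    return levels


fanout = 4096 // 40                      # 102, rounded to about 100 on the slide
depth = levels_needed(10**9, 100)        # 100**4 = 10**8 is too small, 100**5 suffices
depth_exact = levels_needed(10**9, fanout)
```

Under this estimate, five block accesses are enough to reach any one of a billion records.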

(30)

File operations

Typical file operations

– Create

– Open

– Read

– Write

– Reset

– Lock

– Close

– Get attributes

– Set attributes (access rights)

(31)

File control block

Operations on files require management information

– Pointer to current position

– Current block address

– Pointer to buffers (in main memory)

– Fill ratio of buffers

– Information about locks

This information is stored in the file control block (FCB)

The FCB is a data structure that is created when the file is opened and deleted when the file is closed

A process control block holds pointers to the FCBs of the files that were opened by the process

(32)

Parallel file access

A file may be accessed by multiple processes in parallel

As the FCB contains both information specific to the file and information specific to the current user, some parts of the FCB are shared

[Figure: PCB 1 and PCB 2 each point to their own FCBs; the FCBs for the shared file refer to one common shared part (FCB')]
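One way to picture the split is as two records: a per-process FCB that holds the position and buffer, and a shared part that holds what belongs to the file itself. The field names and the `open`/`close` helpers below are assumptions for illustration, not an actual operating system interface.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SharedFCB:
    """File-specific part, shared by every process that has the file open."""
    name: str
    size: int
    lock_owner: Optional[int] = None
    open_count: int = 0


@dataclass(eq=False)
class FCB:
    """User-specific part: created on open, dropped on close."""
    shared: SharedFCB
    position: int = 0
    buffer: bytearray = field(default_factory=bytearray)


@dataclass
class PCB:
    pid: int
    open_files: List[FCB] = field(default_factory=list)

    def open(self, shared):
        shared.open_count += 1
        fcb = FCB(shared)
        self.open_files.append(fcb)
        return fcb

    def close(self, fcb):
        self.open_files.remove(fcb)
        fcb.shared.open_count -= 1


shared = SharedFCB(name="data", size=4096)
p1, p2 = PCB(pid=1), PCB(pid=2)
a = p1.open(shared)
b = p2.open(shared)
a.position = 100  # moving one process's pointer leaves the other untouched
```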

(33)

Buffering

• Some blocks are accessed frequently (e.g. index blocks). To speed up access times, disk blocks are buffered in main memory (disk cache)

• Some operating systems use all otherwise unused main memory as disk cache (e.g. Linux)

• Modern disk controllers also have internal, transparent caches

• Prior to each access to a disk block, the buffer is checked to see whether the block is already cached

• If the cache is full, the same eviction (swapping) strategies as known from virtual memory (LRU, FIFO, ...) are used

• If a modified disk block is stored in cache but is not yet persisted to disk, a system crash (or power blackout) results in data loss

• Blocks that are important for the consistency of the file system (directory blocks, index blocks) should therefore be written to disk directly

• Sequential accesses can be exploited for buffering: Read-Ahead and Free-
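The caching behaviour described on this slide can be sketched with an LRU cache that writes ordinary blocks back lazily and writes consistency-critical blocks through immediately. A dictionary stands in for the disk, and the class name `DiskCache` and the `critical` flag are assumptions made for illustration.

```python
from collections import OrderedDict


class DiskCache:
    def __init__(self, disk, capacity):
        self.disk = disk              # dict: block number -> data (stand-in for the drive)
        self.capacity = capacity
        self.blocks = OrderedDict()   # block number -> (data, dirty), oldest first

    def read(self, number):
        if number in self.blocks:             # hit: mark as most recently used
            self.blocks.move_to_end(number)
            return self.blocks[number][0]
        data = self.disk[number]              # miss: fetch from disk
        self._store(number, data, dirty=False)
        return data

    def write(self, number, data, critical=False):
        if critical:                          # e.g. directory or index block
            self.disk[number] = data          # write through immediately
        self._store(number, data, dirty=not critical)

    def _store(self, number, data, dirty):
        self.blocks[number] = (data, dirty)
        self.blocks.move_to_end(number)
        while len(self.blocks) > self.capacity:
            victim, (old, was_dirty) = self.blocks.popitem(last=False)
            if was_dirty:                     # write back on eviction
                self.disk[victim] = old


disk = {1: b"a", 2: b"b", 3: b"c"}
cache = DiskCache(disk, capacity=2)
cache.read(1)
cache.write(2, b"B")                  # stays in the cache only
on_disk_before = disk[2]              # a crash now would lose b"B"
cache.read(3)                         # evicts block 1 (clean)
cache.read(1)                         # evicts block 2 (dirty), which is written back
on_disk_after = disk[2]
cache.write(3, b"C", critical=True)   # persisted at once
```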
