• No results found

CS4500/5500 Opera-ng Systems Distributed and Parallel File Systems

N/A
N/A
Protected

Academic year: 2021

Share "CS4500/5500 Opera-ng Systems Distributed and Parallel File Systems"

Copied!
17
0
0

Loading.... (view fulltext now)

Full text

(1)

CS4500/5500  

Opera-ng  Systems

 

Distributed  and  Parallel  File  Systems  

 

Jia  Rao  

Department  of  Computer  Science  

(2)

Recap  of  Previous  Classes

File  systems  provide  an  abstrac0on  of  permanently  

stored  data  

o 

Namespace:  files  and  directories  

} 

Translate  paths  to  corresponding  loca0ons  on  disks    

o 

Space  management  and  op0miza0ons  

} 

Free  blocks  

} 

Caching  and  prefetching  

(3)

Distributed  and  Parallel  File  Systems  

Provide  similar  abstrac0ons  of  data  on  

mul$ple

 machines  

o

Namespace:  path  name  

à

 machine  ID:  disk  block  address  

o

Management:  placement  of  files  on  machines  

} 

Replica0on  

} 

Striping  

(4)

Distributed  v.s.  Parallel  File  Systems  

Design  objec0ves  

o 

Fault-­‐tolerance  v.s.  Concurrent  performance  

Data  distribu0on  

o 

En0re  file  on  a  single  node  v.s.  striping  over  mul0  nodes  

 Symmetry  

o 

Storage  co-­‐located  with  apps  v.s.  storage  separated  from  apps  

Fault-­‐tolerance  

o 

Designed  for  fault-­‐tolerance  v.s.  relying  on  enterprise  storage  

Workload  

o 

Loosely  coupled,  distributed  apps  v.s.  coordinated  HPC  apps  

(5)

Examples  

Distributed  File  Systems  

o

NFS,  GFS  (Google  File  System),  HDFS  (Hadoop  Distributed  

File  System),  GlusterFS  

Parallel  File  Systems  

(6)

Design  Issues  (1)  

Nameserver

 

o 

maps  file  names  to  objects  (files,  directories,  blocks)  

o 

Implementa0on  op0ons  

} 

Single  name  Server  

¨ 

Simple  implementa0on,  reliability  and  performance  issues  

} 

Several  Name  Servers  (on  different  hosts)  

(7)

Design  Issues  (2)  

Caching  

o

Caching  at  the  client:

 Main  memory  vs.  Disk  

o 

 

Cache  consistency

 

} 

Server  ini0ated  

¨ 

Server  informs  cache  managers  when  data  in  client  caches  is  stale  

¨ 

Client  cache  managers  invalidate  stale  data  or  retrieve  new  data  

¨ 

Disadvantage:  extensive  communica0on  

} 

 Client  ini0ated  

¨ 

Cache  managers  at  the  clients  validate  data  with  server  before  returning  it  to  

clients  

¨ 

Disadvantage:  extensive  communica0on  

} 

Prohibit  file  caching  when  concurrent-­‐wri0ng  

¨ 

Several  clients  open  a  file,  at  least  one  of  them  for  wri0ng  

¨ 

Server  informs  all  clients  to  purge  that  cached  file  

(8)

Design  Issues  (3)  

Update  (write)  policy  

o 

Once  a  client  writes  into  a  file  (and  the  local  cache),  when  should  the  

modified  cache  be  sent  to  the  server?  

} 

Write-­‐through:  all  writes  at  the  clients,  immediately  transferred  to  

the  servers  

¨ 

Advantage:  reliability  

¨ 

Disadvantage:  performance,  it  does  not  take  advantage  of  the  cache  

} 

Delayed  wri0ng:  delay  transfer  to  servers  

¨ 

Advantages:    

¨ 

Many  writes  take  place  (including  intermediate  results)  before  a  

transfer  

¨ 

Some  data  may  be  lost  

¨ 

Disadvantage:  reliability  

} 

Delayed  wri0ng  un0l  file  is  closed  at  client  

¨ 

For  short  open  intervals,  same  as  delayed  wri0ng    

(9)

Design  Issues  (4)  

Availability  

o 

What  is  the  level  of  availability  of  files  in  a  distributed  file  system?  

o 

Use  replica0on  to  increase  availability,  i.e.  many  copies  (replicas)  of  

files  are  maintained  at  different  sites/servers  

o 

Replica0on  issues:  

} 

How  to  keep  replicas  consistent  

(10)

Design  Issues  (5)  

Scalability  

o 

Deal  with  a  growing  system?  

o 

Issues  

} 

Node  join  and  leave  (fail)  

} 

Cache  consistency  

} 

Nameserver  

o 

Solu0ons  

} 

Replica0on  

} 

Design  cache  consistency  protocol  for  scalability  

} 

Mul0ple  name  (meta)  servers  

(11)

Example  -­‐  GlusterFS  (DFS)  

Client-­‐1

Client-­‐2

Client-­‐N

Gluster  Virtual  Storage  Pool  

(built  on  donated  par00ons  on  each  machine)

Gluster  Global  Namespace  (Gluster  Na-ve)  

IP  network

(12)

Example  –  GlusterFS  (2)  

Three  ways  to  place  files  

o

Distribute:  place  en0re  files  on  different  servers  

} 

Pros:  good  scalability,  efficient  disk  space  usage  

} 

Cons:  poor  reliability  

o

Replicate:  place  iden0cal  copies  of  files  on  different  servers  

} 

Pros:  reliability  

} 

Cons:  wasted  disk  space,  moderate  scalability  

o

Stripe:  place  only  part  of  a  file  on  one  server  

} 

Pros:  good  performance  for  concurrent  and  random  access  

(13)
(14)

Example  –  PVFS  (PFS)  

Significant  improvement  in  throughput  

What  could  be  the  issues?  

1.

Sever  coordina0on  affects  efficiency  

(15)

GFS  and  PFS  in  the  Cloud  (1)  

Both  approaches  provide  cheap,  reliable  and  

(16)

GFS  and  PFS  in  the  Cloud  (2)  

(17)

Some  Real  Results…  

Host  a  8-­‐VM  Hadoop  cluster  on  8  DELL  machines  

Performed  micro  and  real  I/O  intensive  workloads  

Two  storage  solu0ons:  PVFS  and  local  ext3  

PVFS  

Local  ext3  

Gridmix  websort  20GB  data   2391  second   4693  second  

    16k   32k   64k   256k   1M   Sequen0al   58.89   60.15   60.47   104.80   130.47   random   12.34   20.84   33.51   50.43   108.71       16k   32k   64k   256k   1M   Sequen0al   120.11   120.56   120.39   120.39   120.57  

PVFS  

Local  ext3  

Network  bandwidth  

boCleneck  

References

Related documents

This study compared the influence of neighborhood disorganization and influence of friendship or peer groups on delinquent behaviors among students enrolled in Juvenile Delinquency

To start programming in Python, we have to learn how to type the source code and save it to a le, using a text editor program.. We also need to know how to open a Command Terminal

The  PuertoTerm  database  focuses  on  the  conceptual  structure,  interrelations, 

Since Grid systems are considered as a natural development step of parallel and distributed systems, existing middleware and communication libraries used in parallel and

Our analysis focused on identifying brain regions involved in decision-making (conservative vs. liberal strategies) when participants were both exposed to a financial payoff

FIGURE 2: Diagram of prediction of amyloid deposition and further prediction of conversion of mild cognitive impairment (MCI) individuals to Alzheimer disease (AD) in the sample

The Post Earthquake Investigation Committee of the Technical Council on Lifeline Earthquake Engineering (TCLEE), a technical council of the American Society of Civil Engineers

L. arenarum ocuparon la mayor cantidad de sitios. El ambiente semitemporal presentó la mayor diversidad y la menor variación en la abundancia entre especies, mientras que el