• No results found

CS 5523 Operating Systems: Intro to Distributed Systems

N/A
N/A
Protected

Academic year: 2021

Share "CS 5523 Operating Systems: Intro to Distributed Systems"

Copied!
56
0
0

Loading.... (view fulltext now)

Full text

(1)

CS 5523 Operating Systems:

Intro to Distributed Systems

Instructor: Dr. Tongping Liu

Thank Dr. Dakai Zhu, Dr. Palden Lama for providing their slides.

(2)

2

Outline

Different Distributed Systems

Ø Distributed computing systems Ø Distributed information systems Ø Distributed pervasive systems

OS in distributed systems

Ø Distributed OS vs. Network OS vs. Middleware

Design objectives of distributed systems

Ø Transparency, openness and scalability

Architecture of distributed systems

Ø Software vs. system architectures

(3)

Computer System Revolution

Computers"

"

"

large/expensive

à

small/cheap"

Networks: "

"LAN à WAN, " " "bps à Kbps à Gbps"

Now, it is easy to put together many computers. Why? "

Ø Solve problems"

Ø Share resources"

Ø Increase collaboration"

(4)

4 intranet ISP desktop computer: backbone satellite link server: network link: ☎ ☎ ☎

What are

Distributed

Systems?

CS5523: Operating System @ UTSA

A collection of

networked

independent computers

that appears to its users as a

(5)

5

Different Types of Distributed Systems

Distributed Computing Systems: HP computing task

Ø Cluster computing: similar components

Ø Grid computing / Cloud computing: different components

Distributed Information Systems

Ø Web servers

Ø Distributed database applications

Distributed Pervasive Systems: instable

Ø Smart home systems

Ø Electronic health systems: monitor

Ø Sensor networks: surveillance systems

(6)

6

Cluster Computing Systems

High-performance computing

Ø a group of high-end systems connected through a LAN Ø Homogeneous: same OS, near-identical hardware

Ø Single managing node

(7)

7

Grid Computing Systems

Lots of nodes from everywhere

share resources and collaborate"

Ø Heterogeneous!

Ø Dispersed across several organizations"

Ø Can easily span a wide-area network"

To allow for collaborations, grids generally use

virtual organizations. "

Ø Same organization: a grouping of users (or better: their IDs) have the same access rights"

Ø The key questions are "

ü Authorize users from different administrative domains"

ü Provide authorized users with the access to specific resources "

(8)

8

Grid Computing Systems (cont.)

❚  Application: "

Ø Use the grid computing environment"

❚  Collective layer: "

Ø Handles access to multiple resources (resource discovery, allocation and scheduling, data replication)"

❚  Connectivity layer: "

Ø Communication protocols "

(transfer data across resources, access a remote resource, security)""

❚  Resource layer: "

Ø Manage a single resource "

(create a process, access control)"

❚  Fabric layer: "

Ø Interface to local resources (query, locking)"

(9)

Cloud Computing Systems

❚  “Cloud computing has become another buzzword after Web 2.0.”"

❚  “We won’t compute on local computers, but on centralized facilities operated by third-party compute and storage utilities”""

❚  “There are dozens of different definitions for cloud computing and there seems to be no consensus on what a cloud is.” Here is one definition:! !A large-scale distributed computing paradigm that is driven by

economies of scale, in which a pool of abstracted, virtualized,

dynamically-scalable, managed computing power, storage,

platforms, and services are delivered on demand to external customers

over the Internet.”"

❚  “Cloud computing is not a completely new concept; it has intricate

connection to the relatively new but thirteen-year established grid computing paradigm, and other relevant technologies such as utility computing, cluster computing, and distributed systems in general.” "

CS5523: Operating System @ UTSA 9

I. Foster, Cloud Computing and Grid Computing 360-Degree Compared, Grid Computing Environments Workshop, 2008. GCE '08

(10)

DIS: Distributed Information Systems

Organizations have legacy networked applications,

but it is hard to make them interoperate "

Middleware can help "

Integration can take place at several levels"

Ø Client-servers wrap a number of requests into one and have it executed as a Distributed Transaction (all or none of requests would be executed)"

Ø Applications can be detached from their databases and may need to directly communicate with each other:

(11)

DIS: Transaction Processing Systems

Database applications, transactions properties (ACID) "

Atomic: A transaction happens indivisibly; "

Ø All operations either succeed, or all of them fail; "

Consistent: Don’t violate system invariants "

Ø Internal transfer: total money should be the same"

Ø Intermediate states may violate if not visible outside"

Isolated (serializable): Concurrent transactions do

not interfere with each other"

Durable: Once a transaction commits, the changes

are permanent "

(12)

DIS: Distributed Database Transactions

n 

Transaction

Processing

Monitor: coordinate

the execution of a

transaction

(sub-transactions) when

data is distributed

across servers

middleware

(13)

13

DIS: Enterprise Information Systems

Inter-application Communication"

Ø RPC (Remote Procedure Calls) or" RMI (Remote Method Invocations)" ü both applications must be up and running"

ü know exactly how to refer to each other"

Ø Message-Oriented Middleware (MOM)"

ü send data to a logical contact"

ü publish/subscribe"

CS5523: Operating System @ UTSA

(14)

14

DPS: Distributed Pervasive Systems

DCS and DIS: stable distributed systems (fixed

nodes good connections)"

Unstable with mobile and embedded devices "

Distributed Pervasive Systems:"

Ø Computing anywhere and anytime!

Ø Contextual change: environment changes should be

immediately react."

Ø Ad hoc composition: Each node may be used in a very

different ways by different users. Requires ease-of-configuration."

Ø Sharing is the default: Nodes come and go, providing

sharable services and information. Calls again for simplicity."

CS5523: Operating System @ UTSA

(15)

15

DPS: Electronic Health Care Systems

Devices are physically close to a person

Ø Where and how should monitored data be stored? Ø How can we prevent loss of crucial data?

Ø How can security be enforced?

Ø How can physicians provide online feedback?

(16)

16

DPS: Sensor Networks

CS5523: Operating System @ UTSA

The nodes to which sensors are attached are:

n  Many (10s-1000s)

n  Simple (small memory/compute/communication capacity)

(17)

Distributed Systems: Definition

A distributed system (DS) is a piece of software that

ensures that "

a collection of independent computers, communicate through network by passing messages!

And appears to its users as a single coherent system" "

"

But HOW can we"

Ø hide the differences between independent computers &"

Ø provide a single system view?" "

(18)

18

OS Structures in Distributed Systems

Distributed Operating Systems: OS essentially

tries to maintain a single, global view of the

resources it manages (Tightly-coupled OS).

Network Operating Systems: Collection of

independent OSes augmented by network services

(Loosely-coupled OS)

Middleware: Provide a level of transparency

between applications and local OSes

(19)

19

Distributed Operating Systems

Full transparency: users feel a big system and are

not aware of multiple different machines

Access to remote services similar to local resources

(20)

20

Distributed Operating Systems (Cont.)

Each node has its own kernel for managing local

resources (memory, local CPU, disk, …)

Common software layer implements OS and supports

parallel and concurrent execution of various tasks.

Possible complete software implementation of

shared memory

(distributed shared memory)

Additional facilities:

❚  task assignments, handle hardware failures, transparent

storage, inter-process communication, data/computation/ process migration.

(21)

21

Network Operating Systems (NOS)

NO single view of the distributed system

Users are aware of the multiplicity of the machines

Applications use

NOS services

to access resources

(22)

22

Network Operating Systems (Cont.)

❚  NOS provide facilities to allow users to make use of services

in other machines

Ø Remote login (telnet, rlogin)

Ø File transfer (ftp)

❚  The only communication is message passing

❚  Users need to explicitly log on into remote machines, or

copy files from one machine to another.

❚  Need multiple passwords, multiple access permissions.

❚  In contrast, adding or removing a machine is relatively simple.

(23)

23

Middleware-Based Systems

Middleware: a higher level of abstraction on top of

network operating systems for efficient transparency

(24)

24

Middleware-Based Systems (cont.)

Each local system is a part of underlying NOS

Target of

middleware

:

Ø

Resolve the integration problems of various

networked applications

Examples: Remote Procedure Calls (RPC),

Remote Method Invocations (RMI),

Distributed File Systems, Distributed Object

Systems

(25)

25

Design Objectives of Distributed Systems

Goal of distributed systems

Ø Make resources (hardware/software) available

Ø Make it easy for users to access remote resources

Ø Share resources in a controlled and efficient way

Distribution transparency: hiding distribution

Openness

Scalability

(26)

26

Distribution Transparency

Access Hides differences in data representation and

invocation mechanisms

Location Hides where a resource resides

Migration Hides that a resource may move

Relocation Hides that a resource may be moved while in

use

Replication Hides that a resource is replicated

Concurrency Hides that other users may access the same

resource

Failure Hides failure and possible recovery of resources

(27)

27

Degree of Transparency

❚  Full transparency will can be costly

❚  Expose distribution of the system

Ø Mobile phone is moving around. We may need location and text

awareness

❚  Completely hiding failures of networks and nodes is

(theoretically and practically) impossible

Ø You cannot distinguish a slow computer from a failing one

Ø You can never be sure that a server actually performed an operation

before a crash

❚  Users may be located in different continents à distribution is

apparent and not something you want to hide

Observation: Aiming at full distribution

transparency may be too much

(28)

28

Openness of Distributed Systems

Offer services according to

standard rules

that

describe the syntax and semantics of those services"

How to achieve openness"

Ø Well-defined interfaces (often described using IDL)"

ü Easy to define syntax. But semantics are hard thus it is defined in a natural language"

Ø portability !!

ü The same implementation should work on different machines"

Ø Easily interoperate !

ü Two different implementations should work together"

Distributed system should be independent from

heterogeneity of the underlying environment

Hardware, Software Platforms, and Languages "

(29)

29

Scalability in Distributed Systems

Three aspects of scalability

Ø size: number of users and/or processes

Ø geographical: Maximum distance between nodes

Ø administrative : Number of administrative domains

Most systems account only, to a certain extent, for

size scalability: powerful servers (supercomputer)

Challenge nowadays: geographical and

administrative scalability

(30)

Problems with Size Scalability

n

What happens when more users/resources added?

n

Limitations of centralized systems

l Service (e.g., single server) overloaded servers

l Data (e.g., single phone book) saturated communication links

l Algorithm (e.g., routing based on global info) too much traffic

n

Use distributed service, database, and algorithm

l No machine has complete info

l Make decision based on local info

l Failure of one node does affect others l No global clock

(31)

Problems with Geographical Scalability

Suppose we have an interactive application working

on a LAN, can we use it over a WAN?"

Delay "

Ø Blocking read/write might be OK on LAN but not on WAN"

Reliability"

Ø Longer the distance higher the chance of loosing messages"

Bandwidth "

(32)

Problems with Administrative Scalability

n

In a single domain:

l We can try to optimize resource usage because each entity

belongs to the same domain and can be trusted

n

In case of multiple and independent administrative

domains:

l We do not own all resources and cannot trust others l Several problems

! Conflicting policies (who uses what and pays how much)

! Management

(33)

Techniques for Scalability

Hiding communication latency: (Geographical)"

Ø Use asynchronous communication: "

ü +: separate handler for incoming response and do something while waiting."

ü - what if there is nothing else to do "

Distribution: splitting it to small parts"

Ø Domain naming systems (DNS)"

Ø Decentralized data, information systems (WWW)"

Ø Decentralized algorithm (Distance Vector)"

Replicate: "

Ø Increase availability "

(34)

Techniques for Scalability (cont’d)

❚  Use Replication/caching that makes multiple copies of the same services or data available at different machines"

Ø Mirrored Web sites"

Ø Replicated file servers and databases"

Ø Web caches (in browsers and proxies)"

Ø File caching (at server and client)"

Ø + increase availability"

Ø + improve load balance and performance"

Ø + hide communication latency"

"

Ø  - Inconsistencies when one copy is modified "

Ø  - Global synchronization is need for keeping copies consistent but it

precludes large-scale solutions"

(35)

Techniques for Scalability (cont’d)

All the techniques discussed so far deal with

performance problems due to size and geographical

scalability "

How about administrative scalability?"

Ø The most challenging one (why?)"

(36)

Developing Distributed Systems: Pitfalls

Mistakes are often due to false assumptions:"

Ø The same global time!

Ø Perfect network/communication"

ü Latency is zero""

ü Bandwidth is infinite"

ü The network is reliable"

ü The network is secure"

ü The network is homogeneous"

Ø The topology does not change"

(37)

37

Outline

Different Distributed Systems

Ø Distributed computing systems Ø Distributed information systems Ø Distributed pervasive systems

OS in distributed systems

Ø Distributed OS vs. Network OS vs. Middleware

Design objectives of distributed systems

Ø Transparency, openness and scalability

Architecture of distributed systems

Ø Software vs. system architectures

(38)

38

Software Architectures for DS

Logical organization

of different

components

;

Ø How to divide to different components

Ø How to connect and communicate with each other Ø How to figure different elements into a system

division of responsibilities

of components;

distribute them over multiple machines, and allow

them to communicate through

connectors

Ø Component: a modular unit with well-defined required and

provided interfaces,"

Ø Connector: a mechanism that mediates communication, coordination, and cooperation (e.g., RPC, msg passing)

(39)

Software Architecture Style

Layered style "

Ø A component at higher layers can call components at a lower layer"

Ø Widely adopted by the networking community"

Object-based "

Ø Components are connected through a procedure call"

Ø Used for client-server systems"

(40)

Software Architecture Style

Event-based(Publish/subscribe) "

Ø Communicate through propagation of events"

Ø Loosely coupled components"

ü decoupled in space or referentially decoupled"

Data-centered: "

Ø Communicate through a common repository

(e.g., shared distributed file system) "

Ø Can combine with event-based, yielding shared data spaces

"

ü processes are now decoupled in space and

time"

processes do not refer to each other

processes do not need to be active at the same time

(41)

41

System Architectures for DS

Consider how and where to place software

components and realize their

interactions


!

About physical realization, placement of software

components &

interactions

Ø Centralized: client-server

Ø Decentralized: P2P (Structured vs. unstructured) Ø Hybrid: Combination of centralized and P2P

(42)

42

Basic Client-Server Model

Clients & servers may across different machines

Clients follow request/reply model to use services

CS5523: Operating System @ UTSA

+: efficient

-: reliability. Can’t detect whether a request is lost or a reply fail. Connection-oriented protocol: TCP/IP

(43)

Application Layering (logical)

User-interface layer"

Processing layer: "

Ø functions of an application, i.e. without specific data"

Data layer: "

Ø data that a client wants to manipulate"

How to draw a clear line between client end server?!

Observation: layering is found in many distributed information systems, using

(44)

44

Traditional Two-Tiered Configurations

CS5523: Operating System @ UTSA

How to place the three layers on client and server?!

(45)

45

Multitiered Architectures

CS5523: Operating System @ UTSA

The server part could be distributed over multiple

machines,"

Three-tiered

: each layer on a separate machine"

Vertical

(46)

46

Servers and States: Stateful Servers

Vertical Distribution:

Ø Functions are logically and physically split across multiple

machines.

Ø Processes are not equal and Interactions are asymmetric: One acts as client while the other acts as server"

Keeps track of the status of its clients

Ø Record that a file has been opened, so that pre-fetching

can be done

Ø Knows which data a client has cached, and allows clients

to keep local copies of shared data

(47)

Decentralized Architectures: P2P Systems

❚  P2P architectures are horizontal distribution:"

Ø split up clients and servers into logically equivalent parts and let each part operate on its own share of data"

❚  Processes are equal and Interactions are symmetric"

Ø Each acts as both client and server"

❚  Tremendous growth in the last couple of years"

(48)

48

Decentralized Architectures: An Example

Application! Application! Application! Peer 1! Peer 2! Peer 3! Peers 5 .... N! Sharable! objects! Application! Peer 4! Star War Roman Holiday The Beatles

(49)

49

Decentralized Architectures: P2P Systems

Key question:"

Ø How to organize processes in an overlay network, where links are usually TCP channels"

Three approaches to organize nodes into overlay

networks through which data is routed"

Ø Structured P2P: nodes are organized following a specific

distributed data structure and deterministic algorithms"

Ø Unstructured P2P: randomly selected neighbors"

Ø Hybrid P2P: some nodes are appointed special functions

in a well-organized fashion"

(50)

Structured P2P Systems

Distributed Hash Table (DHT) is the most used one"

Ø Assume we have a large ID space Ω (e.g., 128-bit)"

Ø Assign random keys to data items (from Ω)"

Ø Assign random number to nodes (from Ω)"

Ø The key of DHT: implement an efficient and deterministic scheme that maps the key of a data item to node ID"

Ø When looking up a data item, the system should route the request to the associated node and return the network

address of that node"

(51)

51

A DHT Example: Chord

Data item with key k is

mapped to a node with the

smallest ID >= k."

This node is called as the

successor of key k and

denoted by succ(k)!

CS5523: Operating System @ UTSA

LOOKUP(key=8) ?

(52)

52

Unstructured P2P Architectures

Basic principle: Each node maintains a random list

of neighbors:

Ø Each peer maintain a partial view of the network,

consisting of c other nodes

Ø Each node P periodically selects a node Q from its partial

view: P and Q exchange information and exchange members from their respective partial views

Ø Robustness of the network can be maintained depending

on the random exchange

(53)

53

Superpeers

Peers maintain an index (for search)

Peers monitor the state of the network

Peers setup connections

(54)

54

Hybrid Structures: Edge Servers

Edge-server architectures, which are often used for

Content Delivery Networks

(55)

55

Collaborative Hybrid Structures: BitTorrent

Combining a P2P with a client-server architecture

Basic idea: a node identifies where to download a

file from and joins a swarm of downloaders; who get

file chunks in parallel from the source, and distribute

these chunks among each other

(56)

56

Summary

Distributed Systems

Ø Distributed computing systems Ø Distributed information systems Ø Distributed pervasive systems

OS in distributed systems

Ø Distributed OS vs. Network OS vs. Middleware

Design objectives of distributed systems

Ø Transparency, openness and scalability

Architecture of distributed systems

Ø Software vs. system architectures

References

Related documents