CS 5523 Operating Systems:
Intro to Distributed Systems
Instructor: Dr. Tongping Liu
Thanks to Dr. Dakai Zhu and Dr. Palden Lama for providing their slides.
Outline
❚
Different Distributed Systems
Ø Distributed computing systems
Ø Distributed information systems
Ø Distributed pervasive systems
❚
OS in distributed systems
Ø Distributed OS vs. Network OS vs. Middleware
❚
Design objectives of distributed systems
Ø Transparency, openness and scalability
❚
Architecture of distributed systems
Ø Software vs. system architectures
Computer System Revolution
❚ Computers: large/expensive → small/cheap
❚ Networks: LAN → WAN, bps → Kbps → Gbps
❚ Now, it is easy to put together many computers. Why?
Ø Solve problems
Ø Share resources
Ø Increase collaboration
[Figure: a portion of the Internet with intranets, ISPs, a backbone, a satellite link, desktop computers, servers, and network links]
What are Distributed Systems?
A collection of networked independent computers that appears to its users as a single coherent system
CS5523: Operating System @ UTSA
Different Types of Distributed Systems
❚
Distributed Computing Systems: high-performance computing tasks
Ø Cluster computing: similar components
Ø Grid computing / Cloud computing: different components
❚
Distributed Information Systems
Ø Web servers
Ø Distributed database applications
❚
Distributed Pervasive Systems: unstable
Ø Smart home systems
Ø Electronic health systems: monitor
Ø Sensor networks: surveillance systems
Cluster Computing Systems
❚
High-performance computing
Ø A group of high-end systems connected through a LAN
Ø Homogeneous: same OS, near-identical hardware
Ø Single managing node
Grid Computing Systems
❚ Lots of nodes from everywhere share resources and collaborate
Ø Heterogeneous
Ø Dispersed across several organizations
Ø Can easily span a wide-area network
❚ To allow for collaborations, grids generally use virtual organizations.
Ø Same organization: a grouping of users (or better: their IDs) who have the same access rights
Ø The key questions are
ü How to authorize users from different administrative domains
ü How to provide authorized users with access to specific resources
Grid Computing Systems (cont.)
❚ Application:
Ø Uses the grid computing environment
❚ Collective layer:
Ø Handles access to multiple resources (resource discovery, allocation and scheduling, data replication)
❚ Connectivity layer:
Ø Communication protocols (transfer data across resources, access a remote resource, security)
❚ Resource layer:
Ø Manages a single resource (create a process, access control)
❚ Fabric layer:
Ø Interface to local resources (query, locking)
Cloud Computing Systems
❚ “Cloud computing has become another buzzword after Web 2.0.”
❚ “We won’t compute on local computers, but on centralized facilities operated by third-party compute and storage utilities”
❚ “There are dozens of different definitions for cloud computing and there seems to be no consensus on what a cloud is.” Here is one definition:
“A large-scale distributed computing paradigm that is driven by economies of scale, in which a pool of abstracted, virtualized, dynamically-scalable, managed computing power, storage, platforms, and services are delivered on demand to external customers over the Internet.”
❚ “Cloud computing is not a completely new concept; it has intricate connection to the relatively new but thirteen-year established grid computing paradigm, and other relevant technologies such as utility computing, cluster computing, and distributed systems in general.”
I. Foster, Cloud Computing and Grid Computing 360-Degree Compared, Grid Computing Environments Workshop, 2008. GCE '08
DIS: Distributed Information Systems
❚ Organizations have legacy networked applications, but it is hard to make them interoperate
❚ Middleware can help
❚ Integration can take place at several levels
Ø Clients wrap a number of requests into one and have it executed as a distributed transaction (either all or none of the requests are executed)
Ø Applications can be detached from their databases and may need to communicate directly with each other
DIS: Transaction Processing Systems
Database applications; transaction properties (ACID)
❚ Atomic: a transaction happens indivisibly
Ø Either all operations succeed, or all of them fail
❚ Consistent: a transaction does not violate system invariants
Ø Internal transfer: the total amount of money should stay the same
Ø Intermediate states may violate invariants as long as they are not visible outside the transaction
❚ Isolated (serializable): concurrent transactions do not interfere with each other
❚ Durable: once a transaction commits, the changes are permanent
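A minimal sketch of atomicity and consistency for the internal-transfer example, using Python's built-in sqlite3 module on a single in-memory database. The accounts table, the names alice and bob, and the transfer helper are assumptions made for illustration; a distributed database would need a coordinator on top of this.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates happen or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Invariant violated inside the transaction: abort everything.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False

def total(conn):
    """System invariant: the sum of all balances."""
    return conn.execute("SELECT SUM(balance) FROM accounts").fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds and commits
failed = transfer(conn, "alice", "bob", 500)  # aborts, first update is undone
```

The negative intermediate balance in the failed transfer is never visible outside the transaction, and the total stays the same in both cases.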
DIS: Distributed Database Transactions
❚ Transaction Processing Monitor (middleware): coordinates the execution of a transaction (sub-transactions) when data is distributed across servers
DIS: Enterprise Information Systems
❚ Inter-application communication
Ø RPC (Remote Procedure Calls) or RMI (Remote Method Invocations)
ü Both applications must be up and running
ü Each must know exactly how to refer to the other
Ø Message-Oriented Middleware (MOM)
ü Send data to a logical contact point
ü Publish/subscribe
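A minimal in-process sketch of the publish/subscribe idea behind MOM. The Broker class, its method names, and the "orders" topic are hypothetical; real message-oriented middleware adds persistence, networking, and delivery guarantees that are not modeled here.

```python
from collections import defaultdict, deque

class Broker:
    """Toy message broker: publishers address a topic, not a receiver."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of mailboxes

    def subscribe(self, topic):
        """Register interest in a topic and return a private mailbox."""
        mailbox = deque()
        self.subscribers[topic].append(mailbox)
        return mailbox

    def publish(self, topic, message):
        """Queue the message for every subscriber; return how many got it."""
        for mailbox in self.subscribers[topic]:
            mailbox.append(message)
        return len(self.subscribers[topic])

broker = Broker()
billing = broker.subscribe("orders")
shipping = broker.subscribe("orders")

delivered = broker.publish("orders", {"id": 1})   # publisher knows only the topic
nobody = broker.publish("refunds", {"id": 2})     # no subscriber yet

first_for_billing = billing.popleft()             # read later, at the receiver's pace
```

Unlike RPC/RMI, the publisher never refers to billing or shipping directly, and the receivers read their mailboxes whenever they are ready.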
DPS: Distributed Pervasive Systems
❚ DCS and DIS: stable distributed systems (fixed nodes, good connections)
❚ Unstable with mobile and embedded devices
❚ Distributed Pervasive Systems:
Ø Computing anywhere and anytime
Ø Contextual change: the system should react immediately to changes in its environment
Ø Ad hoc composition: each node may be used in very different ways by different users, which requires ease of configuration
Ø Sharing is the default: nodes come and go, providing sharable services and information, which again calls for simplicity
DPS: Electronic Health Care Systems
❚
Devices are physically close to a person
Ø Where and how should monitored data be stored?
Ø How can we prevent loss of crucial data?
Ø How can security be enforced?
Ø How can physicians provide online feedback?
DPS: Sensor Networks
The nodes to which sensors are attached are:
Ø Many (10s-1000s)
Ø Simple (small memory/compute/communication capacity)
Distributed Systems: Definition
❚ A distributed system (DS) is a piece of software that ensures that
Ø a collection of independent computers, which communicate over a network by passing messages,
Ø appears to its users as a single coherent system
❚ But HOW can we
Ø hide the differences between independent computers and
Ø provide a single system view?
OS Structures in Distributed Systems
❚
Distributed Operating Systems: OS essentially
tries to maintain a single, global view of the
resources it manages (Tightly-coupled OS).
❚
Network Operating Systems: Collection of
independent OSes augmented by network services
(Loosely-coupled OS)
❚
Middleware: Provide a level of transparency
between applications and local OSes
Distributed Operating Systems
❚ Full transparency: users perceive one big system and are not aware of the multiple machines behind it
❚ Access to remote resources is similar to access to local resources
Distributed Operating Systems (Cont.)
❚
Each node has its own kernel for managing local
resources (memory, local CPU, disk, …)
❚
Common software layer implements OS and supports
parallel and concurrent execution of various tasks.
❚
Possible complete software implementation of
shared memory
(distributed shared memory)
❚ Additional facilities:
Ø Task assignment, handling of hardware failures, transparent storage, inter-process communication, data/computation/process migration
Network Operating Systems (NOS)
❚
NO single view of the distributed system
❚
Users are aware of the multiplicity of the machines
❚
Applications use
NOS services
to access resources
Network Operating Systems (Cont.)
❚ NOS provide facilities to allow users to make use of services
in other machines
Ø Remote login (telnet, rlogin)
Ø File transfer (ftp)
❚ The only communication is message passing
❚ Users need to explicitly log in to remote machines, or
copy files from one machine to another.
❚ Need multiple passwords, multiple access permissions.
❚ In contrast, adding or removing a machine is relatively simple.
Middleware-Based Systems
❚
Middleware: a higher level of abstraction on top of
network operating systems for efficient transparency
Middleware-Based Systems (cont.)
❚
Each local system is a part of underlying NOS
❚ Target of middleware:
Ø Resolve the integration problems of various networked applications
❚
Examples: Remote Procedure Calls (RPC),
Remote Method Invocations (RMI),
Distributed File Systems, Distributed Object
Systems
Design Objectives of Distributed Systems
❚
Goal of distributed systems
Ø Make resources (hardware/software) available
Ø Make it easy for users to access remote resources
Ø Share resources in a controlled and efficient way
❚
Distribution transparency: hiding distribution
❚
Openness
❚
Scalability
Distribution Transparency
Transparency   Description
Access         Hides differences in data representation and invocation mechanisms
Location       Hides where a resource resides
Migration      Hides that a resource may move
Relocation     Hides that a resource may be moved while in use
Replication    Hides that a resource is replicated
Concurrency    Hides that other users may access the same resource
Failure        Hides failure and possible recovery of resources
Degree of Transparency
❚ Full transparency can be costly
❚ Sometimes it helps to expose the distribution of the system
Ø A mobile phone moves around, so we may need location and text awareness
❚ Completely hiding failures of networks and nodes is (theoretically and practically) impossible
Ø You cannot distinguish a slow computer from a failing one
Ø You can never be sure that a server actually performed an operation before a crash
❚ Users may be located on different continents → distribution is apparent and not something you want to hide
Observation: Aiming at full distribution transparency may be too much
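The point that a slow computer cannot be told apart from a failing one can be made concrete with a small simulation. This is only a sketch: client_observation and its parameters are hypothetical, and reply_delay stands in for how long a server would take to answer (None meaning it has crashed and never answers).

```python
from typing import Optional

def client_observation(reply_delay: Optional[float], timeout: float) -> str:
    """Return what a client sees after waiting `timeout` seconds.

    reply_delay: seconds until the server would reply, or None if it crashed.
    The client cannot inspect reply_delay; it only sees the returned string.
    """
    if reply_delay is not None and reply_delay <= timeout:
        return "reply"
    return "timeout"

fast = client_observation(reply_delay=0.2, timeout=1.0)
slow = client_observation(reply_delay=5.0, timeout=1.0)     # alive, but slow
crashed = client_observation(reply_delay=None, timeout=1.0)  # will never reply
```

From the client's side, slow and crashed produce exactly the same observation, which is why failure transparency can never be complete.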
Openness of Distributed Systems
❚ Offer services according to standard rules that describe the syntax and semantics of those services
❚ How to achieve openness
Ø Well-defined interfaces (often described using an IDL)
ü Syntax is easy to define, but semantics are hard and are therefore usually described in natural language
Ø Portability
ü The same implementation should work on different machines
Ø Interoperability
ü Two different implementations should work together
❚ A distributed system should be independent of the heterogeneity of the underlying environment
Ø Hardware, software platforms, and languages
Scalability in Distributed Systems
❚
Three aspects of scalability
Ø size: number of users and/or processes
Ø geographical: Maximum distance between nodes
Ø administrative : Number of administrative domains
❚
Most systems account only, to a certain extent, for
size scalability: powerful servers (supercomputer)
❚
Challenge nowadays: geographical and
administrative scalability
Problems with Size Scalability
❚ What happens when more users/resources are added?
❚ Limitations of centralized systems
Ø Service (e.g., single server): overloaded servers
Ø Data (e.g., single phone book): saturated communication links
Ø Algorithm (e.g., routing based on global info): too much traffic
❚ Use distributed services, databases, and algorithms
Ø No machine has complete info
Ø Decisions are made based on local info
Ø Failure of one node does not affect others
Ø No global clock
Problems with Geographical Scalability
❚ Suppose we have an interactive application working on a LAN; can we use it over a WAN?
❚ Delay
Ø Blocking read/write might be OK on a LAN but not on a WAN
❚ Reliability
Ø The longer the distance, the higher the chance of losing messages
❚ Bandwidth
Problems with Administrative Scalability
❚ In a single domain:
Ø We can try to optimize resource usage because each entity belongs to the same domain and can be trusted
❚ In case of multiple, independent administrative domains:
Ø We do not own all resources and cannot trust others
Ø Several problems
ü Conflicting policies (who uses what and pays how much)
ü Management
Techniques for Scalability
❚ Hiding communication latency (geographical)
Ø Use asynchronous communication
ü +: a separate handler processes the incoming response, so the caller can do something else while waiting
ü -: what if there is nothing else to do?
❚ Distribution: splitting a component into smaller parts
Ø Domain Name System (DNS)
Ø Decentralized data and information systems (WWW)
Ø Decentralized algorithms (distance vector)
❚ Replication
Ø Increases availability
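A rough sketch of hiding communication latency with asynchronous communication, using Python's asyncio. The remote_call coroutine is a stand-in: asyncio.sleep simulates a WAN round trip, and the request names and 0.1 s latencies are made up for illustration.

```python
import asyncio
import time

async def remote_call(name, latency):
    """Pretend to call a remote server; the sleep models network delay."""
    await asyncio.sleep(latency)
    return f"{name}: done"

async def sequential(requests):
    """Blocking style: wait for each reply before sending the next request."""
    return [await remote_call(name, latency) for name, latency in requests]

async def overlapped(requests):
    """Asynchronous style: issue all requests, then handle replies as a batch."""
    return await asyncio.gather(
        *(remote_call(name, latency) for name, latency in requests)
    )

def timed(coro_fn, requests):
    """Run one strategy and return (replies, elapsed seconds)."""
    start = time.perf_counter()
    replies = asyncio.run(coro_fn(requests))
    return replies, time.perf_counter() - start

requests = [("a", 0.1), ("b", 0.1), ("c", 0.1)]
seq_result, seq_time = timed(sequential, requests)
par_result, par_time = timed(overlapped, requests)
```

Under these assumptions the sequential version pays roughly the sum of the latencies, while the overlapped version pays roughly the largest one. The gain disappears if there is only one request and nothing else to do while waiting.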
Techniques for Scalability (cont’d)
❚ Use replication/caching to make multiple copies of the same services or data available at different machines
Ø Mirrored Web sites
Ø Replicated file servers and databases
Ø Web caches (in browsers and proxies)
Ø File caching (at server and client)
Ø + Increases availability
Ø + Improves load balance and performance
Ø + Hides communication latency
Ø - Inconsistencies arise when one copy is modified
Ø - Global synchronization is needed to keep copies consistent, but it precludes large-scale solutions
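A small sketch of the caching trade-off: fewer trips to the origin, but stale data once the origin copy changes. Origin, CachingClient, and the explicit invalidate call are hypothetical names; real systems use protocols such as leases, TTLs, or server callbacks that are not modeled here.

```python
class Origin:
    """The authoritative copy of the data; counts how often it is contacted."""

    def __init__(self):
        self.data = {}
        self.reads = 0

    def read(self, key):
        self.reads += 1
        return self.data[key]

    def write(self, key, value):
        self.data[key] = value

class CachingClient:
    """Keeps a local copy of every value it has read from the origin."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}

    def read(self, key):
        if key not in self.cache:          # cache miss: one remote access
            self.cache[key] = self.origin.read(key)
        return self.cache[key]             # cache hit: no communication

    def invalidate(self, key):
        """Drop the local copy so the next read goes back to the origin."""
        self.cache.pop(key, None)

origin = Origin()
client = CachingClient(origin)

origin.write("page", "v1")
first = client.read("page")
second = client.read("page")     # served locally, origin not contacted
origin.write("page", "v2")       # the origin copy changes
stale = client.read("page")      # the cached copy is now inconsistent
client.invalidate("page")
fresh = client.read("page")
```

The second read hides communication latency, and the stale read shows the inconsistency that some form of synchronization has to repair.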
Techniques for Scalability (cont’d)
❚ All the techniques discussed so far deal with performance problems due to size and geographical scalability
❚ How about administrative scalability?
Ø The most challenging one (why?)
Developing Distributed Systems: Pitfalls
❚ Mistakes are often due to false assumptions:
Ø There is a single global time
Ø The network/communication is perfect
ü Latency is zero
ü Bandwidth is infinite
ü The network is reliable
ü The network is secure
ü The network is homogeneous
Ø The topology does not change
Outline
❚
Different Distributed Systems
Ø Distributed computing systems
Ø Distributed information systems
Ø Distributed pervasive systems
❚
OS in distributed systems
Ø Distributed OS vs. Network OS vs. Middleware
❚
Design objectives of distributed systems
Ø Transparency, openness and scalability
❚
Architecture of distributed systems
Ø Software vs. system architectures
Software Architectures for DS
❚ Logical organization of different components:
Ø How to divide the system into components
Ø How the components connect and communicate with each other
Ø How to configure the different elements into a system
❚ Division of responsibilities among components: distribute them over multiple machines, and allow them to communicate through connectors
Ø Component: a modular unit with well-defined required and provided interfaces
Ø Connector: a mechanism that mediates communication, coordination, and cooperation (e.g., RPC, message passing)
Software Architecture Style
❚ Layered style
Ø A component at a higher layer can call components at a lower layer
Ø Widely adopted by the networking community
❚ Object-based
Ø Components are connected through procedure calls
Ø Used for client-server systems
Software Architecture Style
❚ Event-based (publish/subscribe)
Ø Components communicate through the propagation of events
Ø Loosely coupled components
ü Decoupled in space (referentially decoupled)
❚ Data-centered:
Ø Components communicate through a common repository (e.g., a shared distributed file system)
Ø Can be combined with event-based, yielding shared data spaces
ü Processes are now decoupled in space (they do not refer to each other) and in time (they do not need to be active at the same time)
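A toy shared data space, sketched in a single process to show decoupling in space and time. SharedDataSpace, write, and take are hypothetical names, and the item layout is made up; a real shared data space would be distributed and persistent.

```python
class SharedDataSpace:
    """Common repository: items are found by content, not by sender."""

    def __init__(self):
        self._items = []

    def write(self, item):
        """Deposit an item; the writer does not name any reader."""
        self._items.append(item)

    def take(self, matches):
        """Remove and return the first item accepted by `matches`, else None."""
        for i, item in enumerate(self._items):
            if matches(item):
                return self._items.pop(i)
        return None

def producer(space):
    """Writes one reading and returns; it is not active afterwards."""
    space.write({"type": "temperature", "value": 21.5})

space = SharedDataSpace()
producer(space)   # the producer has finished before any consumer runs

# Later, a consumer asks by content and never refers to the producer.
reading = space.take(lambda item: item["type"] == "temperature")
missing = space.take(lambda item: item["type"] == "humidity")
```

The consumer receives the reading even though the producer is no longer running (time decoupling) and without knowing who wrote it (space decoupling).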
System Architectures for DS
❚ Consider how and where to place software components and realize their interactions
❚ About physical realization: placement of software components and their interactions
Ø Centralized: client-server
Ø Decentralized: P2P (structured vs. unstructured)
Ø Hybrid: combination of centralized and P2P
Basic Client-Server Model
❚ Clients and servers may reside on different machines
❚ Clients follow the request/reply model to use services
+: efficient
-: reliability: the client cannot tell whether a request was lost or a reply failed
Connection-oriented protocol: TCP/IP
Application Layering (logical)
❚ User-interface layer
❚ Processing layer:
Ø Functions of an application, i.e., without specific data
❚ Data layer:
Ø Data that a client wants to manipulate
How to draw a clear line between client and server?
Observation: layering is found in many distributed information systems, using
Traditional Two-Tiered Configurations
How to place the three layers on client and server?
Multitiered Architectures
❚ The server part could be distributed over multiple machines
❚ Three-tiered: each layer on a separate machine (vertical distribution)
Servers and States: Stateful Servers
❚
Vertical Distribution:
Ø Functions are logically and physically split across multiple
machines.
Ø Processes are not equal and interactions are asymmetric: one acts as client while the other acts as server
❚
Keeps track of the status of its clients
Ø Record that a file has been opened, so that pre-fetching
can be done
Ø Knows which data a client has cached, and allows clients
to keep local copies of shared data
Decentralized Architectures: P2P Systems
❚ P2P architectures use horizontal distribution:
Ø Split up clients and servers into logically equivalent parts and let each part operate on its own share of the data
❚ Processes are equal and interactions are symmetric
Ø Each acts as both client and server
❚ Tremendous growth in the last couple of years
Decentralized Architectures: An Example
[Figure: peers 1 to N, each running an application and holding sharable objects (e.g., Star War, Roman Holiday, The Beatles)]
Decentralized Architectures: P2P Systems
❚ Key question:
Ø How to organize processes in an overlay network, where links are usually TCP channels
❚ Three approaches to organizing nodes into overlay networks through which data is routed
Ø Structured P2P: nodes are organized following a specific distributed data structure and deterministic algorithms
Ø Unstructured P2P: randomly selected neighbors
Ø Hybrid P2P: some nodes are appointed special functions in a well-organized fashion
Structured P2P Systems
❚ The Distributed Hash Table (DHT) is the most widely used structure
Ø Assume we have a large ID space Ω (e.g., 128-bit)
Ø Assign random keys to data items (from Ω)
Ø Assign random IDs to nodes (from Ω)
Ø The crux of a DHT: an efficient and deterministic scheme that maps the key of a data item to a node ID
Ø When looking up a data item, the system routes the request to the associated node and returns the network address of that node
A DHT Example: Chord
❚ A data item with key k is mapped to the node with the smallest ID >= k.
❚ This node is called the successor of key k and is denoted succ(k)
[Figure: Chord ring; example query LOOKUP(key=8)]
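The successor rule can be sketched in a few lines. The node IDs and the 5-bit ID space below are hypothetical (the ring from the original figure is not reproduced here), and the sketch computes succ(k) from a global list of nodes; an actual Chord node would route the lookup through finger tables instead.

```python
from bisect import bisect_left

def succ(k, node_ids, id_bits=5):
    """Return the node with the smallest ID >= k, wrapping around the ring."""
    ring = sorted(node_ids)
    k %= 2 ** id_bits                 # keys live in the same ID space as nodes
    i = bisect_left(ring, k)          # index of the first node ID >= k
    return ring[i % len(ring)]        # past the largest ID: wrap to the first

# Hypothetical ring in a 5-bit ID space (IDs 0..31).
nodes = [1, 4, 9, 11, 14, 18, 20, 21, 28]

lookup_8 = succ(8, nodes)     # no node has ID 8; the next one on the ring is 9
lookup_9 = succ(9, nodes)     # a key equal to a node ID maps to that node
lookup_29 = succ(29, nodes)   # beyond the largest ID, wraps around to node 1
```

With this assumed ring, LOOKUP(key=8) would resolve to node 9; with a different set of node IDs the answer changes accordingly.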
Unstructured P2P Architectures
❚ Basic principle: each node maintains a random list of neighbors:
Ø Each peer maintains a partial view of the network, consisting of c other nodes
Ø Each node P periodically selects a node Q from its partial view; P and Q exchange information and exchange members of their respective partial views
Ø The robustness of the network depends on this random exchange
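A simplified sketch of the partial-view exchange, simulated in one process. The function names, the initial ring-shaped views, and the rule "pool both views and keep at most c random entries" are assumptions made for illustration; real gossip protocols differ in how entries are selected and aged.

```python
import random

def exchange(views, p, q, c, rng):
    """P and Q pool their partial views, then each keeps at most c entries."""
    pool = views[p] | views[q] | {p, q}
    for node in (p, q):
        candidates = sorted(pool - {node})      # a node never lists itself
        views[node] = set(rng.sample(candidates, min(c, len(candidates))))

def gossip_round(views, c, rng):
    """Every node P picks a random Q from its partial view and exchanges."""
    for p in sorted(views):
        q = rng.choice(sorted(views[p]))
        exchange(views, p, q, c, rng)

n, c = 10, 3                       # 10 nodes, partial views of at most 3 peers
rng = random.Random(42)            # fixed seed so runs are repeatable
views = {i: {(i + 1) % n} for i in range(n)}   # start: each node knows one peer

for _ in range(20):
    gossip_round(views, c, rng)
```

After a number of rounds each node holds a small, randomly mixed set of neighbors rather than only its initial one, which is what makes the overlay less dependent on any single node.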
Superpeers
❚
Peers maintain an index (for search)
❚
Peers monitor the state of the network
❚
Peers set up connections
Hybrid Structures: Edge Servers
❚
Edge-server architectures, which are often used for
Content Delivery Networks
Collaborative Hybrid Structures: BitTorrent
❚
Combining a P2P with a client-server architecture
❚
Basic idea: a node identifies where to download a
file from and joins a swarm of downloaders, who get
file chunks in parallel from the source and distribute
these chunks among each other
Summary
❚
Distributed Systems
Ø Distributed computing systems
Ø Distributed information systems
Ø Distributed pervasive systems
❚
OS in distributed systems
Ø Distributed OS vs. Network OS vs. Middleware
❚
Design objectives of distributed systems
Ø Transparency, openness and scalability
❚
Architecture of distributed systems
Ø Software vs. system architectures