Distributed Computing
Processes and Threads
Processes:
– The concept of a process originates from the field of
operating systems, where it is generally defined as a
“program in execution”.
– From an OS perspective, the key issues are process management and process scheduling.
Threads:
– In a traditional OS, each process has an address space and
a single thread of control.
Process vs. Thread
A process is a collection of address space, code, data, and
system resources.
A thread is code that is to be serially executed within a
process.
A processor executes threads, not processes, so each
application has at least one process, and a process always
has at least one thread of execution, known as the primary
thread.
Thread
– A thread is a sequence of executable code within a process.
Major differences between threads and processes
1. Threads (lightweight processes) share the address space of the process that created them; processes have their own address space.
2. Threads have direct access to the data segment of their process; processes have their own copy of the data segment of the parent process.
3. Threads can communicate directly with other threads of their process; processes must use interprocess communication to communicate with sibling processes.
5. New threads are easily created; new processes require duplication of the parent process.
6. Threads can exercise control over threads of the same process; processes can only exercise control over child processes.
7. Changes to the main thread (cancellation, priority change, etc.) may affect the behavior of the other threads of the process; changes to the parent process do not affect child processes.
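The contrast between shared memory and interprocess communication can be sketched in a few lines of Python. This is only an illustration under assumptions: the names `counter`, `add`, `run_threads`, and `ask_child_process` are hypothetical, and a pipe to a child interpreter stands in for whatever IPC mechanism a real system would use.

```python
import subprocess
import sys
import threading

counter = {"value": 0}      # lives in the single address space all threads share
lock = threading.Lock()


def add(n):
    """Each thread updates the same dictionary directly."""
    for _ in range(n):
        with lock:          # shared data still needs synchronization
            counter["value"] += 1


def run_threads(num_threads=4, n=1000):
    """Start several threads that all write to the shared counter."""
    counter["value"] = 0
    threads = [threading.Thread(target=add, args=(n,)) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]


def ask_child_process(value):
    """A child process cannot see `counter`; data travels through a pipe (IPC)."""
    child_code = "import sys; print(int(sys.stdin.read()) + 1)"
    done = subprocess.run(
        [sys.executable, "-c", child_code],
        input=str(value), capture_output=True, text=True, check=True,
    )
    return int(done.stdout)
```

The threads need only a lock to cooperate on `counter`, whereas the child process receives its input and returns its result as explicit messages.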
Advantages of multithreading
1. Responsiveness
2. Deadlock avoidance
3. Utilization of multiprocessor architecture
4. Economy
– Threads share an address space, files, code, and data
– Avoid resource consumption
Benefit 1: Responsiveness
[Figure: clients, a web server serving http pages, and a database DB1]
Benefit 2: Deadlock avoidance
[Figure: two bounded buffers, each with a send side and a receive side]
A sequence of send and then receive does not work!
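One way to see why a strict send-then-receive sequence fails is to model the bounded buffers with `queue.Queue`. The sketch below is an assumption-laden illustration, not the slide's own code: `sequential_send_blocks`, `exchange`, and `run_demo` are hypothetical names, and a timeout is used only so that the blocked sender can be observed instead of hanging.

```python
import queue
import threading


def sequential_send_blocks(capacity=1):
    """Send more messages than the buffer holds with nobody receiving."""
    buf = queue.Queue(maxsize=capacity)
    try:
        for i in range(capacity + 1):
            buf.put(i, timeout=0.1)   # the last put finds the buffer full
    except queue.Full:
        return True                   # without the timeout this would block forever
    return False


def exchange(outbox, inbox, messages):
    """Send all messages while a second thread receives concurrently."""
    received = []

    def receiver():
        for _ in range(len(messages)):
            received.append(inbox.get())

    t = threading.Thread(target=receiver)
    t.start()
    for m in messages:
        outbox.put(m)                 # may block briefly until the peer drains it
    t.join()
    return received


def run_demo(n=5, capacity=1):
    """Two peers exchange n messages each through small bounded buffers."""
    a_to_b = queue.Queue(maxsize=capacity)
    b_to_a = queue.Queue(maxsize=capacity)
    results = {}

    def peer(name, outbox, inbox):
        results[name] = exchange(outbox, inbox, [f"{name}{i}" for i in range(n)])

    pa = threading.Thread(target=peer, args=("A", a_to_b, b_to_a))
    pb = threading.Thread(target=peer, args=("B", b_to_a, a_to_b))
    pa.start()
    pb.start()
    pa.join()
    pb.join()
    return results
```

If both peers first sent everything and only then started receiving, each would stall on a full buffer that the other never drains. Giving each peer a dedicated receiver thread removes that circular wait.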
Benefit 3: Utilization of multiprocessor architecture
Place code, files, and data in main memory, distribute the threads across the CPUs, and let them execute in parallel.
Approaches of implementing threads
• A thread package contains operations to
– create and destroy threads, as well as operations on synchronization variables, e.g. mutexes and condition variables
• Two approaches:
• A user-level thread library has the advantage that creating and destroying threads and switching thread context are cheap. A major disadvantage is that a blocking system call blocks the entire process to which the thread belongs.
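As a rough illustration of the operations a thread package offers, the following sketch uses Python's `threading` module, which provides thread creation, a mutex, and a condition variable. The `Mailbox` class and `run_demo` function are hypothetical names chosen for this example.

```python
import threading


class Mailbox:
    """One-slot mailbox guarded by a mutex and a condition variable."""

    def __init__(self):
        self._cond = threading.Condition()   # wraps a lock (the mutex)
        self._item = None
        self._full = False

    def put(self, item):
        with self._cond:                     # acquire the mutex
            while self._full:                # wait until the slot is free
                self._cond.wait()
            self._item, self._full = item, True
            self._cond.notify_all()          # wake any waiting consumer

    def get(self):
        with self._cond:
            while not self._full:            # wait until something arrives
                self._cond.wait()
            item, self._item, self._full = self._item, None, False
            self._cond.notify_all()          # wake any waiting producer
            return item


def run_demo(n=5):
    """Create a consumer thread, hand it n items, then wait for it to finish."""
    box, received = Mailbox(), []
    consumer = threading.Thread(
        target=lambda: received.extend(box.get() for _ in range(n))
    )
    consumer.start()
    for i in range(n):
        box.put(i)
    consumer.join()
    return received
```

The `while` loops around `wait()` re-check the condition after every wakeup, which is the usual discipline for condition variables.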
Thread Implementation
Combining kernel-level lightweight processes (threads) and user-level threads.
• The thread package can be shared by multiple LWPs.
Advantages of Combination
1. Creating, destroying and synchronizing threads is relatively cheap and involves no kernel intervention at all.
2. Provided that a process has enough LWPs, a blocking system call will not suspend the entire process.
3. There is no need for an application to know about LWPs. All it sees are user-level threads.
Threads in DS - Multithreaded clients
• Threads have the property of allowing blocking system calls without blocking the entire process. This makes it much easier to maintain multiple logical connections at the same time.
• Developing a web browser as a multithreaded client makes it easy to support the browser’s multiple concurrent activities,
– e.g., local display and multiple simultaneous connections to the server (each thread sets up a separate connection to the server and pulls in the data).
Threads in DS - Multithreaded Servers (1)
Multithreading not only simplifies server code but also makes it much easier to exploit parallelism to attain high performance, even on uniprocessor systems.
Multithreaded Servers (2)
Three ways to construct a server:
Model                      Characteristics
Threads                    Parallelism, blocking system calls
Single-threaded process    No parallelism, blocking system calls
Finite-state machine       Parallelism, nonblocking system calls
• In a single-threaded file server, the main loop gets a request, examines it, and carries it out to completion before getting the next one.
Multithreaded Servers (3)
Threads make it possible to retain the idea of sequential processes that make blocking system calls and still achieve parallelism. Blocking system calls make programming easier and parallelism improves performance.
The single-threaded server retains the ease and simplicity of blocking system calls, but gives up some amount of performance.
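The three server models can be compared side by side in a small sketch. Everything here is an assumption made for illustration: `handle_request` simulates a blocking disk read with `time.sleep`, a thread pool stands in for a dispatcher with worker threads, and an `asyncio` event loop is used as a loose analogue of the finite-state-machine model with nonblocking calls.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(request):
    """Blocking handler: the sleep stands in for a disk read."""
    time.sleep(0.01)
    return f"contents of {request}"


def serve_single_threaded(requests):
    """Each request runs to completion before the next one is picked up."""
    return [handle_request(r) for r in requests]


def serve_multithreaded(requests, workers=4):
    """Worker threads block independently, so waits overlap."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle_request, requests))


async def handle_request_nonblocking(request):
    """Nonblocking handler: control returns to the event loop while waiting."""
    await asyncio.sleep(0.01)
    return f"contents of {request}"


def serve_event_driven(requests):
    """One thread interleaves all requests using nonblocking calls."""
    async def main():
        return await asyncio.gather(
            *(handle_request_nonblocking(r) for r in requests)
        )
    return asyncio.run(main())
```

All three functions return the same replies in the same order. They differ in how the waiting time is spent: the single-threaded loop sits idle during each wait, while the other two overlap the waits of different requests.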
The Role of Virtualization in Distributed Systems
In practice, every (distributed) computer system offers a
programming interface to higher level software (fig (a)).
In essence, virtualization deals with extending or replacing
an existing interface so as to mimic the behavior of another
system (fig (b)).
Why virtualization?
Architectures of Virtual Machines (1)
Computer systems generally offer four types of interfaces, at
four different levels:
– An interface between the hardware and software, consisting of machine instructions that can be invoked by any program.
– An interface between the hardware and software, consisting of machine instructions that can be invoked only by privileged programs, such as an operating system.
– An interface consisting of system calls as offered by an operating system.
– An interface consisting of library calls, generally forming what is known as an application programming interface (API).
Architectures of Virtual Machines (2)
Various interfaces offered by computer systems.
Architectures of Virtual Machines (3)
Architectures of Virtual Machines (4)
Virtualization can take place in two different ways:
– First, we can build a runtime system that essentially provides an abstract instruction set that is to be used for executing applications.
– Instructions can be interpreted (e.g. by the JRE) or emulated (e.g. running Windows applications on UNIX platforms).
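A runtime system that interprets an abstract instruction set can be illustrated with a toy stack machine. The instruction names `PUSH`, `ADD`, and `MUL` and the function `run` are invented for this sketch and do not correspond to any real virtual machine's instruction set.

```python
def run(program):
    """Interpret a list of (opcode, operand) pairs on a simple stack machine."""
    stack = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()


# Encodes (2 + 3) * 4 in the abstract instruction set.
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]
result = run(program)
```

The same `program` runs unchanged wherever `run` is available, which is the portability argument for this style of virtualization.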
Architectures of Virtual Machines (5)
– An alternative approach is to provide a system that is essentially implemented as a layer completely shielding the original hardware, but offering the complete instruction set of that same (or other) hardware as an interface.
– As a result, it is possible to have multiple, different operating systems run independently and concurrently on the same platform.
Degrees of Mobility
Technique | Data | Control | Code | Data State | Execution State | Navigational Autonomy | Transfer Direction
Message Passing Move In/Out
RPC Move Move Out
Remote Execution Move Move Move Out
Code on Demand Move Move In
Process Migration Move Move Move Move Move In/Out
Mobile Agents (weak) Move Move Move Move Own In/Out
Mobile Agents (strong) Move Move Move Move Move Own In/Out
So far, we have been mainly concerned with distributed systems in which
communication is limited to passing data.
System Examples
Types                            Systems
Message Passing                  Socket, PVM, MPI
RPC                              Xerox Courier, SunRPC, RMI
Remote Execution                 Servlets, Remote evaluation, Tacoma
Code on Demand                   Applets, VB/Jscripts
Process Migration                Condor, Sprite, Olden
Mobile Agents (Weak Migration)   IBM Aglets, Voyager, Mole
Remote Execution
Procedure code is sent together with arguments.
Server behaves like a general cycle server.
Server can evolve itself.
[Figure: the client's main program transfers a function object f( ) and its arguments to the dispatcher on the server; the server executes the function remotely and sends back the return value.]
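Remote execution can be sketched by shipping function source text together with its arguments. The message layout and the names `client_make_request` and `server_dispatch` are assumptions made for this example, both sides run in one process here, and calling `exec` on received code is unsafe outside a controlled demonstration.

```python
import json


def client_make_request(source, func_name, args):
    """Bundle procedure code and arguments into one message."""
    return json.dumps({"source": source, "func": func_name, "args": args})


def server_dispatch(message):
    """Generic server: load whatever code arrives and run it on the arguments."""
    request = json.loads(message)
    namespace = {}
    exec(request["source"], namespace)   # UNSAFE in practice; illustration only
    result = namespace[request["func"]](*request["args"])
    return json.dumps({"result": result})


SOURCE = "def f(x, y):\n    return x * y + 1\n"
reply = server_dispatch(client_make_request(SOURCE, "f", [6, 7]))
value = json.loads(reply)["result"]
```

The server has no built-in knowledge of `f`; it only knows how to accept code, run it, and return the value.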
Code on Demand
Server behaves like a general remote object server.
A remote function/object is sent back as a return value.
Client executes the function/object locally.
Client execution control stays local; the client is suspended while its request to the server is outstanding.
[Figure: the client's main program requests a remote function/object func( ) from the dispatcher on the server; the server returns the function/object, which the client then executes locally.]
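Code on demand reverses the direction of transfer: the client pulls code from the server and runs it locally. In this sketch the dictionary `SERVER_CODE_STORE` and the functions `server_fetch` and `client_load` are hypothetical, and a plain function call stands in for the network request.

```python
# Code the server is willing to hand out, keyed by function name.
SERVER_CODE_STORE = {
    "func": "def func(items):\n    return sorted(items, reverse=True)\n",
}


def server_fetch(name):
    """Server side: return the requested function's source as the reply."""
    return SERVER_CODE_STORE[name]


def client_load(name):
    """Client side: fetch the code, then load it into the local process."""
    source = server_fetch(name)      # a network round trip in a real system
    namespace = {}
    exec(source, namespace)          # UNSAFE in practice; illustration only
    return namespace[name]


func = client_load("func")
result = func([3, 1, 2])             # runs on the client, not on the server
```

After the fetch, the server plays no further part; all later calls to `func` execute on the client.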
Process Migration
Selecting a process to be migrated
Selecting the destination node
Suspending the process
Capturing the process state
Sending the state to the destination
Resuming the process
Forwarding future messages to the destination
[Figure: process P1 is suspended at the source site, control is transferred, and execution is resumed at the destination site; the interval between suspension and resumption is the freezing time.]
• Process migration is the act of transferring a process between two
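The suspend, capture, send, and resume steps can be imitated with a toy task whose entire state is an explicit dictionary. This is a simplification under assumptions: real process migration must capture registers, memory, and open resources, whereas here `new_task`, `run`, `capture`, and `resume` are hypothetical helpers and JSON text stands in for the transferred state.

```python
import json


def new_task(limit):
    """A toy 'process' that sums 0..limit-1; `i` acts as its program counter."""
    return {"limit": limit, "i": 0, "total": 0}


def run(state, max_steps):
    """Execute up to max_steps steps, then return (i.e. suspend)."""
    steps = 0
    while state["i"] < state["limit"] and steps < max_steps:
        state["total"] += state["i"]
        state["i"] += 1
        steps += 1
    return state


def capture(state):
    """Capture the suspended task's state in a form that can be sent."""
    return json.dumps(state)


def resume(wire):
    """Rebuild the task at the destination from the received state."""
    return json.loads(wire)


source_task = run(new_task(10), max_steps=4)   # runs for a while at the source
wire = capture(source_task)                    # suspended and captured
dest_task = run(resume(wire), max_steps=100)   # continues at the destination
```

The destination continues from step 4 instead of starting over, which is what distinguishes migrating a running process from merely shipping its code.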
Motivation - Process Migration(Code Migration)
(1) Performance Improvement
– Move code from heavily-loaded to lightly-loaded machine (load balancing)
– Move code closer to data (e.g. Database applications)
Client code migration
Server code migration
– Mobile agents (mobile programs), by exploiting parallelism
(2) Improving system reliability
– Migrating processes from a failing site to more reliable sites
– Replicating and migrating critical processes to a remote site
(3) Flexibility
– Traditional approach: multitiered client-server application
Reasons for Migrating Code
Models for Code Migration
Process Structure:
• Code segment: contains the actual code (set of instructions)
• Resource segment: contains references to external resources (e.g. files, printers)
• Execution segment: contains the current execution state of a process (e.g. stack, PC)
Weak mobility: move only the code segment and start execution from the beginning after migration:
• Relatively simple, especially if the code is portable (the target machine can execute the code)
• Distinguish code shipping (push) from code fetching (pull)
• e.g., Java applets
Strong mobility: move the code segment as well as the execution segment
Migration and Local Resources (resource segment)
• Problem: An object uses local resources that may or may not be available at the target site.
• Resource types:
– Fixed: the resource cannot be migrated (e.g. local devices, sockets)
– Fastened: the resource can be migrated but only at high cost (e.g. local DB, websites)
– Unattached: the resource can easily be moved (e.g. data files)
• Process-to-resource binding:
– By identifier: the object requires a specific instance of a resource (e.g., URL, IP address, socket)
– By value (weaker form of binding): the object requires the value of a resource (e.g., standard language libraries, as in Java)
– By type (weakest form): the object requires that only a type of resource
Resource Migration Techniques:
(1) Global reference:
– Set up a global name
– Can be accessed remotely
– Might be expensive due to communication cost
– Can set up a proxy
(2) Move/copy the reference (e.g. copy small data files)
Migration in Heterogeneous Systems
The principle of maintaining a migration stack to support migration of an
execution segment in a heterogeneous system
•Migrate only at certain points in the program (e.g., before / after
Migration of Memory (Address Space)
Three ways to handle migration (which can be combined):
• Pre-copy: The process continues to execute on the source machine while the address space is copied. Memory pages are pushed to the new machine, and pages that are modified later during the migration are resent.
• Complete-copy: Transfer the entire address space. Stop the current virtual machine, migrate its memory, and start the new virtual machine.
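The pre-copy idea of resending pages that become dirty during the copy can be simulated in a few lines. This is a simplified model under assumptions: memory is a dictionary of pages, `writes_per_round` is a hypothetical script of the writes the still-running process makes during each copy round, and `pre_copy_migrate` is an invented name.

```python
def pre_copy_migrate(source_pages, writes_per_round, max_rounds=5):
    """Copy pages while the source keeps running; resend pages dirtied meanwhile.

    source_pages: dict mapping page number to contents (mutated by the writes).
    writes_per_round: list of dicts, the writes made during each copy round.
    Returns (destination pages, number of copy rounds used).
    """
    dest = {}
    dirty = set(source_pages)            # first round: every page must be sent
    rounds = 0
    while dirty and rounds < max_rounds:
        in_flight = {p: source_pages[p] for p in dirty}   # snapshot being pushed
        dirty = set()
        # The process is still executing, so some pages change during the copy.
        if rounds < len(writes_per_round):
            for page, value in writes_per_round[rounds].items():
                source_pages[page] = value
                dirty.add(page)
        dest.update(in_flight)
        rounds += 1
    # Final step: stop the process and send whatever is still dirty.
    for p in dirty:
        dest[p] = source_pages[p]
    return dest, rounds


pages = {0: "a", 1: "b", 2: "c"}
copied, rounds_used = pre_copy_migrate(pages, [{1: "B"}, {2: "C"}])
```

Each round sends only the pages dirtied during the previous one, so the set to resend usually shrinks and the final stop-and-copy step stays short. Complete-copy corresponds to skipping the loop and sending everything while the process is stopped.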
Secure Code Migration
Approaches:
1) Sandboxing: run the downloaded code in a controlled/isolated environment
2) Playground: separate machine exclusively for running the migrated code
3) Code-Signing: