Rochester Institute of Technology
RIT Scholar Works
Theses
Thesis/Dissertation Collections
1991
An Experiment in the complexity of load balancing
algorithms
Charles Carlino
Follow this and additional works at:
http://scholarworks.rit.edu/theses
This Thesis is brought to you for free and open access by the Thesis/Dissertation Collections at RIT Scholar Works. It has been accepted for inclusion
in Theses by an authorized administrator of RIT Scholar Works. For more information, please contact
Recommended Citation
An Experiment in the Complexity of Load Balancing Algorithms
by
Charles Carlino
A Thesis
Submitted to the Graduate Computer Science Department,
School of Computer Science and Technology
Rochester Institute of Technology
1 January 1991
Approved by:
(Dr. Peter Lutz,
Commi~
Chamnan)
(Dr.
Andrew Kitchen, Reading M'embeb
Contents
1
January
1991
TABLE OF
CONTENTS
1. Introduction
1
1.1.
Problem
Statement
3
1.2. Previous Work
4
2. Theoretical
Development
15
2.1.
The
Load
Balancing
Algorithms
15
2.1.1.
Random
15
2.1.2.
Threshold
15
2.1.3.
Shortest
15
2.1.4.
Analytical Models
16
2.2.
Criteria
for
Using
Load
Balancing
16
2.2.1.
The
FIFO
Server
Model
17
2.2.2.
The Round-Robin Server
Model-
Late Arrivals
19
2.2.3.
The Round-Robin Server Model
-Early
Arrivals
20
2.3. Fairness
of
Comparison
21
2.4.
Performance Measures
&
Tests
21
3.
The
Software
Project
22
3.1. Functional
Specification
22
3.1.1.
Information
Distribution
22
3.1.2.
Load
Balancing
Decisions
22
3.1.3.
Job
Movement
22
3.2.
Functional
Analysis
22
3.2.1.
The
Info function
24
3.2.2.
The Receive Job function
25
3.2.3.
The Load Balance function
25
3.2.4.
The Run Job function
27
3.2.5.
The Send Job function
28
3.2.6.
Data Definitions
28
3.3.
Architecture
31
3.3.1.
Processes
31
3.3.2.
The job
transfer
protocol32
3.4.
Verification
&
Validation
33
3.4.1.
Procedures
33
3.4.2.
Test Runs
33
3.4.3.
Results
34
3.5. Utilities
35
3.5.1.
Data
Reduction
36
Contents
1
January
1991
4.
Experiments
38
4.1.
Parameters
38
4.1.1.
Transfer
Costs
38
4.1.2.
Minimum
Response
Time
42
4.1.3.
Data
Recording
Costs
43
4.1.4.
Probing
Costs
43
4.2.
Test Runs
45
4.2.1.
Procedure
45
4.2.2.
Test
Observations
45
5. Results
47
5.1.
Complexity
47
5.2.
Startup
Criteria
52
6.
Conclusions
53
6.1.
Complexity
53
6.2.
Startup
Criteria
53
References
54
Appendix
A.
Idle Resource Calculation
A-l
Appendix
B.
"Adaptive Load
Sharing
in
Homogeneous
Distributed
Systems"
B-l
Appendix
C.
"Load
Balancing
in
Homogeneous Broadcast
Distributed
Systems"C-l
Appendix
D.
Stability
Reports
D-l
-Introduction
1
January
1991
1. Introduction
During
the
operation of
anetwork
ofcomputers,
it is
likely
that
someof
the
processors willhave
a
heavy
processing load
while others
willbe
lightly
loaded
or
idle. When
this
happens,
potentially
usefulresources
(processors)
are
being
wastedand
the
responsetime
ofthe
overallsystem
is
being
degraded. The
process of
reducing
this
waste and
improving
responsetime
by
redistributing
computing
load has
come
to
be known
as
processorload balancing.
Load
balancing
has
also
been
applied
to
other resources
suchas
secondary
storage,
but
this
paperdeals only
withthe
cpu.
Processor load
balancing
canbe
moreprecisely defined
as
any
procedure which attemptsto
take
advantage of
the
concurrent
processing capability
of amulticomputerto
optimize someperformance and/or
utilizationcriterion
by
rearranging
the
computing load
among
the
system's
processors.
A
multicomputeris defined
to
be
a
loosely
coupled
setof
processors,
eachcapable ofrunning
independently
ofthe
rest.
The
nextthree
subsections
define
three
broad
categories ofload
balancing
problems.1.0.1.
Process
to
processor
matching
to
maximize
system
performanceThe
defining
characteristic
of problemsin
this
category is
the
lack
oftask-related
information
whichmay
be
assumed
whendevising
a solution.
Here,
the
goal ofthe
load
balancing
algorithmis
to optimize
system performanceby
transferring
workfrom
one processorto
another.
The
designers
ofthese
algorithms assumethat
little
or noinformation
about a giventask
is
available
beforehand
(no
a prioriknowledge).
Any
predictions aboutthe
future
characteristics of atask
(e.g.
how
muchlonger it
willrun)
must
be
estimated
from
pasttask
performance(e.g.
how
long
it has
already
run)
or current characteristics
(e.g.
memory
size).Most
algorithms
in
this
category do
not concernthemselves
with userlevel
inter-process
communication(IPC)
issues.
That
is,
the
problem ofdetermining
the effect of
IPC
onthe
"optimal"assignment of
jobs
to processors
is generally
notaddressed.
This
is
usually because
it is
assumedthat
information
on userIPC
is
not available.The
systemswhichdo deal
withthis
problem eithertry
to
predictfuture IPC from
the process's
IPC
history,
orthey fall into
the
nextcategory.
1.0.2.
Distributed
programmodule
to
processormatching
to
maximize
program performanceIn
this
class ofload
balancing
problems,
muchmore
task-specific
information is
assumed
known.
Here
wehave
a
distributed
programcomposed
ofmodules
which runconcurrently
and
coordinate with each otherto
performsome activity.
It is
assumed
that
the
level
of
communicationamong
the
modulesandtheir precedence relationships
areknown
ahead oftime.
This information
mightcomefrom
a sophisticateddistributed programming language
system.The
goal ofthese
algorithmsis
to
find
a
mapping
ofthe
modules onto
the
processors
of the
distributed
system(as
is
shownin figure
1-1)
in
orderto
optimize performance
ofthe
distributed
program.This is in
contrastto
the
goal ofthe previous class: to optimize
performanceofthe
system.Algorithms in
this
category
are called
task
assignment,
task
Introduction
1
January
1991
Figure 1-1.
Optimizing
a
Distributed
Program
Composed
of4
Modules
1.0.3.
Maximizing
performance
subject
to
system-specific constraintsThe last category
of
load
balancing
algorithms are
those
whosegoals
and
assumptionsare
intimately
related
to the type
of
distributed
system on
whichthey
willrun.For
example,
if
youare
trying
to
load balance
a
distributed database
system,
your goalmight
be
to
minimizethe
averagequery
time of the system.
Typically,
these
systems are specialized and requiresolutions
specific to the
type
of system.That
is,
somecharacteristicsofthe
systemare usedin
formulating
the
load
balancing
policies.The
remaining
workin
this
proposalapplies
to the
first
category
ofload
balancing
algorithmsin
whichlittle
orno
a priori processinformation is
known.
Introduction
1
January
1991
1.1.
Problem Statement.
In
this
work,
as
in [Eager
86],
we wish
to
determine
an appropriatelevel
of
complexity for
a
load
balancing
policy
on a
homogeneous
network of computers.
In
addition,
certain situations
under which
load
balancing
may be
avoided with
low
probability
of significant system
degradation
are
investigated
to
determine
the
feasibility
of
startup
criteria
for load
balancing
algorithms.
To
achieve
the
former
goal,
load
balancing
algorithms of
varying
degrees
of
complexity
are
implemented,
their
performance
compared,
and a
level
ofcomplexity
beyond
whichthe
gainin
performance
is
small
is determined. Results
are compared
to
worst-caseand
best-case
analytical models so that some absolute measure of
performancemay
be
obtained.This
parallels
to
some
degree
the
modelling
workdone
by
Eager,
et.aL
in "Adaptive Load
Sharing
in
Homogeneous Distributed
Systems"[Eager
86]
and provides some real-life
verificationoftheir results.
To
provide some
insight into startup
criteria
for load
balancing
algorithms,
a
brief investigation
is
performed
making
use
ofan analysis
done
to
determine
the
likelihood
ofsituations
whereload
balancing
wouldbe
ofthe
mostbenefit. The
results
ofthe
analysis areformulae
whicharetested
by
comparison
withthe
observed
behavior
ofthe
system.
That
is,
tests are
run withand
withoutload
balancing
undervarying
load
situations,
andif
there
are substantial performancedifferences,
these should
be
predicted
by
the analysis.
This
analysis
is
an extension of workby Livny
andMelman
in "Load
Balancing
in Homogeneous Broadcast Distributed
Systems"
[Livny
82]
and
an application ofstandard
queueing
theory
results.The implementation
of
these
policies
is done
withoutmodifying in
any
way
the
underlying
UNIX
operating
system.Therefore,
only
shell-level
exportof processes
willbe
supported.In
the
proposal,
it
wasplanned
that
an"idealized"algorithm
wouldbe implemented in
whichknowledge
ofexisting
and/orfuture
task
load
situations might
be
usedto
obtain
nearly
optimal
performance.
This
plan wasabandoned,
however,
due
to
concerns
overimplementation
complexity
andtime requirements.
Also,
the performance the
optimalM/M/c
model(a
theoretical
upperbound
onperformance)
withc=5
nodes was much closerto the system's
actualperformancethan
was shownin [Eager
86]
with c=20 nodes.This
reducedthe
needfor
an"idealized"
Introduction
1
January
1991
1.2.
Previous
Work
Figure
1-2
is
a conceptual model of
how
a
load
balancing
algorithmmight
work.This diagram
is
introduced
not
to
attempt
to
completely describe
all
possiblepolicies,
but
ratherto
establish
aconceptual
framework for further discussion. In
the
diagram,
tasks
movealong
the
thin
lines
whilesystem state
information flow along
the
heavy
lines.
Also,
the textured
lines
referto
movement
between
processors while solid
lines indicate
intra-processor
movement.
As
canbe
seen
from
the
arcs
labelled
Tl,
T2
and
T3,
tasks
arrive ata
processorin
one
ofthree
different
ways.
They
are
initiated
at
the
local
processor,
they
are sent
to the
local
processorfrom
another processor
before
beginning
execution,
ortheir
executionon another nodeis
interrupted
and
they
are migrated
to the
local
processor.
At
this point,
it
makes sense
to
distinguish between
tasks
before
they
have
begun executing
and
tasks once
executionhas
started.
So,
for
the
remainder of
this paper, the
xtrmjob
willreferto
a task
before
executionhas
begun,
the term process
willapply
to
a
task after
it has
startedrunning,
and
the term task
will
be
usedif
eithermay be
true.
In
the two
cases
in
whichthe task
has
not yetbegun
executing,
we
identify
a.job
movefunction
whose purpose
it is
to
decide
whethera
givenjob
should
be kept
local
or moved onto
anotherprocessor.
In
the
case ofa task
whichhas
already
started
executing,
the process
movefunction decides
whetherto
keep
it,
or moveit
along.This
function
also serves
to
determine
which,
if any,
of
the
tasks
running
locally
should
be
shipped
out
to
another node.In
additionto these two
functions,
an
information
function
might existif
either movefunction
needsto
know
the state
of other nodesin
the
systemin
orderto
makeits
decisions.
-4-Introduction
1
January
1991
Figure 1-2.
A
Conceptual
Model
of
Load
Balancing
Load
balancing
policieshave been
classifiedmany
waysin
the
literature.
A policy
mightbe
calledprocess
scheduling,
preemptive ordynamic load
balancing,
processmigration,
ortask
migrationif
the
"process
move"function
exists withinit.
A policy
withoutthat
function
but
with
the
job
movefunction
is
sometimescalled
job scheduling
ornon-preemptive
orstatic
load
balancing. A policy
is
called adaptable
or systemstate
dependent if
the
"info"function
exists
andis
usedby
job
orprocess
move,
and
non-adaptable orstate
independent
otherwise.
If
the
job
moveor
processmove
function
selects
the
target
node
according
to some
probability
distribution,
the
algorithmis
said
to
be
probabilistic.A
deterministic
technique
uses some pre
defined
nonrandom algorithmto
makethe choice.
A
policy
is
called sender-originated
if
the
node
exporting
the task
initiates
the
load
shifting
procedure
andreceiver-originated otherwise.
An
algorithmmay be
optimal or suboptimal(heuristic),
depending
on whetherit is
a provenmathematical solution
to the problem,
orjust
a
technique
for
improving
the
system's
performance.(There
are no
workable optimalsolutions
to this
class
ofload
balancing
problems.)
As
with otherdistributed
operating
systemfunctions,
the
load
balancing
policy
can
be
distributed
among
the
processorsin
a
numberof ways.A
policy is democratic
if
allthe
nodesIntroduction
1
January
1991
balancing
is
called central
if
there
is
one node
executing
the
policy
onbehalf
ofthe
entiresystem.
Also,
the
balancing
algorithm can
be
divided among
the
nodesfunctionally,
orit
canbe
replicated,
witheach node
performing
all
functions
of
the
policy.
The load
balancing
policies
implemented
in
the
thesis are
static,
heuristic,
adaptable,
deterministic,
sender-originated,
democratic,
and
replicated algorithms.1.2.1.
Providing
system
information
The
function
of
gathering
and
distributing
system
stateinformation
canbe
split
roughly
into
three parts,
as
is illustrated in figure 1-3. The input
part servesto collect
information from
the
othernodes
in
the
system
and,
if
necessary,
translate
it into
a
form
usableby
the
job
move and/orprocess move
functions. The local
part monitorsthe
local
load
and
providesthat
information
to the
move algorithms.
The
output
function
makeslocal
load
information
available
to
othernodes
in
the
system.
The
analogousfunctions
in
the
thesis
systemdesign
areGet Remote Load
(2.3),
Maintain Local Info
(2.1),
and
Send
Local
Load
(2.2),
respectively.Figure
1-3.
The Info
Function
1.2.1.1.
Goals.
The
objectiveof
the
info
function
is
to
allowthe move
functions
to
useload information
from
other
nodes
in
the
systemin
making
their
decisions. A
second,
and
equally important
goal
is
to
minimizecommunicationoverheadwhileachieving
the
first
goal.This
issue is sufficiently
important
to
be
considereda
majordesign
goalbecause
of whathas
been
called
the
saturation effect[Chu
80]. Figure
1-4 illustrates
the
effect wherein asthe
number ofprocessors
increases,
communicationcostsdominate
the
gaindue
to
concurrency
andactual
throughput
decreases. An ideal
system without suchcosts
would experiencea
linear
gain
in
throughput.
This implies
that
any
load
balancing
algorithm willhave
alimit
on
the
number ofprocessors
which
it
can servicebefore adding
another one willdecrease
performance
[Livny
82]. (This
effectis
due
to
task
movementas
wellaspassing
stateinformation.)
Introduction
1
January
1991
Number
of
Processors
Figure 1-4.
The
Saturation Effect
1.2.1.2.
Issues.
In
addition
to the
communication problemdescribed
above,
there are
severalissues
regarding
the
collectionand
distribution
ofload
balancing
information.
A starting
pointin any
statetransfer
mechanismis
the
determination
ofthe
local load.
A
parameterorset
of parameters mustbe
calculated
whichindicate
how
much workthe
processorhas
to
do,
but
whichsmoothout
any
rapid
fluctuations due
to,
for
example, the
blocking
and
unblocking
of
processesin
a
multiprogramming
environmentSuch fluctuations
may
not reflectthe
true state
ofthe
processor
and
may
causeinstabilities (bad job
movements)
in
the
distributed
system[Barak
85b].
In
the
thesis
project, the
load
willchange
only
whena
job
starts orcompletes.
This
shouldavoidany
spuriousload
fluctuations.
Also,
there
is
a
fundamental
relationship between
the
duration
of a
load
fluctuation
which canbe
respondedto
effectively
and
the time
it
takes
to move a task.
If
youtry
to
respondto
a change whichwillhave
disappeared
by
the time the
job
yousent getsthere,
you willhave
createdan
instability.
Hence
any
fluctuations in
the
local load
whichcome and go
in
less
than
the time
it
takes
to
move atask should
be filtered
out.This
problemis
not
dealt
within
the
thesis
system.Another stability
problem whichmustbe dealt
with whenbalancing
the
processorload
arises whenone
processorbecomes suddenly
lightly
loaded
and
every
other nodethat
finds
out aboutthe
situationsendsjobs
to that
processor.This
resultsin
a
swampednode
whichitself
then
exports
tasks.
One
way
ofdealing
withthis
problemis
to
construct
aninformation
exchange
policy
whichlimits
the
number of processorsthat
hear
aboutthe
lightly
loaded
node
[Barak
85b,
Bryant
81 ].
There
are also
ways ofdealing
withthis
issue
in
the
job/process
move
functions,
which willbe
discussed in
a
later
section.This
problemas
it
pertains
to
the thesis
project will
be
discussed
whenthe
algorithmsthemselves
are presented.Other
less
significantissues
whichmay
be
addressed
in
the
info function design include:
(1)
dealing
with out-of-dateinformation
(i.e. making it
useful orthrowing it
away),
Introduction
1
January
1991
(3)
dealing
withthe potential
increase in
the
amountof
information
transfer
in
aheterogeneous
system over
ahomogeneous
one.
1.2.1.3.
Approaches
Information distribution
There
are
many
mechanisms
for exchanging load information
to
be found
in
the
literature.
Broadcast is
one of
the
simplest
in
which each node sends
eitherits
local load
orits
view ofthe
system
to
allother nodes
in
the
system.
This
method,
whilevery
simple,
is
very
expensive.For
each
broadcast,
all nodes must
interrupt
their
normal
processing
to
absorbthe
newinformation.
Also,
the
number
ofmessages
in
a
point-to-point system would growvery large
as
the number
ofnodes
increases. It
does, however,
allow all nodesto
adaptquickly
to
changes
in
the
system
load.
Unfortunately,
the
second
stability
problem mentionedin
the
previous section
is
made worse
by
this
very
characteristic.
Notify
-when-idlealgorithms and
bidding
algorithms are
specialcases
whichare
similar yetopposites.
In both
cases,
system state
information is
exchanged
whena task exchange
is
desired.
In notify
-when-idle(also
called
drafting
)algorithms,
lightly
loaded
processors request more workfrom
other nodesin
the
system[Livny
82].
In
bidding,
heavily
loaded
processors request
takers
for
their excess
work[Stank
84b].
The decision
on whetheror notto
send out a
request
canbe based
oneither
local
or
systeminformation. In
the
latter
case,
anotherdistribution
mechanism mustbe in
place
to
provide
the
requiredinformation.
These
methods
are
fairly
sophisticated,
although
in
notify-when-idlethe
processor whichdoes
most ofthe
workis
tightly
loaded. This is in
contrast
withbidding
wherethe
heavily
loaded
processor
must
bear
most
ofthe
burden.
On
the
otherhand,
bidding
should respond morequickly
to
load
surges on a particular nodesince,
in
notify-when-idle,
a
heavily
loaded
processor
must
wait untilit
receives a
requestfor
more
workbefore it
can export
any
ofits
processes.Overhead issues
in
this
method
rely
greatly
onthe
communicationmethod
usedto
transmit
requests.In
pairing,
nodesform
temporary
pairsfor
the purpose
ofexchanging load
information,
andpossibly
moving
jobs
[Bryant
81].
This
kind
oftechnique
has
relatively low
per-node and communicationoverhead,
andis
very
stable.But
it
allowsthe
systemonly
arelatively
slow andlimited
adaptability
to
systemload
changes.
In
randomscattering,
a nodewillselecta node
or small set of nodesrandomly
and sendlocal
and/orsystem
information
to
it
[Barak
85b].
Overhead,
stability,
and response should
be
about
the
same
as pairing.However,
randomscattering
could
allowany
particular
nodeto
go withoutnewstateinformation for
a
long
time,
and
this
effectincreases
as
the
number of nodesincreases.
Probably
the
simplestscheme,
andthe
scheme usedby
most ofthe
algorithms
implemented
in
the
project,
is
polling
or probing.Here
the
node
which wantsto export a task selects
potentialtarget
nodesand
requestsload information
from
them.
It
then
usesthis
information
to
makeits
target
decision
(see
[Eager
86]).
There
are
alsofeedback
schemesin
whichaftera
job is
moved, the target
node
tells
the
source node whether or notthe
movementwasbeneficial
[Mirch
86]. This is
usedin
additionto
one
of
the
othertechniques and
causes minimal additionaloverhead.
The
source node uses the
feedback
to
improve future
movementdecisions.
Introduction
1
January
1991
Information
structure and
content
Defining
the
local load is
a
necessary first step in
determining
the
systemload
and can
be done
in many
ways.
There
are several examples
[Barak
85b,Iivny
82,Stank
84a]
of
using
the
length
of
the ready-to-run queue
to
estimate
local load. A
more
sophisticated approachis
to
try
to
estimate a virtual response
time
for
a
typical
job [Bryant
81,
Chow
79,Hwang
82Juang
86,Wah 85]. Another
method
is
to
examine each process
running
on
the
local
processorindividually
to
determine if its
performance
here is
good or
bad
[Stank
84b]. This
method requiresa good
deal
of process-specific
information.
System
state
information
can
be
stored
in
the
nodes
ofa
distributed
systemin
severaldifferent
ways.
First,
every
node can
maintainits
own viewof
the
state of
the
system(or
part
ofit)
[Barak
84b,Mirch 86,Ouste 80].
Another
method
is
to
maintaina
hierarchy
ofinformation,
asis
illustrated in figure 1-5 [Wirti 80].
Alternatively,
there
couldbe
a single
node which containsall
the
load
status
information.
A
variation onthis
idea
using
monitornodes
is
presentedin
[Stank
85].
As
statedearlier, the
thesis
projectkeeps
track
of alljob
entries
and completionsto
maintain a current count ofjobs
at
each node.No
attemptis
madeto
makethis count
predictive.Each
noderequests remote
load
information
whenthe
load
balancing
algorithmsdemand;
none ofthis
data is
retained
for
future
use."node
3
has 2
underloadednodes"
"nodes
4
and
6
are
underloaded"
no
information
Figure
1-5.
Example
of
hierarchical info
organization
1.2.1.4.
Results
In
additionto those
mentionedin
the
previoussection, there
are a couple of
interesting
resultsin
the
literature.
In
calculating local
loads,
Barak
and
Shiloh
showa simple extension
from
number
of
jobs
ready
to
runto
an estimationof
responsetime
using
processor
speeds[Barak
85b].
Also, Mirchandaney
andStankovic
illustrate
in [Mirch
86]
acharacteristic
oftheir
algorithm
in
whichperformanceis
insensitive
to the
information
exchange
frequency
up
to
a point:the
point atwhichthe transmitted state
information has little
correlation
withthe
actual state ofthe
system.They
pointout that
this
characteristic
limits
their
algorithm
to
smaller(or
partitioned)
networksbefore
the cost
ofdistributing
state
information becomes
prohibitive.In
their
simulation,
each nodeperiodically broadcasts
its local load
to the
other
four
nodesin
the
Introduction
1
January
1991
1.2.2.
Export
and
target
decisions
The
process move and
job
move
functions
can
both be
broken down
further. Since both
functions have
the
same
basic
goals and
many
of
the
issues
involved
are
the
same
for
both,
they
willbe
treated
identically
from
this
point
forward
unlessspecifically
mentioned.As figure
1-6
suggests,
both
functions
need
the
following
four
capabilities:(1)
export:
decides
whether a
task
should
remainin
the
local
processor,
(2)
target:
chooses a node
for
remote
task execution,
(3)
local:
prepares
tasks to
run
locally,
and
(4)
remote:
sends
tasks
out
for
remote
execution.Figure 1-6.
The Job Move
and
Process Move Functions
In
this
project,
the
export andtarget
functions
are combinedinto
asingle
function
calledthe
LB
Algorithm
(4.2). The
send andlocal
functions
correspondto the
Send
Job
(6)
andRun Job
(5)
functions,
respectively.1.2.2.1.
Export
Goals
The
objective ofthe
exportdecision is
to
select atask to remove
from
the
local
node sothat
overall
system performance willbe improved.
(Selecting
no
jobs is
a valid
decision.)
As
withany
part ofthe
load
balancing
problem,
keeping
communication coststo
a minimum goeshand
in hand
withthis
primary
goal.The
overallobjective then
becomes
to export the
minimumnumberof
tasks
necessary
to achieve effective
load balancing.
1.2.2.2.
Target
Goals
While
optimizing
system performanceis
the
ultimategoal,
selecting
a
destination
node
for
ajob
in
orderto
minimizethe
rantime
ofthe
job is
frequently
seenas
the same problem.
The
tempering
secondary
goals ofavoiding
instabilities
andminimizing
network
congestion makethe choice
ofa target
algorithma
moresubtle one.
Introduction
1
January
1991
1.2.2.3.
Export
and
target
decision issues
There
are
many issues
to
deal
withwhen
building
the
export and target algorithms.
Merely
deciding
whenthere are
too
many
tasks
in
the
local
processor canbe
complicated.
This
decision
can
depend
on
the
load
of
the
overall
system,
or
just
onthe
load
of
the
individual
processor.
It
can
be
made a part of
the target
decision
or a part of
the
export
decision.
For
example,
if
a
policy
is
triggered
by
jobs entering
the
system,
andif
the
policy
estimates
response
times
for
the
local
node versus
the
remote
nodes,
then
the
policy has incorporated
the
determination
ofa
local load
threshold
into
the target
algorithm.
There
are also several
stability issues
to
accommodate.
If
a task
spends allits time
wandering
aroundfrom
node
to
node and never stops
anywhere,
it
nevergetsany
constructiveworkdone. When
this
problembecomes
epidemic
among
tasks,
processorsalso
spend alltheir time
on
load
balancing
overhead.
This is
called processor
thrashing.
On
the
otherhand,
the
inevitable uncertainty
ofthe
state
information
gatheredmay
causetasks
to
be
incorrectly
transferred.
This
problem conflicts
withprocessor
thrashing
because in
this
case,
we wouldlike
to
be
able
to
re-route
the task
to
a more suitable processor.
An
appropriate
balance
between
non-excessive
task movement
andrecoverability
from
improper
movements
mustbe
struck.
The
project algorithms
alllimit
the number
oftimes
ajob
canbe
migratedto once.
One
ofthe
stability
problems
described
in
section
1.2.1.2
can alsobe
dealt
withhere.
Altering
the target
decision
could reducethe chances
ofhaving
many
nodesdump
jobs
on asuddenly
lightly
loaded
node.For
example,
implementing
a maximumhop
count
to eligible
destinations
would
limit
the
scope of nodeseligible
to
receive ajob.
This
wouldeffectively limit
the
number ofprocessors
whichcould
send ajob
to
any
given node.There
are
severalfactors
the
target
algorithmmay
wantto
considerin making its
selection ofa
destination
node.If
the
systemkeeps
track
ofprocess
communicationsto
other processorsin
the
systemand
if
the
load
balancing
policy is
preemptive,
the target algorithm
couldchoose
a node with whichthe
processhas
exchangeda
large
number ofmessages
[Barak
85b].
Next,
the
suitability
ofthe
processorto
handle
the
task
(e.g. memory
requirements)
might
have
to
be
considered.
This
project'salgorithms neither collect nor make use of suchinformation.
1.2.2.4.
Approaches
There
are
many
waysto
approachthe
problems ofdeciding
whichtasks
to
moveto
which processors.In
probabilisticschemes, the
target
is
selectedaccording
to a
probability
distribution
whichmay
be
alteredto
reflect
the
systemstate
[Chow
79,Livny
82,Mirch
86,Stank
85]. The
pairing,
bidding,
and
drafting
algorithms
discussed in
section1.2.1.3
are adaptiveand
deterministic. Preemptive [Barak
85b3ryant
81,
Stank
84b]
and non-preemptive
[Chow
79,Hwang
82Jones
79Juang
86,Livny
82,Mirch 86,Ouste 80,Stank 85,Stank
84a,Witti
80]
techniques
canbe
used.Preemptive
algorithms sometimes require a process to
run a
minimumtime
before
it
can
be
migrated,
eitherto
improve stability
[Barak
85b]
orto
gather statistics[Barak 85b3ryant
81].
This
project considers
any job entering
the system
as eligiblefor
export,
and usesa
simpleload
threshold
test to
make
the
decision.
A
target
decision
caninclude
many
factors,
the most
popularbeing
anestimated response time
for
the
task on
the
eligibletarget nodes.
Inter-process
communication,
memory
requirements,
cost of
migrating
the
job,
hardware
requirements,
software
requirements,
etc.
canalso
be
considered.
A
target
decision
canbe
eithera
final
destination,
orjust
the
"best"nearby
node.The
nearby
nodemay
then
moveit along
to
an evenbetter
node
[Bryant 81,Ghafo 86].
Several
different
target
decision
algorithmsare
implemented in
the
project and are
described in
sectionIntroduction
1
January
1991
1.2.2.5.
Results
Some early
results
in
the
literature
tend to
justify
the
concept ofload
balancing
and
point outsome
fundamental limitations. Figure 1-7
[Livny 82]
points
outthe
need
for load
balancing by
illustrating
the
high probability
of
having
one node
doing
nothing
while anothernode
has
too
much
to
do in
an unconnected system of
computers.*
In
that chart,
one
canalso see
that
at
very
high
and
very low
node
utilizations,
load
balancing
may
notbe
appropriate,
especially for
smallerN
(number
ofnodes).
This
result
is
generalized
to
include
other situationsin
whichsystem performance should
be
enhanced
by
load
balancing
in
section1.3.2.
It
has
also
been
shownthat
even whenthe
average arrival rates at each of the
nodesare
identical,
load
balancing
can provide significant
improvement in
system performance
[Bryant
81].
Probability
of idlenodewhilewaiting
task,
N
nodesw
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Per
nodeload,
rho
?-N
=5
O-N=io
-N=20
Figure 1-7.
Wasted Resources in
an
Unconnected System.
However,
the
improvements load
balancing
achievesdo
not comefor
free. To
achieveoptimalbalance,
many
job
migrations aregoing
to
be
necessary
[Livny
82],
and
the
saturationeffect
described
earlierand shownin figure
1-4
is going
to take
its
toll
on system performance.This
suggests
that
"every
[load
balancing]
algorithmreaches
a pointat
whichanincrease in
the
number ofservers
decreases
the
performance ofthe
system"
[Livny
82].
It
also
suggeststhat
algorithms shouldbe
designed
witha
strong
emphasis
onminimizing
communication costs.It
has
been
suggestedthat
"the
expectedtransmission
delays
ofthe
balanced
system"be
usedin
determining
a
load
balancing
policy
[Livny
82].
With
the
importance
of
communicationoverheadin
mind,
it is
sensible
that
algorithms
have
been developed
specificto a
particulartype
of
network[Juang
86,Livny
82,Wah 85]. One
interesting
resulthere is
analgorithmona multiaccess
network(e.g.
Ethernet)
whose overheadis
independent
ofthe
numberof nodesin
the
network[Wah
85]. This
resultis
not consideredin
this
projectbecause
to
implement
it
wouldrequire specialized
hardware.
This
chartis
notbased
on theformula
givenin
[Livny
82],
but
moreclosely
resemblesthecharttherein.The
[Livny
82]
formula has
atypographical error.The
correctedformula is derived in
Appendix A.
-12-Introduction
1
January
1991
Several
results specific
to
a particular
type
of algorithm are available.
Non-preemptive
techniques
whichare
triggered
by
newjob
arrivals can
be
modelledby
a
multi-serverqueueing
system
withjob routing
[Juang
86,Chow
79]
as shown
in figure 1-8. State dependent
oradaptive policies
showbetter
performance than
state
independent
ones
[Chow
79]
and
canbe
almost as good as a single
fast
computer
[Juang
86].
Finally,
in
a
bidding
system,
Stankovic
and
Sidhu
demonstrated
that
adapting
the
distance
a
bid
travels to the
system
load improves
the
performance of
the
system
[Stank
84b].
Figure 1-8.
Job Move Triggered
by
Job
Arrivals.
1.2.3.
Task
movement
Task
movement
is
the
actualmechanics
ofmigrating
tasks
from
one nodeto
another.This
function is
fairly
straightforward
in
the case of non-preemptive
load
balancing,
so
the
remainder of
this
section willdeal primarily
withpreemptive policies.
1.2.3.1.
Goals
The
maingoal oftask
movementis
to
causea
task
whichis
executable
onone
node ofthe
system
to
become
executable
on another nodein
a
transparent manner.
Neither
the
userorthe
task should
have
to
do
anything
to
accommodatethe
change.As
with allload
balancing
functions,
per-nodeand
communicationcosts
mustbe
minimized.*1.2.3.2.
Issues
There
are
two
mainissues involved in
the
design
ofa
task
migration
function. The first is
the
isolation
ofthe
process andits
environmentfrom
the
hardware it is running
on.Things like
openfiles
and mounteddevices
mustbe
handled
in
sucha
way
as
to
allowthem
to
be
accessedin
the
sameway
from
any
node.The
implication
here is
that the structure
ofthe
operating
systemshould
take
into
accountthese
process migrationrequirements.
[Barak
85a]
provides
a gooddescription
of an approachto this
problem.Also
important in
the
movement oftasks
is
the
network's
capability
to
handle
those
transfers
very
efficiently.Since
wealready know
that
many
transfers are
necessary
to
maintain abalanced
load,
high
efficiency
in
handling
those
transfers
is going
to
be
vitalto
the
performance ofthe
system.Introduction
1
January
1991
1.2.4.
"Adaptive
Load
Sharing
in Homogeneous
Distributed
Systems"This
paper
[Eager
86]
by Eager,
Lazowska,
and
Zahorjan deserves
special
mentionhere
because it
provides
allthe test algorithms that
are
implemented
in
this
project
It is
also
the
original motivation
behind
the
entire project.
In
their
work,
Eager
et. al. claim and
demonstrate
that the
use of
very
simple policies
whichuse
very Utile
system state
information
"yield dramatic
performanceimprovements
relative to
the
no
load sharing
case"
and
"yield
performance close
to
that
whichcan
be
expected
from
complex policies".
Their
system model consists of
ahomogeneous
collectionof processorsconnected
viaa
broadcast
network
(such
as
Ethernet). All
nodes
have
Poisson
arrivals
with a common meaninterarrival time
and serve
those
jobs in exponentially distributed
time,
again with an averagecommon
to
all nodes.
The
cost of
job
transfers
is
modelledas a
processorcost
ratherthan
acommunication cost
(these
are neglected).
This last
assumptionis
supported
by
referenceto
experimentation and
by
analysis.
The
cost of
probing
a node
for load information is
assumedto
be
negligible.
Job
transfer
activity
ateach node
is
given
preemptivepriority
overjob
processing.
The
modeldoesn't
care
which particularservice
discipline
is
usedfor
processorscheduling,
as
long
asit
selectstasks
without concernfor
their
actual servicetime.
In
the
decomposition
oftheir
model(done
to
simplify
the
analysis),
they
assume"that
the
stateof each
node
is stochastically independent
ofthe state of
any
othernode".
They
claimthis
approach"is asymptotically
exact asthe
number ofnodes
increases".
They
further
claimvalidationof
their
major resultsfor
networks
ofas
few
as
20
nodes.Tests
in
the
thesis
arerunon
fewer
than
20
nodes andthus
an attemptis
madeto expand the
applicability
oftheir
results.This
project
does
notattempt
to
distinguish between
the
processorand
communication costs oftransferring
a
job
since
anexisting
communicationpackage
is
usedto
implement
transfers.
However,
the
assumptionthat
one predominates overthe
other mighttend to
break down
whenthe
systembecomes
very
busy
andcollisions
onthe
Ethernet become
morefrequent.
Therefore,
if
significantdeviation
from
their analysis occurs
mainly
during
high load
systemstates,
it
may
be
anindication
that
the
communicationcosts
arebecoming
non-negligible.Their
assumptionof negligibleprobing
(polling)
costsis
tested
by
direct
measurement.-14-Theoretical Development
1
January
1991
2.
Theoretical
Development
There
are
two
distinct
areas of effort
described in
this
paper.
The first is
the
design,
implementation
and
testing
of several
load
balancing
policies
ona
network ofUNIX-PC's.
The
second
is
a
queuing
theory
analysis of possible
startup
criteriafor load
balancing
algorithms
in
general.
Sections
2.1
and
2.2 respectively
coverthese topics.
Section
2.3
speaks
briefly
on
ensuring
fairness
in comparing
the
algorithms.Section 2.4
describes
the
analytical models used
in
this
paper.
2.1.
The Load
Balancing
Algorithms
As
was stated
earlier,
we are
attempting
to
determine
alevel
of algorithmcomplexity
beyond
whichthe
gainin
performance
is
small,
and so
we willtest
algorithms ofincreasing
complexity
and measure
any corresponding increase in
performance.
In
orderto
demonstrate
that the
algorithms are
in any way
beneficial,
control
tests
willbe
runin
whichno
load
balancing
willbe done. We
willalso compare results to
a"worst
case"
analytical model
(c
M/M/l
queues)
as wellasto
an analyticalbest
case
(M/M/c
queue).
All
algorithms are
taken
from [Eager
86]
and are
described
in
the
sectionsbelow.
Any
differences
between
the
algorithms
usedhere
andthose
in [Eager
86]
are
described herein.
2.1.1.
Random
This is
the
simplest
algorithm which uses nosystem state
information
at
all.It simply
decides
to
migrate anentering job if
the
local
load
exceeds
a giventhreshold.
We
willhenceforth
referto
this technique
ofmaking
the export
decision
as
the
Threshold Test. The
target node
is
selected
at random.As
shownin
[Eager
86],
such an algorithmis
unstableif
nodes are permittedto
exportjobs
receivedfrom
another node.This
instability
canbe
removedby
placing
alimit
Lt
onthe
number oftimes any
given
job
may be
migrated.After
ajob
has been
movedLt
times,
it
mustbe
processed whereverit is.
Lt
is
setto
onein
this
implementation.
2.1.2.
Threshold
This
algorithmusesthe
sametechnique to make
the
exportdecision
as
in
Random,
but
potentialtargets
are selectedat
random and polled untilone
is
found
whichcanaccept
the
job
withoutits
local load
going
overthe threshold.
A
staticlimit
Lp
is
placed onthe
number of potentialtargets
whichmay be
polled.If
that
limit is
reached
withoutfinding
anacceptable
target,
the
job in
questionmustbe
processedlocally. The
job,
oncetransferred,
mustbe
processed
atthe
target
node.No
forwarding
ofjobs is
allowed
in
this
algorithm.2.1.3.
Shortest
In
this
algorithmwe again usethe
Threshold
Test
for
the export
decision
as
in
Random,
but
agroup
ofT
potentialtargets
is
selectedand polled
to
determine
their
local loads. The job is
transferred
to
the
nodehaving
the
lowest local
load,
if
doing
so
wouldnot
bring
the
destination's load
overthe threshold.
An
improvement
overthe
algorithm presented
in
[Eager
Theoretical
Development
1
January
1991
None
ofthe
above algorithms
deal in any way
with
indiscriminate
dumping
of
jobs
onto a
processor which
has recently become
lightly
loaded. Since
polling
is
usedas the
way
ofgetting load
information,
a node which
becomes
free may
notbe found
by
many
nodes.
Although
this
is
no guarantee
that the
situation will not
occur,
it
shouldlimit
its
frequency
ofoccurrence.
The job
migration
limit,
however,
does
deal
withthe
processorthrashing
problem.
2.1.4.
Analytical
Models
The M/M/l queueing
model
has
awell
known
solution(see
[Ross
72])
for
responseR
versussystem utilization p.
R=
l
1
-p
This is
a
lower
performance
bound
whichis
expected
to
approximatethe
performance ofthe
noload
balancing
control case
but
to
fall
short of
all other modelsand
experiments.The
upperbound M/M/c queueing
modelhas
the
solutionR=
(cp>C
\t
(?
+
j2
[k=0
k!
r+1
which
is derived from
the
formula
in
[Tijms
86]
for
the
averagedelay
in
queue(E[Wq])
of a customerin
such amodel.
From
the
averagetime
in
the
queue,
oneadds
the
average servicetime
(1/p)
to get the
averagetime
in
the
systemand then
multipliesby
u.to
getthe
response,
whichis just
the
average
time in
the
systemdivided
by
the
average service
time. Manipulations
suchas
these are
discussed
in
[Ross 72].
2.2.
Criteria for
Using
Load
Balancing
The
goalof
this
sectionis
to
define
criteriafor predicting
the
usefulness ofload
balancing
in
various situations andin
so
doing,
define
startup
criteria
for load
balancing
policies.This is
done
by
attempting
to
characterize situationsin
whichload
balancing
wouldbe
useful andcalculating
the
probability
that
such situations willexist.
This
analysisis
an extensionof workby Livny
and
Melman in "Load
Balancing
in Homogeneous Broadcast Distributed
Systems"
[Livny
82].
In
that
paper,
Livny
andMelman
providea
rough characterization of usefulload
balancing
situations.In
adistributed
systemit
mighthappen
that a task waitsfor
service atthequeue of oneresource whileat thesametimeanother resourcewhich
is
capableofserving
the taskis idle.
A load
balancing
algorithmwhose goalis
tominimizetheexpectedturnaround timeofthejobs
will tend topreventthesystemfrom
reaching
such astate.1They
showthe
generalusefulnessofload
balancing by
calculating
the
probability
that
"the
systemis in
a
statein
whichatleast
one
customer waitsfor
service and
atleast
one
serveris
[Livny
82]
p.48.
Theoretical Development
1
January
1991
idle."2
One
can generalizethis
idea
by
saying load
balancing
willbe
useful wheneverthere
exists a node
witht
or morejobs
while another node
has
m or
fewer
jobs (t>m+l).
In
otherwords,
load
balancing
willbe
useful whenever one node
is
very
busy
while another nodeis
notso
busy.
"Very
busy"and
"not
so
busy"are
implementation-dependent
parameters.Typically,
UNIX-PC's
of
the
type
in
the
GCSD lab
can
handle 2
concurrent
jobs before
the
waitbecomes
prohibitive.
They
may
be
considered
"not
so
busy"with1
or
0 jobs
running.The formula resulting from
this
analysis should reduce
to the
[Livny
82]
result
whent=2
and
m=0.
2.2.1.
The
FIFO
Server Model
The
intent is
to
calculate
the
probability
that
1
or more of
the
nodes
in
the
systemhas
t
or morejobs
aM 1
ormore nodes
has
m
orfewer jobs (call
this
probability
Ptwm)-
Assume
nidentical
unconnected
processors modelled as
M/M/l
FIFO
servers.Let
I;
be
the
probability
that some subset
ofi
processors
has
m or
fewer
jobs
andlet
B;
be
the
probability
that
i
processors
have
m+1 or morejobs
withone
or more ofthem
having
t
or morejobs.
Ptwm
is
calculated
by
considering
allthe possible
waysthat
these
situations could arise:1
node withm"
jobs,
n-1nodes
withm+l+and
1
ormore
witht+jobs
2
nodes
withm"
jobs,
n-2nodes
withm+l+and
1
or more
witht+jobs
3
nodes with irrjobs,
n-3
nodes withm+l+and
1
or more witht+jobs
i
nodes withm~
jobs,
n-i nodes with m+l+and
1
or morewitht+jobs
n-1 nodes with
m"
jobs,
1
node withm+l+and
1
or more witht+jobs
Here,
the
notationx+should
be
read"x
ormore"
and
x"
"x
orfewer". So
sincethere
are(
)
ways
the
i*situationcouldoccur,
the
desired probability is
n-1
Ptwm
=X(i)I>B"-
(Eq.2-1)
i=l
Before getting into
the
actualderivation,
here
is
somequeueing
theory
background for M/M/l
FIFO
servers.Let
p be
the
utilizationofthe
node(arrival
ratedivided
by
service rate).Then
the
probability
that
anM/M/l FIFO
serveris
idle
is
Po=(l-p)
andthe
probability
that a node
has
exactly k
customers
is
wellknown
to
be
Pk
= Pod-Po)k = (l"P)Pk(Eq.
2-2)
The probability
that
aserverhas k
or more customersis
Theoretical
Development
1
January
1991
P(k+)
=1
-Pk
=i
.(1-pvf^f
= p*(Eq.
2-3)
j=0
P
-L
The probability
that
a server
has between i
and
j
customers
is
(j>i)
P(i:j)
=P(i+)
-P(j+i+)
= pi.pj+i(Eq.
2-4)
Now,
in
terms
of
the probabilities calculated
above,
Ij
can
be
expressedIi
= P(m-) =(1
- P(m+l+))i =(1
- pnn-i)i
(Eq.
2-5)
To
calculate
Bi5
consider all
the
ways a set
ofi
processorscould
have
m+1
ormore
jobs
whileat
least
1
of
them
has
t
or more
jobs:
1
node
with
t+jobs,
i-1
nodes
withbetween
m+1and t-1
jobs
2
nodes
witht+jobs,
i-2
nodes withbetween
m+1and t-1
jobs
3
nodes
witht+jobs,
i-3
nodes
withbetween
m+1and t-1
jobs
j
nodes
witht+jobs,
i-j
nodes withbetween
m+1 andt-1
jobs
i-1
nodes
witht+jobs,
1
node
withbetween
m+1and t-1
jobs
i
nodes
witht+jobs,
0
nodes withbetween
m+1 andt-1
jobs
Then
i
Bi
=X(j)p(t+)jp(m+i:t-i)H
j=l
Using
the
binomial
expansiontheorem and
subtracting
the
j=0
term
yieldsBt
=[P^
+
POn+Lt-l)]* - P(m+l:t-l)i =[P(t+)
+
P(m+1+)
- P(t+)]'-[P(m+1+)
- P(t+)]i(Eq.
2-6a)
and
finally
using
Eq.
2-3
givesBi
=(pm+lY
- (pm+1 - p')1(Eq.
2-6)
So
nowsubstituting
L.
and
Bn_j
(Eq's
2-5,6)
into
the
formula
for
P^m
(Eq.
2-1)
results
in
n-1
Ptwm
=S(l)
<!
" Pm+1)1 [(P^1)"-1 " (Pm+1P')""1]
i=l
Again,
applying
the
binomial
expansiontheorem
andsubtracting
the
i=0
term,
this time
aftersplitting
the
suminto
two
sums yieldsPtwm
= (l-pm+l+pm+l)n (pm+l)n. (l_pm+l+pm+l_pt)n+
(pm+l.pt)n-18-Theoretical
Development
1
January
1991
I
Ptwm
=1
- (Pm+1)n' d-P1)"+
(P^'-PP"(Eq.
2-7)
|
Note
that this
does
indeed
reduce
to the
[Livny
82]
formula
as
givenin
the
Appendix
whent=2
and
m=0.2.2.2.
The Round-Robin Server Model
--Late
Arrivals
The
problem withthe
above analysis
is
that
it is based
on a
calculationofthe
oddsthat
amultiprocessing
computer
has
nconcurrent
jobs.
However,
we aremodelling
that
computer witha
FIFO M/M/l
queue and
calculating
the
odds
that
the
number of customersin
the
serveris
n.This
probability is
more
accurately
calculated
using
around-robinserver model.To
reviewthis
model,
consider aqueue
containing
4
jobs.
Every Q
seconds(1
quantum),
the
job
currently
executing is
interrupted,
andif it
stillneeds
moreprocessing,
it is
returnedto the
endof
the queue.
The
nextjob in
the
queue
(determined in
FIFO
fashion)
is
then
runfor
Q
seconds
before it is
interrupted,
and so on.
Arrivals
arehandled
by
adding any
which arrivedduring
the
last
quantum
to the
queue after
the
job
currently executing
has
been
requeued(if
necessary).Kleinrock
terms this
a"late
arrival"
system
in [Klein
64].
Note
that
this
modelassumes that
nomore than
1
job
may
arriveduring
any
quantum period.While
this
assumption
is
not
strictly
realistic,
it
approaches
realismas the
size ofthe
quantumdecreases.
Given
the
abovemodel,
the
probability
that there
are njobs
atthe
nodeis
givenin [Klein
64]
as
Pn
= (l-a)an(Eq.
2-8)
where a =
y-^
and
p
=1
-a
p
is
asbefore
the
utilizationofthe
nodeand
X
is
the
arrivalrate,a
is
the
probability
that
ajob,
having
reachedthe
end ofa time
slice,
willrequire moreprocessing
time
to
complete.Q