Thermal Aware Workload Scheduling with Backfilling for Green Data Centers
Lizhe Wang, Gregor von Laszewski,
Jai Dayal, Thomas R. Furlani
Outline
•
Background and related work
•
Models
•
Research problem definition
•
Scheduling algorithm
•
Performance study
Context
Cyberaide: A project that aims to make advanced cyberinfrastructure
easier to use
FutureGrid: A newly NSF-funded project to provide a testbed that
integrates dynamic provisioning of resources
(Geoffrey C. Fox is PI)
GreenIT & Cyberaide: How do we use advanced cyberinfrastructure
in an efficient way?
GPGPU’s Application use of
FutureGrid
•
The goal of FutureGrid is to support the research that will
invent the future of distributed, grid, and cloud computing.
•
FutureGrid will build a robustly managed simulation
environment or testbed to support the development and
early use in science of new technologies at all levels of the
software stack: from networking to middleware to
scientific applications.
•
The environment will mimic TeraGrid and/or general
parallel and distributed systems
•
This test-bed will enable dramatic advances in science and
[Figure labels: University of Virginia (UV); Technical University Dresden; GWT-TUD GmbH, Germany; University of Tennessee – Knoxville (UTK)]
FutureGrid Partners
•
Indiana University
•
Purdue University
•
San Diego Supercomputer Center at University of California San
Diego
•
University of Chicago/Argonne National Labs
•
University of Florida
•
University of Southern California Information Sciences Institute
•
University of Tennessee Knoxville
•
University of Texas at Austin/Texas Advanced Computing Center
•
University of Virginia
Green computing
•
The study and practice of using computing
resources efficiently so that their impact on
the environment is as small as possible.
–
the least possible amount of hazardous materials is used
–
computing resources are used efficiently in terms
Cyberaide Project
•
A middleware for Clusters, Grids and
Clouds
•
A collaboration between IU, RIT, KIT, …
•
Project led by
Objective
•
Towards next generation cyberinfrastructure
•
Middleware for data centers, grids and clouds
•
Respect for the environment
•
Reduce the temperatures of computing
resources in a data center, thereby reducing
cooling system cost and improving system
reliability
Model
•
Data center
–
Node: <x,y,z>, t^a, Temp(t)
–
TherMap: Temp(<x,y,z>,t)
•
Workload
[Figure: RC-thermal model. For nodei at position <x,y,z>, the labels are the task-temperature profile, the parameters P, R and C, the initial temperature Nodei.Temp(0), the ambient temperature TherMap = Temp(Nodei.<x,y,z>,t), and the online task-temperature Nodei.Temp(t).]
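The slide names the RC-thermal model and its quantities (P, R, C, Nodei.Temp(0), and the ambient temperature from TherMap), but the formula itself did not survive. As a rough illustration only, the sketch below assumes the standard lumped RC form, in which a node's temperature relaxes exponentially from Nodei.Temp(0) toward the steady state PR + ambient with time constant RC. The function name and all numeric values are hypothetical.

```python
import math


def node_temperature(t, temp0, ambient, P, R, C):
    """Temperature of a node after running for time t at constant power P.

    Assumed lumped RC model (not taken verbatim from the slides):
        Temp(t) = P*R + ambient + (temp0 - P*R - ambient) * exp(-t / (R*C))
    """
    steady = P * R + ambient  # temperature approached as t grows large
    return steady + (temp0 - steady) * math.exp(-t / (R * C))


# Hypothetical values: P = 200, R = 0.1, C = 500, ambient 25, initial 30.
start = node_temperature(0.0, temp0=30.0, ambient=25.0, P=200.0, R=0.1, C=500.0)
later = node_temperature(1e6, temp0=30.0, ambient=25.0, P=200.0, R=0.1, C=500.0)
```

With these inputs the node starts at its initial temperature and settles at PR + ambient; a real deployment would fit R and C per node from measurements.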
Research issue definition
•
Given a data center, a workload, and the maximum
permitted temperature of the data center
•
Minimize the response time, T_response
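Read as a constrained optimization, the definition above can be written compactly. This is one possible formalization rather than a formula from the slides: Temp_max is an assumed symbol for the maximum permitted temperature, and the constraint is assumed to hold for every node at every time.

```latex
\min \; T_{\mathrm{response}}
\quad \text{subject to} \quad
\mathrm{Node}_i.\mathrm{Temp}(t) \le \mathrm{Temp}_{\max}
\quad \text{for all nodes } i \text{ and all times } t
```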
[Figure: system overview, built up over several slides. Components: profiling tool, monitoring service, CFD model, task-temperature profile, RC-thermal model, online task-temperature, workload model, data center model, thermal map, TASA-B, cooling system control, workload placement. Arrow labels: profiling, calculation, provide information, calculate thermal map, input, schedule, control.]
Scheduling framework
[Figure: scheduling framework. Labels: job submission, jobs, job queue, job scheduling, rack, data center, update data center information periodically.]
Task scheduling algorithm with backfilling (TASA-B)
•
Sort all jobs in decreasing order of
task-temperature profile
•
Sort all resources in increasing order of
predicted temperature
•
Allocate hot jobs to cool resources
•
Predict resource temperature based on
online task-temperature
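The sort-and-pair idea in the bullets above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it reduces each task-temperature profile to a single hypothetical number (`temp_rise`), takes the predicted node temperature as given, and leaves out the temperature update and the backfilling step. All class, field, and function names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    temp_rise: float  # hypothetical scalar summary of the task-temperature profile


@dataclass
class Node:
    name: str
    predicted_temp: float  # predicted temperature of the resource


def pair_hot_jobs_with_cool_nodes(jobs, nodes):
    """Return (job, node) name pairs: the hottest job gets the coolest node."""
    hot_first = sorted(jobs, key=lambda j: j.temp_rise, reverse=True)
    cool_first = sorted(nodes, key=lambda n: n.predicted_temp)
    # zip stops at the shorter list; leftover jobs would stay in the queue.
    return [(j.name, n.name) for j, n in zip(hot_first, cool_first)]


jobs = [Job("j1", 5.0), Job("j2", 12.0), Job("j3", 8.0)]
nodes = [Node("n1", 80.0), Node("n2", 72.0), Node("n3", 95.0)]
plan = pair_hot_jobs_with_cool_nodes(jobs, nodes)
```

Here the hottest job (j2) lands on the coolest node (n2), and the coolest job (j1) on the warmest node (n3).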
[Figure: time backfilling holes (node vs. time). Labels: available time t0; node_max1; node_max2; nodek.tbfsta, backfilling start time of nodek; nodek.tbfend, end time for backfilling.]
[Figure: temperature backfilling holes (node vs. temperature). Labels: Tempbfmax; node_max1; node_max2; nodek.Tempbfsta, start temperature for backfilling of nodek; nodek.Tempbfend, end temperature for backfilling.]
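One way to read the two diagrams is that nodek offers a time window from tbfsta to tbfend and a temperature ceiling Tempbfmax, and a job may be backfilled only if it fits within both. The sketch below encodes that reading. The fit rule, the additive temperature estimate, and every name other than the slide's tbfsta, tbfend, Tempbfsta, and Tempbfmax are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Hole:
    tbfsta: float     # backfilling start time of the node
    tbfend: float     # end time for backfilling
    tempbfsta: float  # node temperature at the start of the hole
    tempbfmax: float  # temperature ceiling for backfilled work


def fits(hole, run_time, temp_rise):
    """Assumed rule: the job must finish inside the hole and stay under the ceiling."""
    fits_time = hole.tbfsta + run_time <= hole.tbfend
    # Simplification: treat temp_rise as the job's total temperature increase.
    fits_temp = hole.tempbfsta + temp_rise <= hole.tempbfmax
    return fits_time and fits_temp


hole = Hole(tbfsta=10.0, tbfend=40.0, tempbfsta=75.0, tempbfmax=90.0)
short_cool = fits(hole, run_time=20.0, temp_rise=10.0)  # within both limits
too_long = fits(hole, run_time=35.0, temp_rise=10.0)    # would end after tbfend
too_hot = fits(hole, run_time=20.0, temp_rise=20.0)     # would exceed tempbfmax
```

Checking both limits is what separates this from plain time-based backfilling: a job short enough to fill a time gap is still rejected if it would push the node past the temperature ceiling.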
Simulation
•
Data center:
–
Center for Computational Research at UB
–
Dell x86_64 Linux cluster consisting of 1056 nodes
–
13 Tflop/s
•
Workload:
–
20 Feb. 2009 – 22 Mar. 2009
Simulation result
Metrics                        TASA
Reduced average temperature    16.1 F
Reduced maximum temperature    6.1 F
Increased job response time    13.9%
Saved power                    5000 kW
Reduced CO2 emission           1900 kg/hour
[Figure: average temperature (F), axis 70 to 110, versus time (hour), axis 1 to 701]
Simulation result
Metrics                        TASA-B
Reduced average temperature    14.6 F
Reduced maximum temperature    4.1 F
Increased job response time    11%
Saved power                    4000 kW
Reduced CO2 emission           1600 kg/hour