Contents lists available at SciVerse ScienceDirect
Sustainable Computing: Informatics and Systems
journal homepage: www.elsevier.com/locate/suscom
Maximizing the detection probability of overheating server components with sensor placement based on thermal dynamics
Xiaodong Wang a,∗, Xiaorui Wang a, Guoliang Xing b, Cheng-Xian Lin c
a The Ohio State University, USA
b Michigan State University, USA
c Florida International University, USA
Article info
Article history: Received 4 September 2012; Accepted 29 January 2013
Keywords: Data center; CFD; Sensor placement; Thermal monitoring; Overheating detection
Abstract
Server overheating has become a well-known issue in today's data centers that host a large number of high-density servers. The current practice of server overheating detection is to monitor the server inlet temperature with the temperature sensor on the server enclosure, or the CPU temperature with on-die thermal sensors. However, this is in contrast to the fact that different components in a server may have different overheating thresholds, which are closely related to their respective thermal failure rates and expected lifetimes. Moreover, the thermal correlation between the inlet (or CPU) and other server components can be different for every server model. As a result, relying on the single inlet or CPU temperature for server overheating detection is over-simplistic, which may lead to either degraded detection performance or false alarms that can result in excessive cooling power, leading to unnecessarily low inlet temperature.
In this paper, we propose a model-based approach that leverages thermal dynamics to intelligently choose sensor placement locations for precise overheating server component detection. We first formulate the detection problem as a constrained optimization problem. We then adopt Computational Fluid Dynamics (CFD) to establish the thermal model and analyze the thermal status of the server enclosure under various overheating conditions, such as inlet overheating, fan failures and CPU overloading. Based on the CFD analysis, we apply data fusion and advanced optimization techniques to find a near-optimal solution for sensor placement locations, such that the probability of detecting different overheating components is significantly improved. Our empirical results on a real rack server testbed demonstrate the detection performance of our solution. Extensive simulation results also show that the proposed solution outperforms other commonly used overheating monitoring solutions in terms of detection probability and error rate.
© 2013 Elsevier Inc. All rights reserved.
1. Introduction
In recent years, server overheating has become one of the most important concerns in large-scale data centers. Due to considerations such as real estate and integrated management, data centers continue to increase their computing capabilities by deploying high-density servers (e.g., blade servers). As a result, the increasingly high server and thus power densities can lead to some serious problems. First, the reduced server space may result in a greater probability of thermal failures for various components within the servers, such as processors, hard disks, and memories. Such failures may cause undesired server shutdowns and service disruption. Second, even though some components may not fail immediately, their lifetimes may be significantly reduced due to overheating. It is reported in [1–3] that the lifetime of an electronic device decreases exponentially with the increase of the operating temperature. Finally, the generated heat dissipation can also lead to negative environmental implications. Therefore, it is important for each server component to run at a temperature below its overheating threshold.
However, in today's data centers, how to precisely detect whether any component in a server is overheating remains an open question. The current practice of detecting and monitoring an overheating server can be divided into two categories. The first category is a coarse-grained approach that only uses the temperature at a proxy component, e.g., the CPU [4], or at a fixed location, e.g., the server inlet, for server overheating monitoring. This is in contrast to the fact that different components in a server may have different overheating thresholds, which are closely related to their respective thermal failure rates and expected lifetimes. Relying on a single threshold at the server inlet or at the proxy component is therefore over-simplistic, because the thermal correlation between the inlet (or the proxy component) and each server component can be different for every server model. As a result, monitoring only the inlet temperature or a proxy component, such as the CPU, may lead to either missed detection of overheating for components other than the CPU, resulting in a degraded system lifetime, or false alarms that result in excessive cooling power to unnecessarily lower the inlet temperature.
∗ Corresponding author. Tel.: +1 865 384 7365.
E-mail addresses: [email protected], [email protected] (X. Wang), [email protected] (X. Wang), [email protected] (G. Xing), lincx@fiu.edu (C.-X. Lin).
2210-5379/$ – see front matter
The second category of server thermal monitoring approach assumes that each component has its own built-in thermal sensor. Extensive research [5–8] on server thermal management has recently been conducted based on this assumption. Unfortunately, today's high-density servers are not equipped with a thermal sensor on every component. In most servers, only the processors have on-die sensors, while some memory chips may also have built-in sensors. Therefore, it is important to provide a mechanism for measuring the temperatures of other components (e.g., hard disk, network chips), such that the previously proposed thermal management schemes can work effectively. More importantly, even if every component has its own thermal sensor, those sensors are used only for the control loops of those components in an isolated way. As a result, they cannot provide a system-level thermal picture that can help the fan system of the server and the cooling systems in the data center to efficiently cool down overheating components. Furthermore, low-end sensors used in server components commonly have measurement noise and hardware biases that may lead to failed detection or false alarms. Recent studies [9,10] have shown that the collaborative data fusion of multiple sensors can significantly improve the detection accuracy. Therefore, it is preferable to have server-level thermal monitoring with multiple sensors that can precisely detect overheating components.
In this paper, we propose to leverage the thermal dynamics in a server to intelligently place sensors for precise overheating server component detection. Our sensor placement solution features a model-based approach, which adopts Computational Fluid Dynamics (CFD) as a theoretical foundation to establish the thermal model and analyze the thermal status of the server enclosure under various overheating conditions. CFD is a powerful mechanical fluid dynamic analysis approach and is widely used to analyze the fluid dynamics in various engineering fields, such as aircraft engine design and thermal analysis for buildings. CFD has already been used by computer system packaging designers to make intelligent decisions on server component layout design, but not yet for sensor placement in the server box. While CFD-based thermal monitoring has shown promise, a key limitation of CFD is its high computation overhead. As a result, CFD cannot be effectively used to report thermal emergencies in real time. In this work, we propose to use CFD to analyze the thermal dynamics offline and then optimally place sensors based on the analysis results to conduct online overheating detection. Such an integrated approach can enable us to achieve the benefits of both the systematic modeling of thermal dynamics (from CFD) and online measurement calibration and fast responsiveness (from sensors). Our solution provides a way to equip external sensors on the existing servers deployed in data centers for more accurate overheating monitoring. The proposed solution can also be used on future servers to place more sensors on the motherboard during the design phase.
In our integrated thermal monitoring solution, we first use CFD to model the thermal environment of a given rack server box under different overheating conditions, including inlet overheating, fan failure and CPU overloading. We then calculate the most correlated regions in the server box for each specific component by correlation analysis. Accordingly, for a given number of sensors, we seek to place them in the server box such that the overheating components can be detected with the maximum detection probability, while the error rate of the detection can be bounded. We formulate this problem as a constrained optimization problem. Based on the CFD analysis, we design a heuristic algorithm to find a near-optimal sensor placement solution. In our algorithm, we apply data fusion techniques to allow sensors to make collaborative detection decisions of server component overheating. Specifically, the contributions of this paper are four-fold.
• While the current thermal monitoring solutions rely on simplistic sensor placement, i.e., a single sensor at the inlet or the CPU, we propose a novel sensor placement scheme to intelligently place sensors for maximized overheating detection probabilities of each server component of interest.
• We use CFD analysis as a theoretical foundation to design our proposed sensor placement scheme. Our CFD analysis models the thermal dynamics of a rack server box in various overheating scenarios, including inlet overheating, CPU overloading, and fan failure.
• We formulate optimal sensor placement as a constrained optimization problem and propose a heuristic algorithm to find a near-optimal solution. Temperature correlation analysis is conducted to find the most correlated regions for each server component.
• We evaluate our sensor placement scheme in a real-world rack server box. Both our empirical and simulation results demonstrate that our placement solution can significantly improve hot server detection performance.
The remainder of this paper is organized as follows. Section 2 highlights the distinction of our work by discussing related work. Section 3 presents the data fusion model, the formulation of the server overheating detection problem, as well as the temperature threshold setting for each component. Section 4 introduces the fundamentals of the Computational Fluid Dynamics approach and provides an example of how to model a rack server box. Section 5 elaborates on how to use the analytical results from CFD in our sensor placement problem and proposes a heuristic algorithm to solve the problem. In Section 6, we introduce our experiment methodology and then evaluate our sensor placement scheme using both simulation and experiments on a hardware testbed. In Section 7, we discuss an interesting variant problem formulation, as well as a potential application of our sensor placement scheme. Section 8 concludes the paper and discusses possible future work.
2. Related work
Thermal management for computer systems has been widely studied in the past. Skadron et al. have proposed a temperature-aware microprocessor management tool, HotSpot [11], which uses thermal resistances and capacitances to model the temperature of microprocessors. Performance and thermal behaviors of storage systems are extensively studied in [12], which identifies the knob for temperature optimization of high speed disks. Lin et al. [8] have proposed a software thermal management scheme for DRAM memory, which has been implemented on real machines. However, few studies have been done on the joint thermal monitoring and management across different system components. Jeohwang et al. have modeled the thermal profile for an operating server system and a rack in [13] to provide a bridge between the individual component thermal status and data center thermal profile. A joint energy, thermal and cooling management technique (JETC) is proposed in [14] to optimize the cooling and operating energy for both CPU and memory. Different from all the previous work that addresses a single component individually, our work focuses on the joint thermal monitoring of multiple components in a single rack server system.
Data center thermal management has also attracted a lot of research effort. El-Sayed et al. [15] studied how to safely raise the operating temperature set point of data center cooling systems such that more cooling power can be saved. An automated, online, predictive thermal management scheme for data centers is also proposed in [16]. Workload scheduling according to the data center thermal profile has been studied in [17]. Another important aspect of data center thermal management, namely temperature and thermal prediction, has also been studied. A fast prediction framework for data center transient temperature is proposed in [18]. A prediction system based on online thermal sensor readings to reach fast and accurate data center temperature prediction is proposed in [19]. Chen et al. [20] proposed to combine both online sensor readings and CFD analysis results for data center temperature prediction. Although our work presented in this paper focuses on the thermal monitoring issue for overheating server components, it is actually complementary to all the above-mentioned data center-level thermal management studies. One of the goals for a data center-level thermal management system is to run the data center cooling system more efficiently. A higher risk of overheating is often introduced by such a cooling management system. With our thermal monitoring system at the server component level, overheating issues can be more efficiently monitored and captured.
Sensors have been deployed to conduct thermal management in computer systems. The existing thermal management with sensors can be categorized into two classes. The first class is to deploy sensors in server rooms and large data centers for environment temperature monitoring. For example, a hybrid wired and wireless sensor network is used in [21] for data center thermal monitoring. Sensors are also used in [9] to detect overheating servers at the single system level. The second class is to deploy sensors inside or around different computer components for component-specific thermal monitoring. For example, the current CPU thermal management schemes deploy on-die thermal sensors to monitor the CPU temperature at runtime [22]. Temperature sensor circuits have also been adopted in DRAM design to provide thermal monitoring for memory chips [23]. Chip-level thermal profile is also studied in [24] by using runtime temperature sensor readings. Our work is different from all the aforementioned research. We use Computational Fluid Dynamics (CFD) and the temperature correlation of different components to guide sensor placement, such that the efficiency of thermal emergency detection can be maximized.
Different sensor deployment approaches for improved monitoring and detection performance have also been studied before. A sensor placement scheme based on the Multivariate Gaussian Process model is proposed in [25]. Though it provides informative monitoring results, an offline training stage before the actual deployment is required. This is not feasible for thermal monitoring of production server systems because thermal emergencies should not be created for the collection of the training data. A fast sensor placement approach for fusion-based target detection is also proposed in [10] to minimize the number of deployed sensors while achieving assured detection performance. Different from the aforementioned work, we propose a new model-based sensor deployment approach, which leverages the theoretical computational results from CFD to maximize the detection performance of server component thermal emergency.
3. Overheating server component detection
In this section, we first introduce the detection model for overheating server components. We then formulate overheating server component detection as a constrained optimization problem. Lastly, we introduce how to set the overheating temperature threshold for each component.
3.1. Overheating component detection model
In the design of a computer system, it is always desirable to optimize the cooling efficiency of the system. However, due to the difference in functionalities and the variance in manufacturing processes, each component in the system usually requires a different safe operating environment temperature. Therefore, in order for the computer system to operate more efficiently and safely, the operating environment temperature of each component should be monitored separately based on its own requirement. Ideally, an individual thermal monitoring and cooling mechanism should be provided to each single component. For example, the current design of CPUs incorporates on-die thermal sensors, such that the temperature of the CPU chip can be monitored at runtime. Moreover, a heat sink is usually attached on top of the CPU chip to increase the air flow rate over the CPU, such that the cooling efficiency can be improved. Unfortunately, there is usually no such on-die sensor embedded in other components, such as memory chips and network chips. Therefore, new techniques are needed to monitor the operating environment of all the components, such that their overheating conditions can be detected and reported promptly. In this paper, we propose to place additional sensors into the computer system box to monitor the operating environment temperatures of all the components in the computer system.
With all the components and cooling equipment running, the thermal environment inside a computer box is complex, which could cause more noise in the sensor readings. Furthermore, the number of sensors that can be placed into a high-density server box is limited, as one wants to maximize the space utilization for all kinds of server components and avoid complex wiring and costly installation in the already compact server box. Thus, the additional sensor nodes added to the server box should collaborate with each other to maximize their utility. To address these challenges, we adopt data fusion [26], a widely adopted collaborative sensing technique, to jointly process noisy data from multiple sensors.
It is clear that temperatures at locations distant from a component are less likely to be correlated with the ambient temperature of that component. Therefore, we define a fusion region for each monitored component as a disc with a fusion radius R, where each monitored component is located at the center of that disc. The sensors within the fusion region of a monitored component should collaborate to make the overheating detection decision for that component. Moreover, because of the complex air flows inside the system, temperatures at different locations within the fusion region have different correlations with the ambient temperature of the monitored component. For example, based on the air flow direction, the temperatures at locations behind the CPU are more correlated with the CPU ambient temperature than the temperatures at locations in front of the CPU. Therefore, we further define a correlation threshold Th(i, j) for each pair of location i and component location j. To contribute to the ambient temperature monitoring for component j, a sensor should be placed at a location i within the fusion radius of component j where the correlation value is larger than Th(i, j).
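The two membership criteria above (inside the fusion disc, and correlation above Th(i, j)) can be sketched as a small filter. This is only an illustrative sketch: the function name `fusion_group`, the dictionary-based data layout, and all coordinates and correlation values are assumptions made here, not part of the paper.

```python
import math

def fusion_group(component_xy, sensor_xy, correlation, radius, corr_threshold):
    """Return the ids of sensors allowed to take part in one component's decision.

    component_xy   -- (x, y) of the monitored component, the center of the fusion disc
    sensor_xy      -- dict: sensor id -> (x, y) placement
    correlation    -- dict: sensor id -> correlation with the component's ambient temperature
    corr_threshold -- dict: sensor id -> Th(i, j) for this component
    """
    cx, cy = component_xy
    members = []
    for sensor_id, (x, y) in sensor_xy.items():
        inside_disc = math.hypot(x - cx, y - cy) <= radius
        correlated = correlation[sensor_id] > corr_threshold[sensor_id]
        if inside_disc and correlated:
            members.append(sensor_id)
    return members

# Hypothetical layout; units and values are made up for illustration.
sensors = {"s1": (1.0, 0.0), "s2": (5.0, 0.0), "s3": (0.0, 1.0)}
corr = {"s1": 0.95, "s2": 0.99, "s3": 0.50}
th = {"s1": 0.90, "s2": 0.90, "s3": 0.90}
group = fusion_group((0.0, 0.0), sensors, corr, radius=2.0, corr_threshold=th)
print(group)
```

In this made-up layout, `s2` is highly correlated but lies outside the disc, and `s3` lies inside the disc but is weakly correlated, so only `s1` qualifies.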
To decide the ambient temperature at the monitored component location, we adopt a data fusion scheme which calculates the average of all the temperatures reported by sensors that meet the above two criteria. We compare the average temperature value with a detection threshold $\eta_j$. If the average temperature is higher than the threshold, the component is declared to be operating in an overheating environment. The ambient temperature, $T_j$, of component j can be derived from the temperature reading, $T_i$, at the location $(x_i, y_i)$ of sensor i. The approach we use to derive the temperature $T_j$ is explained in Section 5.2. For now, we just denote this derivation as $T_j = f_j(T_i)$. Measurement noise is usually included in the sensor readings. Denote the measurement noise strength measured by sensor i as $N_i$, which follows the zero-mean normal distribution with a variance of $\sigma^2$, i.e., $N_i \sim \mathcal{N}(0, \sigma^2)$ [25]. We assume that all the temperature sensors are identical, such that they follow the same measurement distribution. The final reported temperature for the location of component j can be presented as
$$T_j = f_j(T_i) + N_i^2 \tag{1}$$
where $N_i^2$ is the noise in energy form. The noise is taken out of the transformation since it is additive to the real temperature readings. Assuming there are $n_j$ sensors within the data fusion group of a component at location j, the detection probability of the overheating component j in a specific overheating scenario can be calculated as
$$P_{D_j} = P\left( \frac{1}{n_j} \sum_{i=1}^{n_j} \left( f_j(T_i) + N_i^2 \right) > \eta_j \right) \tag{2}$$
where $\eta_j$ is the detection threshold of overheating for the component at location j. Because of the measurement noise from the sensor device, $\eta_j$ includes both the real temperature threshold for a component, denoted as $C_j$, and the measurement noise. With a high noise level from the measurement, a detection system is likely to report a false alarm when there is no real event. In our case, we define the false alarm rate when the environment of the monitored component is actually not overheating as follows
$$P_{F_j} = P\left( \frac{1}{n_j} \sum_{i=1}^{n_j} \left( N_i^2 + C_j \right) > \eta_j \right) \tag{3}$$
We assume Gaussian noise, i.e., $N_i/\sigma \sim \mathcal{N}(0, 1)$. Therefore, $\sum_{i=1}^{n_j} (N_i/\sigma)^2$ follows the Chi-square distribution with $n_j$ degrees of freedom, whose cumulative distribution function is denoted as $\mathcal{X}_{n_j}(\cdot)$. Hence, Eqs. (2) and (3) can be modified as follows:
$$P_{D_j} = 1 - \mathcal{X}_{n_j}\left( \frac{n_j \eta_j - \sum_{i=1}^{n_j} f_j(T_i)}{\sigma^2} \right) \tag{4}$$
$$P_{F_j} = 1 - \mathcal{X}_{n_j}\left( \frac{n_j (\eta_j - C_j)}{\sigma^2} \right) \tag{5}$$
3.2. Problem formulation
We assume that there are M components in a computer server, whose operating ambient temperatures need to be monitored. Given N sensors (N ≤ M), we need to find the placement of these N sensors such that we can detect the overheating emergency at any of the M locations with the highest possible confidence. We assume N ≤ M because it is preferable to place as few sensors as possible in the server box for thermal monitoring purposes, considering the complexity and high cost of the wiring design on the motherboard. Our goal is to maximize the average detection probability of all the monitored locations
$$\max \; \frac{1}{M} \sum_{j=1}^{M} P_{D_j} \tag{6}$$
subject to the following constraint
$$P_{F_j} \le \alpha, \quad \forall\, 1 \le j \le M \tag{7}$$
where $\alpha$ is the tolerable detection false alarm rate bound. We note that the false alarm rate needs to be bounded in many practical scenarios in order to reduce the waste of system resources.
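For a certain sensor placement, $P_{F_j} \le \alpha$ is a necessary condition in our problem. By Eq. (5), we convert the constraint in Eq. (7) to $\eta_j \ge \sigma^2 \mathcal{X}_{n_j}^{-1}(1-\alpha)/n_j + C_j$, a constraint for the detection threshold $\eta_j$ at monitored location j, where $\mathcal{X}_{n_j}^{-1}(\cdot)$ is the inverse function of $\mathcal{X}_{n_j}(\cdot)$. Using this equation, we can obtain the threshold that satisfies the false alarm rate bound while maximizing the detection probability. From Eq. (4) we know that $P_{D_j}$ decreases when $\eta_j$ increases. Therefore, to maximize the detection probability, we remove the inequality in the constraint and use only the lower bound determined by $\alpha$. Hence, $\eta_j$ can be calculated as
$$\eta_j = \frac{\sigma^2 \mathcal{X}_{n_j}^{-1}(1-\alpha)}{n_j} + C_j \tag{8}$$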
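To make the threshold and probability formulas concrete, the following minimal sketch evaluates Eqs. (4), (5) and (8) using only the Python standard library. The Chi-square CDF is computed with a power series for the regularized lower incomplete gamma function and inverted by bisection; both are implementation choices made here, not methods described in the paper, and all numeric inputs are hypothetical.

```python
import math

def chi2_cdf(x, dof):
    """Chi-square CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0.0:
        return 0.0
    a, z = dof / 2.0, x / 2.0
    if z > a + 500.0:  # far upper tail: the CDF equals 1 to double precision
        return 1.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-16 * abs(total) and n < 100000:
        n += 1
        term *= z / (a + n)
        total += term
    return min(1.0, total * math.exp(-z + a * math.log(z) - math.lgamma(a)))

def chi2_ppf(p, dof):
    """Inverse Chi-square CDF by bisection, for 0 < p < 1."""
    lo, hi = 0.0, max(1.0, float(dof))
    while chi2_cdf(hi, dof) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, dof) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def detection_threshold(c_j, n_j, sigma2, alpha):
    """Eq. (8): the smallest threshold that keeps the false alarm rate at alpha."""
    return sigma2 * chi2_ppf(1.0 - alpha, n_j) / n_j + c_j

def false_alarm_rate(eta_j, c_j, n_j, sigma2):
    """Eq. (5)."""
    return 1.0 - chi2_cdf(n_j * (eta_j - c_j) / sigma2, n_j)

def detection_probability(mapped_temps, eta_j, sigma2):
    """Eq. (4); mapped_temps holds f_j(T_i) for each sensor in the fusion group."""
    n_j = len(mapped_temps)
    return 1.0 - chi2_cdf((n_j * eta_j - sum(mapped_temps)) / sigma2, n_j)

# Hypothetical inputs: C_j = 60 degC, two sensors, sigma^2 = 1, alpha = 5%.
eta = detection_threshold(c_j=60.0, n_j=2, sigma2=1.0, alpha=0.05)
p_f = false_alarm_rate(eta, c_j=60.0, n_j=2, sigma2=1.0)
p_d = detection_probability([62.0, 63.0], eta, sigma2=1.0)
print(round(eta, 3), round(p_f, 3), round(p_d, 3))
```

By construction the false alarm rate at the returned threshold equals $\alpha$, and the detection probability reaches 1 once the summed mapped temperatures exceed $n_j \eta_j$.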
3.3. Component temperature threshold
Before solving the problem in Section 3.2, we need to set the overheating threshold for each component in the system. Among all the factors that contribute to the lifetime of semiconductor devices, operating junction temperature, i.e., the highest temperature inside the semiconductor device, is a critical deciding factor. With a higher junction temperature, devices tend to fail sooner. There has been research [11,1] studying the temperature-induced failure mechanisms of semiconductor devices. In most of the models studied, the operating junction temperature shows an exponential impact on the failure rate $\lambda$ of a device, which is:
$$\lambda \propto \exp\left( \frac{-E_a}{k T_J} \right) \tag{9}$$
where k is Boltzmann's constant, $8.6 \times 10^{-5}$ eV/K. $E_a$ and $T_J$ are the activation energy of electromigration and the operating junction temperature, respectively. The common activation energy for Al and Al with silicon is 0.6 eV.
Hardware components from manufacturers often come with a warranty time. For example, both Intel and AMD sell their products with a three-year warranty package. Note that this warranty time indicates the time period that the device should work properly without hard intrinsic failures, even running under extreme conditions within the specification. However, as a common practice, computer systems usually serve for a longer period of time than three years with upgrades to some components, such as adding new disks for larger storage space. To extend the working time, we need to lower the operating ambient temperature threshold of each component. Given the extended lifetime requirement $t'$ and the lifetime requirement $t$ under warranty, we can use Eq. (9) to calculate the new operating junction temperature threshold $T'_J$ as
$$\frac{1}{T'_J} = \frac{k}{E_a} \ln\left( \frac{t'}{t} \right) + \frac{1}{T_J} \tag{10}$$
In this work, we use sensors to monitor the temperature of the operating environment, which is the ambient temperature of a working component. The ambient temperature $T_A$ can be calculated from the junction temperature $T_J$ in Eq. (9) as
$$T_A = T_J - P \times \theta_{JA} \tag{11}$$
where P is the operating power of the device and $\theta_{JA}$ is the junction-to-ambient thermal resistance [27].
Based on all the above derivations and related values from the data sheets of different components, we set the operating environment temperature threshold $C_j$ for component j in our work by one of the following three methods: (1) Directly taken from the data sheet. For some of the components in the computer system, the maximum operating environment temperature is listed in the data sheet or the manual. Fig. 1 is the platform used in our experiment. It is a 2U DELL rack server equipped with an AMD Opteron 2222 SE Dual-Core processor. The maximum operating temperature listed on the data sheet for this type of CPU is 69 °C. (2) Converted from the junction temperature threshold. For example, the maximum junction temperature and the junction-to-ambient thermal resistance for the Lattice ispMACH CPLD chip in our system are 75 °C and 41.8 °C/W, respectively. Applying Eqs. (10) and (11) with a lifetime requirement of 7 years, we can get the ambient threshold as 60 °C. (3) For unknown types of chips or chips whose data sheets are not available, we use 43 °C, the default System Board Ambient Temperature setting required by OpenManage, DELL's server management tool.
Fig. 1. The DELL PowerEdge 2950 2U rack server used in our hardware testbed. The yellow boxes are the chips whose operating environment temperatures need to be monitored. The red dashed box in the lower picture highlights the front panel assembly of the server. The red dashed box in the upper picture highlights the temperature sensor used by the DELL server to monitor the temperature at the inlet. Except for CPU and memory, chips that need to be monitored for temperature are indexed and highlighted with yellow boxes. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of the article.)
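The conversion in method (2) can be sketched in a few lines. The sketch below applies Eq. (10), with temperatures converted to kelvin, and then Eq. (11). The three-year baseline lifetime and the operating power passed to `ambient_limit_c` are assumptions made here for illustration; the text above does not state the values used for the CPLD chip, so the printed numbers are not expected to reproduce the 60 °C figure exactly.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann's constant in eV/K

def derated_junction_limit_c(tj_c, baseline_years, target_years, ea_ev=0.6):
    """Eq. (10): junction limit (degC) that stretches the lifetime to target_years."""
    tj_k = tj_c + 273.15  # Eq. (9) is written in absolute temperature
    inv_new = (BOLTZMANN_EV_PER_K / ea_ev) * math.log(target_years / baseline_years)
    inv_new += 1.0 / tj_k
    return 1.0 / inv_new - 273.15

def ambient_limit_c(tj_c, power_w, theta_ja_c_per_w):
    """Eq. (11): ambient limit from junction limit, power and thermal resistance."""
    return tj_c - power_w * theta_ja_c_per_w

# Data-sheet values quoted in the text: 75 degC junction limit, 41.8 degC/W.
# Assumed here: 3-year baseline lifetime and a hypothetical 0.02 W operating power.
new_tj = derated_junction_limit_c(75.0, baseline_years=3.0, target_years=7.0)
new_ta = ambient_limit_c(new_tj, power_w=0.02, theta_ja_c_per_w=41.8)
print(round(new_tj, 2), round(new_ta, 2))
```

Because the logarithm in Eq. (10) is positive whenever the target lifetime exceeds the baseline, the derated junction limit is always lower than the data-sheet limit.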
3.4. Overheating detection probability maximization for combined overheating scenarios
In Section 3.2, we have formulated the detection probability maximization problem under a specific overheating scenario such as inlet overheating or CPU overloading. However, in practice, there are usually no means to know what kind of overheating scenario is going to happen at a future time. Therefore, it is important to prepare the system for multiple possible overheating scenarios. One simplistic way to achieve this goal is to consider every possible overheating scenario one by one and deploy sensors for every scenario. Although this approach does not require changes to our previous detection model, it can result in a large number of sensors if the number of overheating scenarios is large. This kind of monitoring system is not desirable because the space in the server box to deploy additional sensors is limited. To mitigate this problem, we propose to maximize the average detection probability across multiple different overheating scenarios.
Assume we have K possible overheating scenarios. To get the average detection probability across multiple overheating scenarios, the overheating detection probability model in Eq. (2) needs to be modified as:
$$P_{D_j} = \frac{1}{K} \sum_{k=1}^{K} P\left( \frac{1}{n_j} \sum_{i=1}^{n_j} \left( f_j^k(T_i) + N_i^2 \right) > \eta_j \right) \tag{12}$$
where $f_j^k(\cdot)$ is the temperature mapping from sensor location i to component location j in overheating scenario k.
Similar to Eq. (4), under the Gaussian noise assumption, we can transform Eq. (12) to the following equation:
$$P_{D_j} = 1 - \frac{1}{K} \sum_{k=1}^{K} \mathcal{X}_{n_j}\left( \frac{n_j \eta_j - \sum_{i=1}^{n_j} f_j^k(T_i)}{\sigma^2} \right) \tag{13}$$
Based on the above overheating detection probability model for overheating component detection under multiple overheating scenarios, we can formulate the probability maximization problem as:
$$\max \; \frac{1}{M} \sum_{j=1}^{M} P_{D_j} \tag{14}$$
subject to the same constraint as shown in Eq. (7). As shown in the experimental results presented in Section 6.5, this formulation leads to a smaller number of sensors to be placed in a server with the desired overheating detection probability.
4. CFD modeling for server box and components
In this section, we first introduce Computational Fluid Dynamics (CFD), the tool we use to analyze the thermal environment inside the server box. We then provide an example to demonstrate how to model a server box and each of its components in practice using Fluent [28], a widely used CFD modeling software package.
4.1. CFD modeling
CFD is a fluid mechanics approach that analyzes properties of fluid flows based on numerical methods and algorithms. CFD analysis gives great insight into the flow pattern and distribution of a targeted environment. Compared with the traditional experimental method of studying the flow pattern distribution, such as using flow sensors, CFD has significant advantages. First, CFD can reach a high resolution in the space and time domains, while the traditional method usually can only study a limited number of points and time instants. Second, CFD can be applied to virtually any problem using realistic operating condition setups, while experimental methodology can only work on limited conditions and environments. Third, the scale of CFD simulation can cover a wide range, while the traditional method usually only works on a laboratory-scale model.
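The key for CFD modeling is to solve the governing transport equations represented in the following conservation law form:
$$\frac{\partial(\rho\phi)}{\partial t} + \frac{\partial(\rho U_j \phi)}{\partial x_j} = \frac{\partial}{\partial x_j}\left( \Gamma_{\phi,\mathrm{eff}} \frac{\partial\phi}{\partial x_j} \right) + S_\phi \tag{15}$$
where $\phi$ represents different parameters such as mass, velocity, temperature or turbulence properties; $\rho$ is the fluid (air) density; t is the time for transient simulations; $x_j$ is the coordinate variable for x, y or z with j being 1, 2 or 3; $U_j$ is the velocity in different directions; $\Gamma_{\phi,\mathrm{eff}}$ is the diffusion coefficient; and $S_\phi$ is the source for the particular variable. For example, when $\phi$ is the air temperature, $S_\phi$ stands for the volumetric heat rate from a source component. The four equation terms represent the transient, convection, diffusion, and source parts of the transport phenomenon in the spatial domain [29].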
The partial differential equations listed in Eq. (15) represent a system, where all the transport equations are coupled together and must be solved simultaneously. For a complicated environment, such as a server enclosure, closed-form solutions are hard to find for the air flow and heat transfer of the entire system. Therefore, the most fundamental consideration in CFD is how to treat a continuous fluid in a discretized fashion, such that numerical methods can be applied to find the solutions. Most CFD software packages apply the control volume method to find numerical solutions.
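As a toy illustration of the control volume idea, the sketch below solves only the steady, one-dimensional diffusion and source terms of Eq. (15), with the transient and convection terms dropped and fixed values at both ends. It is not how Fluent works internally; the uniform grid, the Gauss-Seidel sweeps, the function name and all numeric inputs are assumptions chosen to keep the example short.

```python
def solve_steady_diffusion_1d(n_cells, length, gamma, source, phi_left, phi_right,
                              tol=1e-10, max_sweeps=100000):
    """Cell-centered control-volume solution of d/dx(gamma * dphi/dx) + S = 0."""
    dx = length / n_cells
    phi = [0.5 * (phi_left + phi_right)] * n_cells  # initial guess
    for _ in range(max_sweeps):
        biggest_change = 0.0
        for p in range(n_cells):
            # West and east face coefficients. A wall face sits half a cell
            # away from the cell center, hence the factor of two.
            if p == 0:
                a_w, phi_w = 2.0 * gamma / dx, phi_left
            else:
                a_w, phi_w = gamma / dx, phi[p - 1]
            if p == n_cells - 1:
                a_e, phi_e = 2.0 * gamma / dx, phi_right
            else:
                a_e, phi_e = gamma / dx, phi[p + 1]
            # Flux balance over the cell: diffusion in from both faces plus source.
            new_value = (a_w * phi_w + a_e * phi_e + source * dx) / (a_w + a_e)
            biggest_change = max(biggest_change, abs(new_value - phi[p]))
            phi[p] = new_value
        if biggest_change < tol:
            break
    return phi

# Hypothetical bar: 10 cells, ends held at 20 and 40, no heat source.
profile = solve_steady_diffusion_1d(10, 1.0, 1.0, 0.0, 20.0, 40.0)
# The same bar with a uniform (made-up) volumetric heat source.
heated = solve_steady_diffusion_1d(10, 1.0, 1.0, 100.0, 20.0, 40.0)
print([round(v, 2) for v in profile])
```

Without a source the cell-center values fall on the straight line between the two boundary values, which gives a quick sanity check of the discretization; adding a source raises every interior value.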
4.2. Example of server box CFD modeling
Using CFD to solve a continuous fluid model requires the discretization of the spatial domain into small cells. One method to perform this discretization is to generate a volumetric grid. After the discretization, necessary boundary conditions and suitable algorithms need to be applied to solve the above-mentioned transport equations. Several popular software packages, such as Fluent, FLOTHERM, Flovent and Phoenics, can be used for CFD modeling. In our project, we use Fluent, a widely used CFD software package from ANSYS Inc., to perform the geometry meshing and solution finding.
The CFD model we establish in this example is for the DELL PowerEdge 2950 server box, shown in Fig. 1. In the first step, we use Gambit, which is a grid generator, to establish the geometry for this server. Basically, we choose different geometric shapes and perform unification or splitting to establish the geometric model for the entire server based on the real measured scales. Then we add different geometric shapes into the server box geometry to model the server components, such as the system fan and CPU sink, according to their geographic location and corresponding scale. After all components are added into the geometric model, we need to specify different boundary types, such as the server walls, the fans, and the inlets/outlets of the server box. The last step is to divide the entire geometric model into smaller scale cells by applying geometry meshing in Gambit. The grid size is a user-specified parameter. With a finer grid, more accurate CFD modeling can be reached. However, a fine grid increases the computational burden in the following stage when the transport equations are solved by numerical methods. We use 1 mm as the grid size to mesh the geometry. Although the CFD geometry model takes some time to generate because of the complicated component layout in the server box, we note that it is a one-time effort that can be reused for the analysis of all different overheating conditions for the same server, which is feasible for an offline sensor placement approach.
After meshing the entire server in Gambit, we export the grid to the second software package, Fluent, to solve the transport equations in Eq. (15). Fluent requires all the boundary conditions of our geometric model to be specified. For example, we need to specify the power dissipation of each heat dissipating component such as CPU, memory, disk and all the other system chips. We also need to specify the inlet temperature and the system fan speed. After all the parameters are set up, the standard k-epsilon two-equation turbulence model is chosen to simulate the turbulent flow. Each simulation of one running condition takes about 20 min to finish. Fig. 2 shows a colored cross-section temperature map after solving the transport equations in Fluent. This is a scenario in which all the components are running under the power setting specified on their data sheets.
5. CFD-guided sensor placement
In this section, we introduce how to use the results from the CFD analysis to guide sensor placement inside the server box, with the goal of maximizing the overheating detection probability for all the components. We then introduce a heuristic algorithm for solving this detection probability maximization problem.
Fig. 2. Colored temperature map (°C) of the DELL server running CPU intensive benchmarks. The small black boxes indicate all the chips whose temperatures need to be monitored. The large box in the middle is the CPU sink. The four vertical short lines in the middle represent the four system fans. The four horizontal thin blocks underneath the CPU sink represent the memory modules. The temperature of the memory closest to the CPU sink is also required to be monitored. The disk is on the left side of the graph.
5.1. Overview of our approach
Using CFD tools for our sensor placement in the server box primarily involves two steps. In the first step, we establish a geometric model for the server box in Gambit, mesh the geometry, and export the grid to Fluent. We then take measurements of the incoming air temperature and air flow rate at the inlet of the server. These measurements, along with the power consumption of each component and the fan speed, are the input parameters to Fluent. We repeat the first step by tuning the actuating parameter of the overheating scenarios to get multiple results of CFD analysis. For example, in an overheating scenario caused by inlet overheating, we change the inlet temperature to several different values to run CFD analysis. Based on the CFD results with different inlet temperatures, we obtain the temperature correlation between any spatial location, defined by the CFD grid, and each component location. We also use the CFD data to obtain an approximation function for each spatial location and targeted component location pair, such that the temperature at the targeted location can be calculated from the temperature at any spatial location with a high correlation.
In the second step, we feed the results from the CFD analysis, including the overheating scenario temperature data and the correlation data, to our optimization algorithm to find the best locations for sensor placement. We assume that our sensor placement needs to monitor the temperature of the point above the center of each component's top face. To solve the placement problem efficiently, we develop our algorithm based on the Constrained Simulated Annealing approach [30]. The algorithm is explained in detail in the following sections.
5.2. Component ambient temperature function and correlation
In Section 3.1, we denote the reported temperature of the component at location j from sensor i by a relationship $T_j = f_j(T_i)$. Because of the complex fluid dynamics and thermal distribution in the server box, the temperature at location i can be very different from the temperature at location j, even if the physical distance between the two locations is short. Therefore, we need a function mapping from $T_i$ to $T_j$, so that the temperature at location i can be used to report the component temperature $T_j$. We use the CFD analysis results from the last section to derive this mapping. We first repeat the CFD analysis with different parameter settings. For example, in the inlet overheating scenario, the inlet temperature is changed across different runs of the CFD analysis. Based on all the temperature data from the different CFD runs, we establish a second-order polynomial model to approximate the relationship between any temperature $T_i$ and the component temperature $T_j$ as:
$$T_j = a_{j,i} T_i^2 + b_{j,i} T_i + c_{j,i} \tag{16}$$
We have also introduced in Section 3.1 that our sensor placement scheme only places sensors at locations that have high temperature correlations with the monitored targets. Therefore, we use the same set of CFD data as in the above function approximation to calculate the spatial correlation between the temperature $T_i$ and the component temperature $T_j$. Pearson's correlation is a widely adopted metric [31] that measures the degree of association between two variables. Assuming that we have n sets of CFD data with different inlet temperature settings, we can calculate Pearson's correlation $r(T_i, T_j)$ by
$$r(T_i, T_j) = \frac{\sum_{k=1}^{n} (T_i^k - \bar{T}_i)(T_j^k - \bar{T}_j)}{\sqrt{\sum_{k=1}^{n} (T_i^k - \bar{T}_i)^2 \, \sum_{k=1}^{n} (T_j^k - \bar{T}_j)^2}} \tag{17}$$
where $T_i^k$ and $T_j^k$ are the temperatures in the k-th CFD data set, and $\bar{T}_i$ and $\bar{T}_j$ are their means over the n data sets. The polynomial function approximation and the correlation values are all inputs to the algorithm in the next section.
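The fit of Eq. (16) and the correlation of Eq. (17) can be sketched with NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names and the temperature values are hypothetical, and an ordinary least-squares fit is assumed since the paper does not state the fitting method.

```python
import numpy as np

def fit_component_model(t_i, t_j):
    """Least-squares fit of Eq. (16): T_j ~ a*T_i**2 + b*T_i + c."""
    a, b, c = np.polyfit(t_i, t_j, deg=2)  # coefficients, highest degree first
    return a, b, c

def pearson_r(t_i, t_j):
    """Pearson correlation of Eq. (17) between two temperature series."""
    t_i = np.asarray(t_i, dtype=float)
    t_j = np.asarray(t_j, dtype=float)
    di = t_i - t_i.mean()
    dj = t_j - t_j.mean()
    return float(np.sum(di * dj) / np.sqrt(np.sum(di**2) * np.sum(dj**2)))

# Hypothetical temperatures (deg C) from n = 5 CFD runs with different inlet settings.
t_sensor = np.array([25.0, 28.0, 31.0, 34.0, 37.0])   # candidate location i
t_chip = 0.01 * t_sensor**2 + 1.2 * t_sensor + 5.0    # component location j (synthetic)

a, b, c = fit_component_model(t_sensor, t_chip)
r = pearson_r(t_sensor, t_chip)
predicted = a * 30.0**2 + b * 30.0 + c  # estimate T_j from a 30 deg C reading at i
```

With only a handful of CFD runs per scenario, a second-order fit has few degrees of freedom left, so the quality of the fit depends on how well the chosen parameter values span the expected operating range.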
5.3. Sensor placement algorithm
Procedure 1. CFD-guided sensor placement (D)
Input: sensor number N, component location lists x[K] and y[K], CFDdata, correlation data rdata, overheating threshold list C[K]
Output: placement solution D
1. for j = 1 to K do
2.   x[j]_min = x_j - R; x[j]_max = x_j + R
3.   y[j]_min = y_j - R; y[j]_max = y_j + R
4. end for
5. x_min = min_j(x[j]_min); x_max = max_j(x[j]_max)
6. y_min = min_j(y[j]_min); y_max = max_j(y[j]_max)
7. (P, D) = CSA(N, x_min, x_max, y_min, y_max,
8.     C[K], CFDdata, rdata)
9. return D
Our goal is to find the optimal sensor placement locations in the server box to maximize the average overheating detection probability for all the monitored component locations. We propose to use a nonlinear programming solver based on the Constrained Simulated Annealing (CSA) algorithm [30]. CSA is an extension of the conventional Simulated Annealing algorithm for solving global constrained optimization problems with discrete variables. Theoretically, CSA can reach a globally optimal solution by converging asymptotically to a constrained global optimum with a probability of 1. However, a limitation of CSA is that its computational complexity grows exponentially with the number of variables and the size of the solution search space [30,10]. Therefore, before we apply CSA, we first reduce the search space of the algorithm by calculating the plausible search space according to the component locations. In our sensor placement problem, we propose to use the sensors that are within the fusion range of a component location to collaboratively decide whether the operating environment temperature of that component is overheating. Therefore, a location is a plausible candidate for a component only if it lies inside the fusion range R of that component. We aggregate the plausible search spaces of all components by finding the maximum and minimum possible x and y values of a sensor. The aggregated region is then used as the search space for the sensor placement algorithm. The pseudocode of this algorithm is listed in Procedure 1.
Fig. 3. Comparison at multiple locations in the server between temperature measurements on the testbed and CFD simulation results. The testbed runs the same CPU-intensive workload as in Fig. 2.
Lines 1–6 calculate the plausible solution search region. Based on the CFD and correlation analysis, i.e., CFDdata and rdata, lines 7–8 use the CSA solver to find the placement solution D that maximizes the detection probability P. The algorithm outputs the placement solution D.
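The search-region reduction in lines 1–6 can be sketched in a few lines of Python. This is an illustrative sketch only: the function name, the coordinates, and the fusion range value are hypothetical, and the CSA solver itself is not reproduced here.

```python
def search_region(components, fusion_range):
    """Bounding box enclosing the fusion range R around every component.

    components: list of (x, y) component locations.
    Returns (x_min, x_max, y_min, y_max).
    """
    x_lo = [x - fusion_range for x, _ in components]
    x_hi = [x + fusion_range for x, _ in components]
    y_lo = [y - fusion_range for _, y in components]
    y_hi = [y + fusion_range for _, y in components]
    return min(x_lo), max(x_hi), min(y_lo), max(y_hi)

# Hypothetical chip coordinates and fusion range, in arbitrary grid units.
chips = [(10.0, 5.0), (30.0, 12.0), (22.0, 40.0)]
region = search_region(chips, fusion_range=4.0)
```

The bounding box is a conservative superset of the union of the per-component fusion ranges, so it may still contain locations that lie outside every fusion range; a solver could reject those candidates during the search.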
6. Evaluation
In this section, we first validate our CFD model by comparing the CFD analysis results with real sensor measurements. We then introduce the experimental setup and the methodology used for the performance evaluation on our hardware testbed. After that, the overheating component detection performance is evaluated in both simulation and hardware testbed experiments in three individual overheating scenarios (inlet overheating, fan failure, and CPU overloading) and in a combined overheating scenario that draws on the three individual scenarios.
6.1. Model validation and experiment methodology
To validate our server model in the CFD analysis, we place 19 sensors into the server box. The server is placed in an isolated server room with a dedicated air conditioning system. We measure the temperature under a normal server running condition, in which the server is running the SPEC CPU2006 benchmarks at an average inlet temperature of 19.6 °C, with a 0.5 °C fluctuation caused by the air conditioning actuation. The measurements are taken when the server has reached a stable thermal state, with the sensors placed in the closed enclosure. The sensors we use for the real temperature measurements are TelosB sensor motes [32]. We choose this type of sensor because we can collect the temperature readings wirelessly without opening the server enclosure. We note that our approach does not depend on a particular sensor type and can use either wired or wireless communication (though wireless sensors can be less intrusive to the already complicated server environment).
Fig. 3 shows the comparison between the CFD analysis temperature results and the testbed measurements. We can see that the temperature difference between the CFD analysis and the real measurements is about 6.3% on average, which shows that our CFD results are sufficiently close to the real temperature measurements. If a different type of sensor that is smaller in size is used, the difference can be further reduced.
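One plausible way to summarize such a comparison is the mean relative difference between measured and simulated temperatures, sketched below. The paper does not state the exact formula behind its 6.3% figure, so this definition is an assumption, and the sample readings are hypothetical.

```python
def mean_relative_difference(measured, simulated):
    """Mean of |simulated - measured| / |measured|, expressed in percent."""
    pairs = list(zip(measured, simulated))
    return 100.0 * sum(abs(s - m) / abs(m) for m, s in pairs) / len(pairs)

# Hypothetical readings (deg C) at four sensor locations.
measured = [20.0, 25.0, 40.0, 50.0]
simulated = [21.0, 24.0, 42.0, 47.5]
diff = mean_relative_difference(measured, simulated)
```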
Fig. 4. Server temperature map of a partial inlet overheating scenario. The red dashed boxes are the chips whose environment temperatures exceed their individual overheating thresholds. Triangles indicate the sensors placed by the CFD-guided approach when the given sensor number is four. The black crosses indicate the four sensors placed by the baseline Chip Best approach. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of the article.)
We evaluate five different sensor placement strategies across all the experiments. CFD-guided sensor placement is the approach we propose in this work, which places sensors based on the results of the CFD analysis. Chip Best is the placement resulting from a best-effort approach. To obtain this best performance, we first place sensors at all the exact chip locations in the overheating experiment, one for each chip, to collect the temperature data. Then, for a given number of N sensors (less than the number of chips M), we find the combination of N locations that results in the best detection performance among all possible combinations. Note that Chip Best is infeasible in a real implementation, because it needs to test all the different combinations of sensor/chip pairing and select the best one. Different from Chip Best, Chip Average reports the average detection performance of all the possible combinations. Random is a simple heuristic strategy that places sensors randomly in the server box; its result is the average over 10 runs of random placement. Uniform Grid divides the server box into uniform-sized grid cells and places one sensor randomly in each cell.
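The two chip-location baselines can be illustrated by enumerating every size-N subset of chip locations. The sketch below is a toy example under stated assumptions: the coverage table `covers`, the chip names, and the scoring function are hypothetical stand-ins for the measured temperature data used in the paper.

```python
from itertools import combinations
from statistics import mean

def detection_probability(sensor_chips, covers, overheating):
    """Fraction of overheating chips covered by at least one sensor."""
    detected = set()
    for s in sensor_chips:
        detected |= covers[s] & overheating
    return len(detected) / len(overheating)

def chip_baselines(chips, n_sensors, covers, overheating):
    """Return (Chip Best, Chip Average) over all size-N chip subsets."""
    scores = [detection_probability(c, covers, overheating)
              for c in combinations(chips, n_sensors)]
    return max(scores), mean(scores)

# Hypothetical coverage: a sensor on a chip detects that chip and listed neighbors.
covers = {
    "A": {"A", "B"},
    "B": {"B", "A"},
    "C": {"C"},
    "D": {"D"},
}
overheating = {"A", "B", "C"}
best, average = chip_baselines(sorted(covers), 2, covers, overheating)
```

The number of subsets grows as M choose N, which is why exhaustive enumeration is practical only as an offline baseline with a small number of chips.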
In all of our experiments, we evaluate the average detection probability and the error rate of the different placement approaches. The average detection probability is defined as the number of overheating chips that are detected divided by the total number of overheating chips. The error rate accounts for both false alarms and missed detections. For all of our testbed results, we run each overheating experiment 10 times and calculate the average value of each performance metric. No averaging is needed in simulation, since the CFD temperature results do not vary when the experiment settings remain the same.
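The two metrics can be computed from sets of chip identifiers as sketched below. The detection probability follows the definition given above; the normalization of the error rate (false alarms plus missed detections, divided by the number of monitored chips) is an assumption, since the text does not spell it out. All chip names are hypothetical.

```python
def detection_metrics(actual_overheating, reported_overheating, monitored):
    """Return (detection probability, error rate) for one experiment."""
    actual = set(actual_overheating)
    reported = set(reported_overheating)
    detected = actual & reported        # overheating chips that were flagged
    false_alarms = reported - actual    # flagged chips that were not overheating
    missed = actual - reported          # overheating chips that were not flagged
    detection_probability = len(detected) / len(actual)
    # Assumed normalization: errors per monitored chip.
    error_rate = (len(false_alarms) + len(missed)) / len(monitored)
    return detection_probability, error_rate

monitored = [f"chip{k}" for k in range(1, 12)]   # 11 monitored chips
actual = {"chip1", "chip2", "chip3", "chip4"}
reported = {"chip1", "chip2", "chip3", "chip7"}
p_detect, err = detection_metrics(actual, reported, monitored)
```

The sketch assumes at least one chip is actually overheating; a scenario with no overheating chips would need a separate convention for the detection probability.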
6.2. Inlet overheating detection
In this subsection, we evaluate the detection performance under a partial inlet overheating condition. Partial inlet overheating is often hard to capture with the single inlet temperature sensor on the front panel assembly in Fig. 1. Ideally, one could adjust the air conditioning system in the room (e.g., reducing its blowing range) to emulate inlet overheating caused by cooling systems. However, due to limited access to the air conditioning system in the room, we use a hair dryer to blow warm air into the server at the lower left corner of the front inlet to emulate partial inlet overheating in our testbed experiment. To calculate the spatial temperature correlation and the target temperature function, CFD analysis is conducted in different scenarios with different inlet overheating temperatures. As a result, the sensor placement solution computed by our algorithm can handle the dynamics in different inlet overheating scenarios, even though we only test a subset of those scenarios. Fig. 4 shows the temperature distribution of the server box under the highest partial inlet overheating temperature. We can see that 9 chips (red dashed frames in the figure) out of the total 11 monitored chips are overheating in this scenario.
Fig. 5. Average detection probability of the proposed CFD-guided solution and the baselines in the inlet overheating case (simulation). [Plot: average detection probability (%) versus sensor number (0–11) for CFD, Chip Best, Chip Average, Uniform Grid, and Random.]
Fig. 5 shows the average detection probability in the partial inlet overheating scenario. We see that the CFD-guided approach has the highest overheating detection probability. Compared with Chip Best, CFD shows a maximum performance advantage of about 22% when the sensor number is 2. This is mainly because when a sensor is placed at the exact location of one chip by Chip Best, it cannot always provide temperature monitoring for other chips, as chips are usually not placed close to each other. Although Chip Best may show acceptable overheating component detection performance when the number of sensors is large, this performance is hard to achieve in practice without testing all the combinations of sensor locations for the given number of sensors. Without exhaustively testing all the combinations, one can only choose chip locations randomly, leading to the detection performance of the Chip Average scheme. We see that the CFD-guided placement outperforms Chip Average at all sensor numbers in the experiment, with the highest performance gain of 45% when the sensor number is 2. The other two baselines, Random and Uniform Grid, show significantly worse performance than CFD-guided, Chip Best, and Chip Average, since they are only heuristic approaches. To illustrate the difference between CFD-guided and Chip Best, a placement example with 4 sensors is given in Fig. 4. We see that the CFD placement does not place sensors on any of the chips. Instead, it places sensors in between chips, such that each sensor can cover more chips, thus leading to better detection results. Fig. 6 shows the average error rate in this scenario. We see that CFD-guided placement shows significantly lower error rates than the other two chip-location placement schemes. This demonstrates that with the results of the CFD analysis, the placement can cover more targets, which leads to fewer missed detections.
Fig. 6. Average detection error rate of the proposed CFD-guided solution and the baselines in the inlet overheating case (simulation).
Fig. 7. Average detection probability of the proposed CFD-guided solution and the baselines in the inlet overheating case (testbed).
Figs. 7 and 8 show the detection probability and the detection error rate on the hardware testbed. We extract the sensor placement locations from the simulations and place all the sensors into the server box accordingly. Because of the limited space, we only place up to five sensors per scheme into the server box. Since we evaluate three different sensor placement schemes, the maximum number of sensors placed in the server at the same time is 15. From the results we see that the detection probability and detection error rate on the hardware testbed match the simulation results well. Among the three schemes, CFD-guided shows the best detection performance and Chip Average has the worst performance.
6.3. Fan failure detection
In this experiment, we conduct both simulation and hardware testbed experiments on a fan failure scenario. To ensure the safe operation of the system, we only disable a single fan in the system. To calculate the spatial temperature correlation and the target temperature function, several runs of CFD analysis with different fan speeds are conducted. Similar to the inlet overheating scenario discussed before, our sensor placement solution can handle the dynamics in different fan failure scenarios, because the CFD analysis is conducted with different fan speeds. Fig. 9 shows the colored temperature map of the server with a single fan disabled. The missing line at one of the fan positions represents the failed fan. We see that 4 chips (marked with red frames) out of the total 11 monitored chips are operating in an overheating environment.
The average overheating detection probability from simulation is shown in Fig. 10. We see that the CFD placement approach requires only two sensors to reach 100% overheating component detection for all four overheating locations, while Chip Best requires three sensors. The placements with two sensors by these two approaches are marked in Fig. 9. We see that the CFD placement covers all the overheating chips in the right corner by putting only one sensor in the middle of them. Compared with Chip Average, CFD performs significantly better, with a 60% higher detection probability. As expected, the Uniform Grid and Random schemes perform much worse than the other placement schemes. Fig. 11 shows the average error rate of the fan failure scenario in simulation. We see that despite some random errors, CFD outperforms the other two baseline approaches. Chip Average shows the worst performance among the three approaches.
Fig. 8. Average detection error rate of the proposed CFD-guided solution and the baselines in the inlet overheating case (testbed).
Fig. 9. Server temperature map in a scenario with single fan failure. The red dashed frames are the chips whose environment temperatures exceed their individual operating temperature thresholds. The black solid triangles indicate the sensors placed by the proposed CFD-guided approach when the given sensor number is two. The black crosses indicate the two sensors placed by the baseline Chip Best approach. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of the article.)
Fig. 10. Average detection probability of the proposed CFD-guided solution and the baselines in the scenario with single fan failure (simulation).
Fig. 11. Average error rate of the proposed CFD-guided solution and the baselines in the scenario with single fan failure (simulation).
Fig. 12. Average detection probability of the proposed CFD-guided solution and the baselines in the scenario with single fan failure (testbed).
Figs. 12 and 13 show the detection probability and the detection error rate on the hardware testbed, based on the sensor placement locations extracted from the simulation. From Fig. 12 we see that CFD has similar performance to Chip Best, and both of them still outperform the Chip Average scheme significantly. Fig. 13 shows the average error rate in this fan failure case. We see that CFD performs only slightly worse than Chip Best, but still much better than Chip Average. The degraded performance in this fan failure scenario is most likely caused by the model inaccuracy of the CFD analysis. Disabling a fan makes the thermal fluid dynamics more complex than in other scenarios, leading to an increase in the modeling error. Note again that Chip Best is not feasible in a real implementation, because it needs to test all the different combinations of sensor/chip pairing and select the best one.
6.4. CPU overloading detection
In this section, we present the simulation results for the overheating scenario induced by CPU overloading. With the widely adopted DVFS technique, CPU power is well known to be a cubic function of CPU frequency [33]. By overclocking the CPU to 1.5× the maximum frequency listed on the data sheet, a 3× overloaded power consumption can easily be reached. Unfortunately, the platform we use in our hardware experiment does not support CPU overclocking. Therefore, we only show simulation results in this section for the detection performance under 3× CPU overloading. To calculate the spatial temperature correlation and the target temperature function, several runs of CFD analysis with different CPU power settings are conducted. Note again that our sensor placement solution is designed to handle the dynamics in different CPU overloading scenarios.
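As a quick check of this arithmetic, assuming the idealized cubic relation between power and frequency (an approximation that ignores any fixed power components):

```latex
\frac{P(1.5\,f_{\max})}{P(f_{\max})}
  = \left(\frac{1.5\,f_{\max}}{f_{\max}}\right)^{3}
  = 1.5^{3}
  = 3.375 \;\geq\; 3
```

Under this model, a 1.5× overclock slightly exceeds the 3× power level used in the simulations.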
Fig. 14 shows the colored temperature map for the CPU overloading 3× power scenario. Although the color pattern is quite similar to the result in Fig. 2, i.e., a normal run with the benchmark workload, it shows significantly higher temperatures than the normal run. The highest temperature reaches about 120 °C. Six chips are found to be working under overheating conditions among all the 11 monitored chips. The placement results with two sensors are illustrated in Fig. 14 for both CFD-guided placement and Chip Best. We see that the CFD placement places sensors in the middle of the cluster of overheating chips, such that more chips can be covered by the limited number of sensors.
Fig. 15 shows the average detection probability in this CPU overloading scenario. We can see that the CFD placement consistently shows the best detection probability, and outperforms both Chip Best and Chip Average. With a sensor number of 2, the detection probability of CFD is twice as high as that of Chip Average. The average error rate of the component overheating detection with CPU overloading is shown in Fig. 16. We see that the CFD placement outperforms both Chip Best and Chip Average for all numbers of sensors.
Fig. 13. Average error rate of the proposed CFD-guided solution and the baselines in the scenario with single fan failure (testbed).
Fig. 14. Server temperature map in the scenario of CPU overloading at 3× the power consumption listed on the data sheet. The red dashed boxes are the chips whose environment temperatures exceed their individual operating temperature thresholds. The black solid triangles indicate the sensors placed by the proposed CFD-guided approach when the given sensor number is two. The black solid crosses indicate the two sensors placed by the baseline Chip Best approach.
[Figure: average detection probability (%) versus sensor number (1–11) for CFD, Chip Best, Chip Average, Random, and Uniform Grid.]
Fig. 16. Average error rate in the scenario of CPU overloading 3× power.
Fig. 17. Average detection probability in the combined overheating scenarios (simulation). [Plot: average detection probability (%) versus sensor number (1–11) for CFD, Chip Best, and Chip Average.]
6.5. Detection performance in combined overheating scenarios
We have evaluated our sensor placement scheme in three individual overheating scenarios: partial inlet overheating, fan failure, and overheating under CPU overloading. As the type of overheating condition is usually unknown before it actually occurs, we need to prepare the system for monitoring any of the possible overheating conditions. In this section, we evaluate the overheating detection performance of different sensor placement schemes in a combined overheating scenario. The combined overheating scenario consists of the previous three individual overheating scenarios. We prepare the system by deploying sensors to monitor overheating components in any of the above three overheating conditions, based on the formulation in Section 3.4. Specifically, we use the CFD analyses from all three previous overheating scenarios as input and run our sensor placement algorithm, with the target of maximizing the average overheating detection probability across all three overheating scenarios. We conduct the evaluation first in simulation and then on our testbed.
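The combined objective can be sketched as a simple average of per-scenario detection probabilities. This is a simplified illustration, not the formulation of Section 3.4: the scorer functions, coordinates, and candidate placements are hypothetical, and a real solver would search the continuous region rather than pick from a short list.

```python
def combined_detection_probability(placement, scenario_scorers):
    """Average the per-scenario detection probability of one placement."""
    scores = [score(placement) for score in scenario_scorers]
    return sum(scores) / len(scores)

# Hypothetical per-scenario scorers; real ones would be derived from CFD data.
def inlet_score(placement):
    return 0.9 if (12, 4) in placement else 0.5

def fan_score(placement):
    return 1.0 if (30, 8) in placement else 0.25

def cpu_score(placement):
    return 0.8

candidates = [[(12, 4), (20, 20)], [(12, 4), (30, 8)], [(5, 5), (30, 8)]]
scorers = [inlet_score, fan_score, cpu_score]
best = max(candidates, key=lambda p: combined_detection_probability(p, scorers))
```

Averaging gives each scenario equal weight; a placement that is best for one scenario can lose to one that is merely good for all three, which matches the trade-off discussed in the results.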
Figs. 17 and 18 are the simulation results that show the average detection probability and the average error rate, respectively, for the combined scenario. We see that CFD has almost the same detection performance as the Chip Best approach. Compared with its detection performance in each of the individual overheating scenarios (as shown in previous sections), CFD performs slightly worse in the combined scenario. This is mainly because the optimization algorithm needs to consider all the overheating scenarios at the same time and makes trade-offs between different scenarios. However, as discussed before, Chip Best needs to test all the different combinations of sensor/chip pairing and select the best one, which is infeasible in a real implementation. Compared with Chip Average, the CFD-guided approach still performs significantly better on both the detection probability and the detection error rate.
Fig. 18. Average error rate in the combined overheating scenarios (simulation).
Fig. 19. Average detection probability in the combined overheating scenarios (testbed).
We then test the different sensor placement strategies on our testbed. The overheating detection probability and the detection error rate of the hardware experiment are shown in Figs. 19 and 20, respectively. As explained before, since we are unable to overclock the CPU to create the CPU overloading event, a single round of each experiment consists of two overheating scenarios: partial inlet overheating and fan failure overheating. From the results we see that the CFD placement has slightly better performance than Chip Best. Both of them consistently perform better than the Chip Average placement. The hardware results differ slightly from the simulation results because of deviations in the CFD modeling process.
7. Discussion
In this section, we first discuss a closely related problem, the sensor number minimization problem. We then discuss possible future work based on the overheating component monitoring system using our sensor placement scheme.
7.1. Sensor number minimization
The main design goal of this paper is to optimize the deployment locations of a given number of sensors to reach a maximized average overheating detection probability for all the major components within the server box. While probability maximization is important for overheating detection, it is sometimes also of interest to know the minimum number of sensors required to reach a targeted overheating detection probability, especially in our server box component overheating detection application. This is because, with all the existing components and wires, the space within the server box is usually very compact, and thus the space available for deploying additional sensors is usually limited.
The framework proposed in this work for detection probability maximization can be easily modified to serve the purpose of sensor number minimization. To formulate the sensor number minimization problem, we add a constraint on the targeted detection probability. More specifically, the formulation is:
$$\arg\min_{(x_i, y_i),\ \forall i} N \tag{18}$$
subject to the following constraints
$$P_{Fj}(S_N) \le \alpha, \quad \forall\, 1 \le j \le M \tag{19}$$
$$P_{Dj}(S_N) \ge \beta, \quad \forall\, 1 \le j \le M \tag{20}$$
where $S_N$ is the list of locations of all the N sensors. To solve this problem, we can use the same algorithm proposed in Section 5.3. Basically, we need to find the smallest number of sensors that can provide the required detection probability from the constraint in Eq. (20) and also meet the false alarm rate constraint in Eq. (19). Since this problem is essentially a variant of the proposed detection probability maximization problem and can be solved with a similar algorithm, we do not repeat the experiments for it in this paper.
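One straightforward way to reuse a placement solver for this variant is to increase N until both constraints hold, as sketched below. This is a sketch under stated assumptions: `solve_placement` is a hypothetical callback standing in for the algorithm of Section 5.3, and `toy_solver` returns made-up probabilities purely to exercise the loop.

```python
def minimum_sensor_count(solve_placement, max_sensors, alpha, beta):
    """Smallest N whose best placement satisfies Eqs. (19) and (20).

    solve_placement(n) is assumed to return (placement, p_false, p_detect),
    where p_false and p_detect hold one probability per monitored component.
    Returns (n, placement), or None if no N up to max_sensors is feasible.
    """
    for n in range(1, max_sensors + 1):
        placement, p_false, p_detect = solve_placement(n)
        if max(p_false) <= alpha and min(p_detect) >= beta:
            return n, placement
    return None

def toy_solver(n):
    """Hypothetical solver: detection improves as sensors are added."""
    placement = [(10.0 * k, 5.0) for k in range(n)]
    p_detect = [min(1.0, 0.4 + 0.2 * n)] * 3
    p_false = [0.02] * 3
    return placement, p_false, p_detect

result = minimum_sensor_count(toy_solver, max_sensors=5, alpha=0.05, beta=0.9)
```

A linear scan keeps the sketch simple; if the achievable detection probability is non-decreasing in N, a binary search over N would need fewer solver calls.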
7.2. Other potential applications
We have shown that our proposed sensor placement scheme can be used to deploy sensors that monitor and detect overheating server components under an unknown overheating scenario, using combined overheating scenario monitoring. We now discuss how to integrate server-level thermal monitoring into another potential application, overheating root cause diagnosis. Although it is important to capture the overheating components, it is often more desirable to further determine the reason for the overheating. In other words, it is often more desirable to diagnose the root cause of the overheating phenomenon, so that actions such as increasing the fan speed or lowering the inlet temperature can be taken to correct the abnormal overheating behavior of the equipment. To accomplish this goal, in addition to the temperature sensors, we can deploy other types of sensors, such as flow and acoustic sensors, using the same sensor placement framework proposed in this work. With the additional types of sensors, we can further characterize the working behavior of each cooling-related component and condition, such as the server fans, the inlet flow speed, and the flow passage across the server. By characterizing and monitoring the working conditions of these components, we can determine whether they are working properly to provide the desired cooling capabilities. We plan to integrate sensor placement with overheating diagnosis in our future work.
8. Conclusions
Efficient thermal monitoring is critical for today's server systems to ensure safe operation and continuous service. It is also important for each server component to maintain a desirable service lifetime. However, the current practice of server thermal monitoring simply relies on either sensors placed at the server inlet or on-die thermal sensors available only in some components, such as the CPU, the memory, or both, which may lead to degraded overheating detection performance for certain components. In this paper, we have presented a novel solution that places additional sensors into the server box for overheating server component detection, based on CFD analysis of the thermal and fluid dynamics inside the server box. Our sensor placement scheme applies the Constrained Simulated Annealing algorithm with a reduced search space to find a sensor placement with maximized overheating component detection probability. Our solution also adopts data fusion techniques to collaboratively make the overheating detection decision, resulting in improved detection performance. We evaluate our CFD-based sensor placement strategy with a real-world 2U rack server in different component overheating scenarios. Our results show that the proposed placement strategy achieves significantly better overheating detection performance than several well-designed baselines. Extensive simulation results also demonstrate the effectiveness of our CFD-guided sensor placement scheme.
Acknowledgements
This work was supported, in part, by the US National Science Foundation under grants CCF-1143605, CNS-1218154, CNS-1143607 (CAREER Award), and CNS-0954039 (CAREER Award), and by the US Office of Naval Research under grant N00014-11-1-0898 (Young Investigator Program).
References
[1] J. Srinivasan, S. Adve, P. Bose, J. Rivers, Lifetime reliability: toward an architectural solution, IEEE Micro 25 (3) (2005) 70–80.
[2] J. Srinivasan, S. Adve, P. Bose, J. Rivers, The case for lifetime reliability-aware microprocessors, in: ISCA, 2004.
[3] F.J. Mesa-Martinez, E.K. Ardestani, J. Renau, Characterizing processor thermal behavior, in: ASPLOS, 2010.
[4] N. Tolia, Z. Wang, P. Ranganathan, C. Bash, M. Marwah, X. Zhu, Unified thermal and power management in server enclosures, in: ASME, 2009.
[5] J. Donald, M. Martonosi, Techniques for multicore thermal management: classification and new exploration, in: ISCA, 2006.
[6] R.Z. Ayoub, K.R. Indukuri, T.S. Rosing, Energy efficient proactive thermal management in memory subsystem, in: ISLPED, 2010.
[7] S. Gurumurthi, A. Sivasubramaniam, Thermal issues in disk drive design: challenges and possible solutions, Transactions on Storage 2 (2006).
[8] J. Lin, H. Zheng, Z. Zhu, E. Gorbatov, H. David, Z. Zhang, Software thermal management of DRAM memory for multicore systems, in: SIGMETRICS, 2008.
[9] X. Wang, X. Wang, G. Xing, J. Chen, C.-X. Lin, Y. Chen, Towards optimal sensor placement for hot server detection in data centers, in: ICDCS, 2011.
[10] Z. Yuan, R. Tan, G. Xing, C. Lu, Y. Chen, J. Wang, Fast sensor placement algorithms for fusion-based target detection, in: RTSS, 2008.
[11] K. Skadron, M. Stan, W. Huang, S. Velusamy, K. Sankaranarayanan, D. Tarjan, Temperature-aware microarchitecture, in: ISCA, 2003.
[12] Y. Kim, S. Gurumurthi, A. Sivasubramaniam, Understanding the performance-temperature interactions in disk I/O of server workloads, in: HPCA, 2006.
[13] J. Choi, Y. Kim, A. Sivasubramaniam, J. Srebric, Q. Wang, J. Lee, Modeling and managing thermal profiles of rack-mounted servers with ThermoStat, in: HPCA, 2007.
[14] R. Ayoub, R. Nath, T. Rosing, JETC: joint energy thermal and cooling management for memory and CPU subsystems in servers, in: HPCA, 2012.
[15] N. El-Sayed, I.A. Stefanovici, G. Amvrosiadis, A.A. Hwang, B. Schroeder, Temperature management in data centers: why some (might) like it hot, in: SIGMETRICS, 2012.
[16] J. Moore, J.S. Chase, Weatherman: automated, online, and predictive thermal mapping and management for data centers, in: ICAC, 2006.
[17] J. Moore, J. Chase, P. Ranganathan, R. Sharma, Making scheduling "cool": temperature-aware workload placement in data centers, in: USENIX, 2005.
[18] M. Jonas, R.R. Gilbert, J. Ferguson, G. Varsamopoulos, S.K.S. Gupta, A transient model for data center thermal prediction, in: IGCC, 2012.
[19] L. Li, C.-J.M. Liang, J. Liu, S. Nath, A. Terzis, C. Faloutsos, Thermocast: a cyber-physical forecasting model for data centers, in: SIGKDD, 2011.
[20] J. Chen, R. Tan, Y. Wang, G. Xing, X. Wang, X. Wang, B. Punch, D. Colbry, A high-fidelity temperature distribution forecasting system for data centers, in: RTSS, 2012.
[21] C.-J.M. Liang, J. Liu, L. Luo, A. Terzis, F. Zhao, RACNet: a high-fidelity data center sensing network, in: SenSys, 2009.
[22] S. Memik, R. Mukherjee, M. Ni, J. Long, Optimizing thermal sensor allocation for microprocessors, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 27 (3) (2008) 516–527, http://dx.doi.org/10.1109/TCAD.2008.915538.
[23] T. Yasuda, On-chip temperature sensor with high tolerance for process and temperature variation, in: ISCAS, 2005.
[24] Y. Zhang, A. Srivastava, M. Zahran, Chip level thermal profile estimation using on-chip temperature sensors, in: ICCD, 2008.
[25] A. Krause, C. Guestrin, A. Gupta, J. Kleinberg, Near-optimal sensor placements: maximizing information while minimizing communication cost, in: IPSN, 2006.
[26] P.K. Varshney, Distributed Detection and Data Fusion, Springer-Verlag, Inc, New York, 1996.
[27] S. Marsh, Direct extraction technique to derive the junction temperature of HBT's under high self-heating bias conditions, IEEE Transactions on Electron Devices 47 (2000).
[28] CFD flow modeling software and solutions from Fluent, http://www.fluent.com
[29] S.V. Patankar, Numerical Heat Transfer and Fluid Flow, Hemisphere Publishing Corporation, New York, 1980.
[30] B.W. Wah, Y. Chen, T. Wang, Simulated annealing with asymptotic convergence for nonlinear constrained optimization, Journal of Global Optimization 39 (2007).
[31] A. Verma, G. Dasgupta, T.K. Nayak, P. De, R. Kothari, Server workload analysis for power minimization using consolidation, in: USENIX, 2009.
[32] MEMSIC, TelosB mote, http://www.memsic.com/products/wireless-sensor-networks/wireless-modules.html
[33] K. Choi, W. Lee, R. Soma, M. Pedram, Dynamic voltage and frequency scaling under a precise energy model considering variable and fixed components of the system power dissipation, in: ICCAD, 2004.
Xiaodong Wang is currently a Ph.D. student in the Department of Electrical and Computer Engineering at The Ohio State University. Before joining The Ohio State University, he was a Ph.D. student at University of Tennessee, Knoxville. He is the recipient of the first Min Kao Fellowship of the Electrical Engineering and Computer Science Department at University of Tennessee, Knoxville from 2007 to 2010. He also received the ESPN Graduate Student Fellowship and the Chancellor's Award for Extraordinary Professional Promise from University of Tennessee, Knoxville, in 2010 and 2011, respectively. He received his M.S. in Computer Engineering from University of Tennessee, Knoxville in 2009 and B.S. degree in Electrical Engineering from Shanghai Jiao Tong University, China, in 2006. In 2007, he worked at PDF Solutions Inc. as a Data Analysis Engineer.
Xiaorui Wang received the Ph.D. degree from Washington University in St. Louis in 2006. He is an associate professor in the Department of Electrical and Computer Engineering at The Ohio State University. He is the recipient of the US Office of Naval Research (ONR) Young Investigator (YIP) Award in 2011, the US National Science Foundation (NSF) CAREER Award in 2009, the Power-Aware Computing Award from Microsoft Research in 2008, and the IBM Real-Time Innovation Award in 2007. He also received the Best Paper Award from the 29th IEEE Real-Time Systems Symposium (RTSS) in 2008. He is an author or coauthor of more than 60 refereed publications. From 2006 to 2011, he was an assistant professor at the University of Tennessee, Knoxville, where he received the EECS Early Career Development Award, the Chancellor's Award for Professional Promise, and the College of Engineering Research Fellow Award in 2008, 2009, and 2010, respectively. In 2005, he worked at the IBM Austin Research Laboratory, designing power control algorithms for high-density computer servers. From 1998 to 2001, he was a senior software engineer and then a project manager at Huawei Technologies Co. Ltd., China, developing distributed management systems for optical networks. His research interests include power-aware computer systems and architecture, real-time embedded systems, and cyber-physical systems. He is a member of the IEEE and the IEEE Computer Society.
Guoliang Xing received the B.S. degree in electrical engineering and the M.S. degree in computer science from Xi'an Jiao Tong University, China, in 1998 and 2001, respectively, and the M.S. and D.Sc. degrees in computer science and engineering from Washington University in St. Louis, in 2003 and 2006, respectively. He is an assistant professor in the Department of Computer Science and Engineering at Michigan State University. From 2006 to 2008, he was an assistant professor of computer science at City University of Hong Kong. He is an NSF CAREER Award recipient in 2010. He received the Best Paper Award at the 18th IEEE International Conference on Network Protocols (ICNP) in 2010. His research interests include wireless sensor networks, mobile systems, and cyber-physical systems.
Cheng-Xian Lin is currently an Associate Professor in the Department of Mechanical and Material Engineering at FIU. His prior positions include Associate Professor in the University of Tennessee, Knoxville and Summer Faculty Fellow at Air Force Research Laboratory in WPAFB. He earned his Ph.D. in Mechanical Engineering (Thermal Engineering) from Chongqing University, China. He has authored and co-authored over 150 papers in peer-reviewed journals and conference proceedings. His current research interests include Computational Fluid Dynamics, Heat Transfer, Thermal Management, Energy Efficiency and Renewable Energy in Built Environments. He is a member of the ASME and ASHRAE.