PROGRESSIVE RETRIEVAL AND HIERARCHICAL VISUALIZATION OF LARGE REMOTE DATA
HANS-CHRISTIAN HEGE∗, ANDREI HUTANU∗, RALF KÄHLER∗, ANDRÉ MERZKY∗, THOMAS RADKE†, EDWARD SEIDEL†, AND BRYGG ULLMER†
Abstract. The size of data sets produced on remote supercomputer facilities frequently exceeds the processing capabilities of local visualization workstations. This phenomenon increasingly limits scientists when analyzing results of large-scale scientific simulations. That problem gets even more prominent in scientific collaborations, spanning large virtual organizations, working on common shared sets of data distributed in Grid environments. In the visualization community, this problem is addressed by distributing the visualization pipeline. In particular, early stages of the pipeline are executed on resources closer to the initial (remote) locations of the data sets.
This paper presents an efficient technique for placing the first two stages of the visualization pipeline (data access and data filter) onto remote resources. This is realized by exploiting the extended retrieve feature of GridFTP for flexible, high performance access to very large HDF5 files. We reduce the number of network transactions for filtering operations by utilizing a server side data processing plugin, and hence reduce latency overhead compared to GridFTP partial file access. The paper further describes the application of hierarchical rendering techniques on remote uniform data sets, which make use of the remote data filtering stage.
1. Introduction. The amount of data produced by numerical simulations on supercomputing facilities continues to increase rapidly in parallel with the increasing compute power, main memory, storage space, and I/O transfer rates available to researchers. These developments in supercomputing have been observed to exceed the growth of commodity network bandwidth and visualization workstation memory/performance by a factor of 4 [11]. Hence, it is increasingly critical to use remote data access techniques for analyzing this data. Among other factors, this tendency is strengthened by the increasing prominence of large, spatially distributed scientific collaborations working on common, shared sets of data. Under these conditions, the simple approach of (partial) data replication for local data analysis does not scale.
The sheer size of existing data sets creates a demand for flexible and adaptive visualization techniques, such as hierarchical rendering or viewpoint dependent resolution. Such techniques can reduce the initial amount of data to be visualized while maintaining the overall visual impression of the full data set. This can be achieved (e.g.) by retrieving the portions of the data set which are important to the user; or by retrieving low resolution versions of the full data set first, and refining this data later. Remote access to the interesting portions of large data files can significantly support these techniques.
One major problem of naive remote data access techniques is the inherent difficulty in handling meta data for large data sets. Meta data is the highly structured set of information describing the data set, containing (e.g.) the number of samples per coordinate axis and the data volume bounds within physical space. While the meta data itself is relatively small, meta data access is often connected with many small read operations and many seek operations. However, individually requesting many seeks over a remote, potentially high-latency connection is quite inefficient for protocols that do not support transactions over higher level operations [13, 19].
In general, these developments ultimately require distributing the pipeline used for data visualization. The present paper describes techniques usable for distributing early stages of this visualization pipeline. Specifically, we enable the application to efficiently access portions of remote large data sets present in the HDF5 file format [2]. This general approach can be adapted both to other file formats and other access patterns. The paper further presents higher level visualization techniques which utilize these data access mechanisms to provide adaptive and progressive rendering capabilities.
The paper is structured as follows. First, we describe the problem space our approach is targeting in more detail in sect. 2. Next, we relate our research to other relevant research activities (see sect. 3). Sect. 4 gives an overall description of the techniques we developed. Sect. 5 and 6 describe the main components in more technical detail. The paper concludes with two sections about our results and an outlook on future work.
2. Scenario. The increasing gap between resources available at remote supercomputing centers and on the local workstations of individual researchers is one of the major motivations for our research. In particular, we aim to improve the access to Grand Challenge simulation results as produced by numerous research collaborations around the world [12, 25, 26]. These simulations tend to drive the resource utilization of supercomputer resources to the available maximum, and often produce immense amounts of data during single simulation runs.
∗ Zuse Institute Berlin (ZIB), http://www.zib.de/, {hege, hutanu, kaehler, merzky, ullmer}@zib.de
†
As an exemplary application we consider numerical relativity simulations performed in the Cactus simulation framework [10]. Among other things, this framework provides the simulation code with an efficient I/O infrastructure to write data to HDF5 files. The astrophysical simulations in question write data for scalar, vector and tensor fields (as components stored in separate data sets or files), and parameters for simulation runs. A typical size for a data file is on the order of tens of gigabytes¹.
Visualization of this data during post simulation analysis usually does not require access to the complete data set. For typical production runs, where many different physical fields are written to disk, only a couple of these fields are visualized later. The data sets and subsets that are to be visualized are not initially known, but depend on interactive selections by the user (time step, field, resolution, spatial area, etc.). For our target users, this flexibility needs to be maintained as far as possible.
Within these constraints, our target scenario is the following:
A scientist performs a large scale simulation run, utilizing one or more supercomputing resources at different locations. The simulation run produces up to TBytes of data, by storing various scalar and vector fields to HDF5 files. These HDF5 files are created according to a custom predefined structure.
After the simulation finishes, members of the scientist's collaboration wish to visualize the data, or portions thereof, from remote workstations. They would like to use standard visualization techniques from their visualization environment. They also wish to interactively choose the data fields to be visualized, and to interactively change the spatial selection and resolution for the data.
Ideally, the data transfer and visualization are adaptive to the available network connectivity, and hide data distribution details from the user.
This scenario defines the problem space we are targeting. We explicitly do not expect to find data on the remote systems which are, by pre- or postprocessing, specifically prepared for later visualization. We also want to provide a solution for environments with a notoriously short supply of I/O bandwidth and compute resources. And we want to enable remote visualization for a broad range of end users, connected to the Grid by a wide range of network types and with varying, potentially low end commodity systems. The ability of the visualization pipeline to be adaptive to that range of boundary conditions is a central point of our efforts; the focus of the paper on progressive data retrieval patterns and on hierarchical rendering techniques emphasizes this.
3. Related Work. To support the scenario we presented, it is ultimately necessary to distribute the pipeline used for data visualization. In principle, there are many possible ways to distribute this pipeline (fig. 3.1) over remote resources. The distribution schemes used in real world systems are limited by the communication requirements for transferring data between the stages of the pipeline, and by the complexity of the resulting distributed software systems.
[Figure 3.1: visualization pipeline diagram. Data objects: Data Source, Data Set, Geometry, Image. Stages: access, filter, map, render, display, view. All stages are subject to user control.]
Fig. 3.1. Most visualization systems share the same underlying visualization pipeline [27]. The components of the pipeline can be freely distributed, in principle, as the communication elements between these components have different demands on the latency and bandwidth required. All elements of the pipeline should be controlled by the end user or by the application.
Early stages of the pipeline (remote access and remote filtering) potentially need to transfer and process large amounts of data, but show considerable flexibility with respect to latency. Also, by distributing these early stages, it is possible to completely hide the data locality from application and end user. Remote access solutions such as NFS [4] and AFS [3] allow transparent utilization of standard (local) file I/O techniques. However, systems like NFS and AFS are problematic in their administrative maintenance. For widely distributed environments spanning multiple administrative domains these solutions are not applicable.
Common remote data access techniques crossing administrative boundaries are marked by several limitations. Some, like SCP and FTP, do not support access to partial files, which is not acceptable for our purpose of adaptive visualization. Other techniques fail to deliver the performance required for interactive data visualization. For example, the GridFTP support for access to remote files with the partial file access feature [9] is inefficient for meta data access. Due to the file format chosen by HDF5, meta data is not necessarily stored in a contiguous file space, but instead scattered in a hierarchical binary tree. Also, a single read on the HDF5 API level may be translated by the library into many individual low-level seek/read operations on the virtual file driver level. Other protocols are similarly lacking in support for transactions of higher level operations [13, 19].
Remote filtering techniques often integrate models of meta data and data structures, and can perform the data access efficiently². Also, putting the remote filter on the remote site can significantly reduce the amount of data to be transferred over the net, and ensures that only the data actually needed for the visualization process is retrieved and transferred. A standard problem for remote filtering is that this process needs to integrate a model of the data structures it is operating upon. It is difficult or impossible to implement filtering without explicit information about what is to be filtered, and this information is difficult to express in a general way that is applicable over a broad range of data formats and models. Hence, remote filtering techniques are often limited to specific file and data types, and to specific filtering operations.
The Data Cutter project [14] is another well known representative of the remote filtering approach. It provides the application programmer with a flexible and extensible filter pipeline to access portions of the original data set. Compared to our approach, there are several main differences. First, the Data Cutter requires the data to be stored in chunked data files in order to benefit from its bounding box indexing scheme, since all chunks with a bounding box at least partly overlapping with the area of interest are completely read into memory, and passed to the filter pipeline. Also, since all filters pass data using network communication, the total network load is much higher than for our approach, where the filter resides at the data source, and is tightly coupled to the data access stage. Further, our utilization of standard Grid tools (GridFTP and GSI) seems more appropriate for the targeted Grid environment. On the other hand, Data Cutter's user definable filter pipeline is more flexible than our approach.
One widely used compromise for remote filtering is the usage of preprocessed data sets: during the simulation's I/O stage or during a postprocessing step, filter operations are applied to create new data sets on the remote resources. These data sets are stored in optimized form, making later remote access and visualization very efficient. In the future, more and more simulation frameworks will support such features, not least in order to improve their own I/O characteristics, e.g. through compression on the fly, but also to enable the efficient handling of the very large data sets after completion of the source simulation. Wavelet transformed data storage is an excellent example of that technique [22], which allows lossless compression, and adaptive, efficient offline access to optimally resolved data samples. Other example filters create octrees [18] or similar structured representations [21], or provide progressive mesh generation.
For the problem space we described with our scenario, pre-applied filters are not a valid option, since they either need to be integrated into the simulation I/O code, which they are not in our case; or they need to be executed via external jobs on the remote resource. This duplicates the storage needed and potentially performs excess work, thereby wasting costly supercomputing resources.
After filtering, visualization algorithms work on the data and map essential features into geometries (including color and texture information, etc.). The next stage renders images from these geometrical representations. In the future, these stages may also be executed close to the data source, on the supercomputer itself. This would be the most efficient way to handle large simulation data, since the amount of data to be transferred during the later stages of the visualization pipeline typically decreases significantly. Completely changed access patterns to remote data can significantly reduce the amount of data transferred. Visualization algorithms using such patterns [23], in particular for large data, are seen as use cases for the presented work.
Environments containing PC-cluster based supercomputers offer the best prospects for deploying such scenarios. Here, adding commodity graphics boards to all nodes does not increase the total costs significantly, but allows high performance image rendering. These types of clusters are becoming increasingly common, but are still rare in the top 500 [6]. For the collaborative and highly interactive visualization scenario we envision, the feedback to the remote and distributed rendering system becomes important and complex. Also, in perhaps the most important point, the field is currently missing sufficiently flexible software solutions which are able to realize such scenarios. Promising approaches do exist through work such as [8, 7, 24], and we expect major progress in that field over the next decade.
4. Architecture. Our proposed remote data access scheme builds upon the GridFTP protocol [9]. GridFTP is a Grid-aware extension to the standard FTP protocol. Among other things, it provides a flexible server side processing feature, and allows specification of custom operations on remote data. These operations are performed by corresponding custom extensions (plugins) to the GridFTP server. This technique is described in more detail in sect. 5. We utilize these server side data processing capabilities to perform data filtering operations on the scientific data sets. As described, the data sets are stored remotely in HDF5 format. Our plugin to the GridFTP server accesses this data locally via the HDF5 library, and performs data filtering on the fly.
[Figure 4.1: architecture diagram. Client side: Visualization System (Amira), GridFTP Client, MetaData Cache. Network: GridFTP Protocol (ERET/ESTO). Server side: GridFTP Server, HDF5 Plugin (MetaData, Data Blocks), Native HDF5 Calls, HDF5 File.]
Fig. 4.1. The GridFTP protocol transports ERET commands from the visualization system to the GridFTP server, which forwards them to the HDF5 plugin. This way, the plugin can perform I/O operations plus filtering and data type conversion on the HDF5 file with full local performance. Data is transferred back via ESTO commands, and is written into the memory buffer of the visualization process.
An important element for the architectural decisions is the usage of the HDF5 file format [1]. Given the complexity of this format and the ongoing improvement efforts concerning the associated API, the decision was to use the existing API and to have the remote access procedures either on top of the API or (as also described in sect. 3) underneath it. The architecture described in this work has the remote operations on top of the HDF5 API: a limited set of high-level operations was chosen to be implemented by making use of the existing API, and these operations were integrated in the GridFTP server to be executed at the remote site.
A complete visualization session is performed as follows. The user selects a data file to be visualized by browsing the remote file space. Next, a connection to the remote GridFTP server is established, using the user's GSI credential. The server plugin is utilized to perform an extraction of the file's meta data (see sect. 5), which is then transferred to the visualization host and cached on the local file system. The visualization system accesses this local HDF5 file, extracts all needed information (number of time steps, bounding box, resolution, ...), and creates an octree hierarchy fitting the data set. The user can interactively specify the depth of the hierarchy. As the user then triggers various visualization operations on the data (to produce orthoslices, height fields, volumetric renderings), the octree blocks are scheduled in a separate thread for data reading. The read requests are served according to a priority tag defined for the visualization, and each triggers a GridFTP data access. This GridFTP data access utilizes our remote GridFTP server side data processing plugin. It extracts the data in the block specific resolution and returns this data. On arrival, the data is stored within the octree hierarchy, and the visualization is triggered to update the rendering by including the newly arrived data. On user request (e.g., next time step) or timeout, all pending block reads can be canceled. Our visualization techniques (see sect. 6) use these features for dynamic data access to optimize visualization performance by requesting data blocks close to the viewpoint first, and by progressively improving data (and image) resolution.
[...] transfer. This approach gives us a number of advantages compared to approaches implemented on top of custom or proprietary protocols.
1. GridFTP allows for server side data processing, which we utilize for data filtering.
2. The GridFTP protocol, as an extension to the standard FTP protocol, is well known and reliable.
3. It allows the incorporation of standard servers for solutions with limited functionality³.
4. The GridFTP infrastructure takes care of:
   • establishing the data connection;
   • ensuring authentication and authorization;
   • invoking the data filter plugin; and
   • performing the data transfer.
In this way, the data transfer task is reduced to filling a buffer on the writing end and reading it on the receiving end.
The following subsections describe the server side processing in more detail, and specify the low level operations we use.
5.1. Server-Side Processing. As described before, the GridFTP protocol enables support for adding custom commands for server side data processing [9]. Specifically, the plugins offered by a server define sets of ERET and ESTO parameters that correspond to the data filter module implemented by the plugin⁴. The extended store (ESTO) and extended retrieve (ERET) commands of the GridFTP protocol are defined as follows:
ESTO <module_name>="<module_parms>" <filename>
ERET <module_name>="<module_parms>" <filename>
module_name is a server-specific string representing the name of the module to be used. The second string (module_parms) is module specific and defines the operation to be performed by the module. The last parameter (filename) specifies the file to be processed, which can be any file that can be processed by the given module; in our case, any HDF5 file.
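As a minimal sketch of the command syntax above, the following Python helper assembles an ERET or ESTO line from its three parts. The function name format_extended_command and the file name simulation.h5 are hypothetical and introduced only for illustration; the module name Hdf5 and the parameter string METADATA are the ones used in sect. 5.2. Sending the command requires a GridFTP client, which is not shown.

```python
def format_extended_command(verb, module_name, module_parms, filename):
    """Assemble an ERET/ESTO command line (hypothetical helper, not a GridFTP API)."""
    if verb not in ("ERET", "ESTO"):
        raise ValueError("verb must be 'ERET' or 'ESTO'")
    if '"' in module_parms:
        # The parameter string is delimited by double quotes in the command.
        raise ValueError("module_parms must not contain a double quote")
    return f'{verb} {module_name}="{module_parms}" {filename}'


if __name__ == "__main__":
    # File name is an illustrative assumption.
    print(format_extended_command("ERET", "Hdf5", "METADATA", "simulation.h5"))
```

Keeping the module parameters as one opaque string mirrors the protocol definition: only the module on the server side interprets them.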
5.2. Operations. We use this ERET/ESTO mechanism to define two operations that can be applied to HDF5 files: one for meta data filtering, and a second one for data access.
Meta Data Filtering. The first operation is the filtering of meta data from the HDF5 file. This is achieved by creating a filtered copy of the original file. Toward this end, the module reads and parses the original file, and writes the meta data information to a copy of the file. However, when copying (writing) a data set, we use the HDF5 filter interface and apply a filter to the original file's data set. This filter reduces all data sets to zero length⁵. Thus, the only resulting differences between the generated file and the original one are in the data array and storage layout of the data sets. All other information, e.g., the hierarchy (groups), attributes, and data set information (name, data type and data space), is preserved. While this approach might seem like a significant overhead, it is in fact very fast, due to the good performance of HDF5.
The generated file is transferred to the requesting client using GridFTP. The ERET command for requesting the meta data file is:
ERET Hdf5="METADATA" <filename>
filename is the file from which the meta data will be extracted. Given the now dramatically reduced size of the file, the transfer time is very small relative to the transfer time of the original data⁶. After the high-level filtering call is executed remotely and the transfer is finished, the client can access the local meta data file using the standard HDF5 API. In this way, we avoid executing each HDF5 API call remotely, and still offer the user the flexibility of the original API for meta data access. Because the data set structures within this temporary local file do not contain actual data, the standard API cannot be used for data access. For this task, we provide a second API call.
³ Backwards compatible with FTP: by using normal FTP we could transfer the file to a local disk cache; for a standard GridFTP server (without plugins) we use direct partial file access (ERET PART, inefficient for filtering).
⁴ Not all servers implement the same set of modules. In the current implementation, the plugins are compiled together with the server, and are statically linked.
⁵ Actually, for technical reasons internal to HDF5 the length is 1.
Data Set Reading and Subsampling. The second operation performs data selection and filtering. By knowing the data set coordinates (dimensions, data type) from the now locally available meta data, the client can choose to read an entire data set, or a portion of the data set. The HDF5 data sets logically group the actual data within multidimensional arrays named data spaces. The model we use to specify a portion of a data set is based on the HDF5 hyperslab model. A hyperslab describes either a contiguous collection of points, or a regular pattern of points or blocks in the data spaces. A hyperslab is specified by four parameters:
• origin: the starting location;
• size: the number of elements (or blocks) to select along each dimension;
• stride: the number of elements to separate each element (or block) to be selected; and
• block: the size of each block selected from the data set.
All of these parameters are one-dimensional lists, with lengths equal to the number of dimensions of the data set. The elements of these lists specify data array lengths or offsets for corresponding dimensions of the data arrays. Currently the size of element blocks is predefined to one, which is adequate for the targeted visualization scenario. In future work, we will extend the protocol to accept variable block sizes.
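To illustrate the selection model, the sketch below enumerates the index tuples selected by a hyperslab with block size one, as currently fixed by the protocol. It is plain Python and does not call the HDF5 library; the function name hyperslab_indices is a hypothetical helper introduced for this example.

```python
from itertools import product


def hyperslab_indices(origin, size, stride):
    """Return the index tuples selected by a hyperslab with block size one.

    origin, size and stride hold one entry per dimension of the data set.
    """
    if not (len(origin) == len(size) == len(stride)):
        raise ValueError("origin, size and stride need one entry per dimension")
    if any(step < 1 for step in stride):
        raise ValueError("stride entries must be at least 1")
    # Along each axis: 'count' elements, starting at 'start', 'step' apart.
    axes = [
        range(start, start + count * step, step)
        for start, count, step in zip(origin, size, stride)
    ]
    return list(product(*axes))


if __name__ == "__main__":
    # Two elements along axis 0 (4 apart), three along axis 1 (2 apart).
    print(hyperslab_indices(origin=(0, 0), size=(2, 3), stride=(4, 2)))
```

A stride of one in every dimension yields a contiguous block; larger strides yield the regular subsampling pattern used for low resolution requests.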
Our current mechanism for specifying the hyperslab coordinates takes the following form:
ERET Hdf5="BLOCK:NAME=<datasetname>;\
DIMENSIONS=<dims>;\
ORIGIN=<orig0>,<orig1>,...,<orign>;\
SIZE=<size0>,<size1>,...,<sizen>;\
SAMPLING=<sampling0>,<sampling1>,...,<samplingn>"
<filename>
datasetname is the fully qualified name (including the path to the data set) of the data set from which data should be read; orig0 to orign are the coordinates of the first element to be selected from the data set; size0 to sizen are the number of elements to be selected in each dimension; and sampling0 to samplingn represent the distance between two selected elements for each dimension.
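The request format above can be assembled programmatically. The following is a minimal sketch, not the authors' client code: the helper name block_request, the data set path and the file name are hypothetical. It also assumes that DIMENSIONS carries the number of dimensions and that the backslashes in the listing are line continuations, so the command is emitted on a single line; the text does not state either point explicitly.

```python
def block_request(dataset_name, origin, size, sampling, filename):
    """Build a BLOCK request string for the Hdf5 module (illustrative sketch)."""
    if not (len(origin) == len(size) == len(sampling)):
        raise ValueError("origin, size and sampling need one entry per dimension")

    def as_list(values):
        return ",".join(str(int(value)) for value in values)

    parms = ";".join([
        f"BLOCK:NAME={dataset_name}",
        f"DIMENSIONS={len(origin)}",  # assumption: number of dimensions
        f"ORIGIN={as_list(origin)}",
        f"SIZE={as_list(size)}",
        f"SAMPLING={as_list(sampling)}",
    ])
    return f'ERET Hdf5="{parms}" {filename}'


if __name__ == "__main__":
    # Every fourth sample of a 256^3 volume gives a 64^3 block.
    print(block_request("/grid/phi", (0, 0, 0), (64, 64, 64), (4, 4, 4), "run.h5"))
```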
This request is sent to the server. The server opens the file filename, opens the given data set, and reads the portion of the file specified by the given parameters. This procedure is performed via native HDF5 library calls. Next, the retrieved data is sent via the GridFTP connection to the client, which will convert the data to the local byte order if needed. To determine if conversion is necessary, the first 32 bits sent by the server represent an integer with the value of 1, encoded using the server's byte ordering.
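The byte order check can be sketched with Python's struct module. This is an illustration under stated assumptions, not the authors' client code: the function names are hypothetical, and the payload after the 32-bit marker is assumed to consist of double precision values, which the protocol description does not guarantee for every data set.

```python
import struct


def detect_byte_order(marker):
    """Return 'little' or 'big' for the 4-byte marker that encodes the integer 1."""
    if len(marker) != 4:
        raise ValueError("marker must be exactly 4 bytes")
    if marker == struct.pack("<i", 1):
        return "little"
    if marker == struct.pack(">i", 1):
        return "big"
    raise ValueError("marker does not encode the integer 1")


def decode_doubles(payload):
    """Decode a marker-prefixed payload of doubles into native Python floats."""
    order = detect_byte_order(payload[:4])
    body = payload[4:]
    if len(body) % 8 != 0:
        raise ValueError("payload is not a whole number of doubles")
    prefix = "<" if order == "little" else ">"
    return list(struct.unpack(f"{prefix}{len(body) // 8}d", body))


if __name__ == "__main__":
    # Simulate a big-endian server sending two values.
    message = struct.pack(">i", 1) + struct.pack(">2d", 1.5, -2.0)
    print(detect_byte_order(message[:4]), decode_doubles(message))
```

Because the marker has a known value, the client needs no separate negotiation step: it reads the marker in both orders and keeps the one that yields 1.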
The approach we have taken in creating this limited HDF5 API wrapper does reduce the flexibility provided by the original API. Nonetheless, for our visualization scenario this API is appropriate, and makes significant steps toward maximizing overall performance. To retain the flexibility of the original API, one approach would be to execute each native API call remotely. In this case, the cost per call is at least that of the network latency. This, combined with the relatively large number of calls needed, for example, to gather the meta data from the file, significantly reduces the performance. This motivates the usage of higher level API wrappers, such as the one we have implemented. However, such wrappers need not be as limited as our current version, of course.
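The latency argument can be made concrete with a rough lower bound. As a sketch, assume that the N native API calls are issued one after another and that each costs at least one network latency t_lat. The symbols and the call count in the example are illustrative assumptions, not measured values; the latency of 20 ms is the WAN value reported in sect. 7.2.

```latex
\[
  T_{\mathrm{remote}} \;\ge\; N \cdot t_{\mathrm{lat}},
  \qquad\text{e.g.}\qquad
  N = 10^{4},\quad t_{\mathrm{lat}} = 20\,\mathrm{ms}
  \;\Longrightarrow\;
  T_{\mathrm{remote}} \;\ge\; 200\,\mathrm{s}.
\]
```

A single high-level request, by contrast, pays this latency only once, in addition to the server side processing and transfer time.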
5.3. Security. The security model used by the GridFTP server is GSI (Grid Security Infrastructure) [17]. The client needs to hold a valid GSI proxy containing a security credential with limited validity. The proxy represents a Distinguished Name (DN) that must be present in the grid-mapfile of the server machine in order for the server to accept the connection. This proxy is used to authenticate the client without using passwords. After the connection is established, the server front end starts the MPI-based back end. This back end runs under the local identity to which the DN is mapped. The back end is responsible for all subsequent operations, including the filtering operations. This ensures that only authorized clients can access the information from the original file.
6. Adaptive Visualization. We utilize the previously described techniques for data access and filtering to generate a level-of-detail representation of the remote data set in the visualization phase.
First, the meta data (i.e., information about the number of data samples per coordinate axis and the data volume extension in physical space) is retrieved (see sect. 5.2). With the help of this information, and a selectable minimal resolution of the data, an octree structure is generated, which initially contains no data other than the parent-child relations and the position and extensions of the tree nodes. The root node of the structure will store [...]
Next, the data for the octree nodes is requested from the reader module, starting at the root node. The order in which nodes are refined is determined by the distance from a user-defined point-of-interest, which might be the camera position or an arbitrary point within the data volume. Subregions of the data sets closer to this point are requested with higher priority than those which are further away. The position and resolution parameters for each request are specified and sent to the remote machine as described in sect. 5.2.
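The distance-based ordering can be sketched with a priority queue. This is a minimal illustration, not the Amira reader module: the function name request_order and the block names are hypothetical, blocks are represented only by their center points, and Euclidean distance is assumed as the priority measure.

```python
import heapq
import math


def request_order(block_centers, point_of_interest):
    """Return block names ordered by distance of their center to the point of interest.

    block_centers maps a block name to the coordinates of its center.
    """
    heap = [
        (math.dist(center, point_of_interest), name)
        for name, center in block_centers.items()
    ]
    heapq.heapify(heap)
    ordered = []
    while heap:
        _, name = heapq.heappop(heap)  # closest remaining block first
        ordered.append(name)
    return ordered


if __name__ == "__main__":
    centers = {"far": (10.0, 0.0, 0.0), "near": (1.0, 0.0, 0.0), "mid": (5.0, 0.0, 0.0)}
    print(request_order(centers, (0.0, 0.0, 0.0)))
```

A heap also suits the cancellation described in sect. 4: when the user moves on, the pending queue can simply be discarded and rebuilt for the new point of interest.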
The reader runs in a separate thread, so the visualization routines are not blocked during the loading phase. Each time a data block has arrived, the visualization module is notified, and this new data is reflected in the next rendered frame of the visualization.
Fig. 6.1. The sequence depicts the volume rendering of a remote data set. First, a coarse resolution representation of the data is generated on-the-fly and transferred to the local visualization client. Next, subregions closer to the point-of-interest (in this case, the camera position) are requested and integrated at progressively higher resolutions.
Besides hierarchical visualization modules for orthoslicing and the display of height fields, we implemented a 3D texture-based volume rendering module for octrees. The octree is traversed in a view-consistent (back-to-front) order, starting at the root node. A node is rendered if two criteria are fulfilled:
• The data for this node is already loaded (otherwise, the traversal of the associated subtree is stopped).
• The data for the subnodes is not loaded yet (otherwise, the node is skipped and the subnodes are visited).
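The two criteria above can be sketched as a recursive selection of the nodes to render. This is an illustrative reading, not the authors' implementation: the Node class and the function nodes_to_render are hypothetical, the sketch descends only when all subnodes of a node are loaded (the text does not say how partially loaded subnodes are treated), and the view-dependent back-to-front ordering is replaced by the stored child order.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Octree node holding only what the selection needs (hypothetical type)."""
    name: str
    loaded: bool = False
    children: list = field(default_factory=list)


def nodes_to_render(node):
    """Return the names of the nodes to render, in traversal order."""
    if not node.loaded:
        return []  # criterion 1 fails: stop traversing this subtree
    if node.children and all(child.loaded for child in node.children):
        # Criterion 2 fails: skip this node and visit its subnodes instead.
        selected = []
        for child in node.children:
            selected.extend(nodes_to_render(child))
        return selected
    return [node.name]  # loaded, and no complete finer data available yet


if __name__ == "__main__":
    fine = [Node("a1", True), Node("a2", True)]
    root = Node("root", True, [Node("a", True, fine), Node("b", True)])
    print(nodes_to_render(root))
```

With this rule each region of the volume is drawn exactly once, at the finest resolution that has fully arrived.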
Once a node is selected, it is rendered utilizing the standard approach for volume rendering with 3D textures, as proposed in (e.g.) [16, 15]. The 3D texture is sampled on slices perpendicular to the viewing direction and blended in the frame buffer.
In order to take advantage of the multi-resolution structure of the data for fast rendering, the sample distance of the slices is set with respect to the resolution level of the actual node, as proposed in [29].
7. Results.
[...] part of the Globus software distribution as of yet. It supports the addition of compile-time plugins (written in C) for handling specific incarnations of the ERET/ESTO protocol commands. Although ERET and ESTO are specified in the GridFTP protocol version 1.0, there is currently no other implementation of this feature available other than the basic support for partial file access and striped data access. There are good prospects for this feature to be present in various future implementations of GridFTP servers. The plugin code will be available via the GridLab project software distribution, and will be published at http://www.gridlab.org/.
For benchmarking the software we used a dual Xeon 1.7 GHz server running RedHat Linux 8.0 as a data server. The machine was equipped with 1 GB of RAM and a logical volume storage of 320 GByte (36.5 MByte/sec transfer rate). The measurements have a granularity of 1 second.
The visualization modules we described have been implemented in the Amira visualization environment [28, 5], which is based on OpenGL and OpenInventor. The renderings have been performed on a dual Pentium IV system with 2.6 GHz, 1 GByte main memory and NVidia Quadro4 graphics. The system ran under RedHat Linux 8.0 with the standard NVidia video driver.
7.2. Benchmark Results. In order to evaluate our approach, we performed a number of performance measurements for accessing, loading and displaying large remote HDF5 data sets. We compare the performance obtained using the GridFTP plugin (GridFTP HDF5) with a comparable remote access technique, that is HDF5 over GridFTP partial file access (GridFTP PFA). We also include measurements of local (local access) and Network File System (NFS access) times to see if we achieved our goal of having acceptable waiting times before the first visualization is created, considering the local and NFS times as acceptable.
The results of these tests are listed in table 7.1. The time needed to create the first image ($t_3$) is composed of the time needed to gather and transfer the meta data ($t_1$) and the time needed to filter and transfer the subsampled first time step ($t_2$). $t_4$ gives the access time for a full resolution time step. The tests have been performed on a Local Area Network (LAN) with normal network load (latency 1 ms, measured bandwidth: 32.0 MBit/sec), and on a Wide Area Network connection (WAN) between Amsterdam and Berlin (latency 20 ms, measured bandwidth: 24.0 MBit/sec).
The WAN measurements have been performed with various level settings, that is with different depths of the octree hierarchy created.
Table 7.1. The table lists performance measurements for the various access techniques we explored. The results have been obtained by timing the visualization process for a 32 GB HDF5 file, containing 500 time steps, each time step with the resolution of $256^3$ data points (double precision).

Access Type     Net   Level   Meta Data   Root Block   Startup             Complete
                              $t_1$       $t_2$        $t_3 = t_1 + t_2$   $t_4$
local access    -     2       7 sec       1 sec        8 sec               3 sec
NFS access      LAN   2       8 sec       5 sec        13 sec              8 sec
GridFTP HDF5    LAN   2       11 sec      2 sec        13 sec              11 sec
GridFTP PFA     LAN   2       165 sec     10 sec       175 sec             200 sec
GridFTP HDF5    WAN   3       14 sec      2 sec        16 sec              126 sec
GridFTP HDF5    WAN   2       14 sec      3 sec        17 sec              68 sec
GridFTP HDF5    WAN   1       14 sec      7 sec        21 sec              45 sec
GridFTP HDF5    WAN   0       14 sec      41 sec       55 sec              41 sec
GridFTP PFA     WAN   3       430 sec     28 sec       458 sec             3760 sec
GridFTP PFA     WAN   2       430 sec     53 sec       483 sec             960 sec
GridFTP PFA     WAN   1       430 sec     110 sec      560 sec             477 sec
GridFTP PFA     WAN   0       430 sec     220 sec      670 sec             220 sec
These measurements show that the goal of a fast initial visual representation of the data set was achieved: a small startup time $t_3$ can be achieved by using the GridFTP HDF5 technique combined with hierarchical access (level $\geq 2$). This time is of the same order of magnitude as for local visualization. Specifying the hierarchy level provides the user with an interactive mechanism for tuning response times. The trade-off for shorter startup times is the total transfer time for a fully resolved data set (all octree levels)⁷. The results show that relation ($t_3/t_4$) clearly for the WAN measurements with different level settings.
Also, the large overhead for the complicated meta data access was dramatically reduced in comparison to GridFTP partial file access. The remaining time difference relative to the NFS meta data access results from the application of the zero filter to all data sets, the time needed to write the meta data file, and the time to transfer it.
8. Conclusions. With the presented scheme for progressive remote data access and its use for hierarchical rendering, we have successfully realized the functionality targeted in our motivating scenario (sect. 2). In particular, the techniques we have developed support the adaptation of remote data access to a wide range of I/O connections, and react flexibly to user and application demands. For example, our mechanisms support adjustment of the system's reaction time (the time until the first visual impression for the data set appears) by adapting data filter parameters, such as the chosen octree depth.
Our presented solution does not depend on server-side offline preprocessing of the complete data set. The access to the data set's meta data, when compared to naive remote access techniques, offers very high performance, as supported by the results of Table 7.1. Only a small local disk storage space is required for caching the associated meta data.
The extensibility of this approach is also notable. This approach supports both additional data formats other than HDF5, and access patterns other than hyperslab, through the provision of additional plugins. At the same time, it is important to acknowledge that this approach may make it increasingly difficult to maintain compatible configurations on all hosts of a Grid. The situation may improve with future GridFTP server implementations allowing dynamic linking and invocation of plugins. This implementation is one of the first few existing utilizations of the ERET capabilities provided by GridFTP. We expect to see many more in the future.
Our work further demonstrates the usability of the data access scheme for hierarchical rendering techniques. The implemented algorithms (orthoslice, height field, volumetric rendering) show very good performance, and are also adaptive to user specification and connectivity characteristics.
The presented architecture enables us to realize visualization scenarios which would have been impossible earlier, by reducing the total time needed for obtaining a visual data impression by orders of magnitude, compared to naive approaches.
Weareplanningto enhanethedynamiprotoolseletionfeatureofStork,sothat itwillnotonlyselet
any availableprotoolto performthe transfer,but itwill seletthebest one. Therequirementsof `beingthe
best protool'mayvary fromusertouser. Someusersmaybeinterestedinbetterperformane,andothersin
betterseurityorbetterreliability. Eventhedenitionof`betterperformane'mayvaryfromusertouser. We
arelookingintothesemantisofhowtoto dene`thebest' aordingtoeahuser'srequirements.
We are also planning to add a feature to Stork to dynamically select which route to use in the transfers and
then dynamically deploy DiskRouters at the nodes on that route. This will enable us to use the optimal routes
in the transfers, as well as to make optimal use of the available bandwidth throughout that route.
Acknowledgments. It is a pleasure to thank the many colleagues and collaborators who contributed to this
work both directly and indirectly. At ZIB, these are namely Werner Benger and Tino Weinkauf, who contributed
to the overall ideas of our approach. We wish to thank the Globus group, in particular Bill Allcock and John
Bresnahan, for their substantial support with the GridFTP server plugin infrastructure and implementation;
without the experimental server provided by them, our work would hardly have been possible. We also wish to
thank John Shalf and Werner Benger for many insightful discussions about data handling. Finally, we wish to
thank the members of the GridLab project who contributed to the Adaptive Component work package for useful
discussions about (future) semi-automatic adaptivity schemes, and for their support during the benchmarking.
The presented work was funded by the German Research Network (the DFN GriKSL project, grant
TK-602-AN-200), and by the European Community (the EC GridLab project, grant IST-2001-32133).
REFERENCES
[1] HDF5 File Format Specification, National Center for Supercomputing Applications (NCSA).
http://hdf.ncsa.uiuc.edu/HDF5/doc/H5.format.html.
[2] Introduction to HDF5, National Center for Supercomputing Applications (NCSA).
http://hdf.ncsa.uiuc.edu/HDF5/doc/H5.intro.html.
[3] Wide Area File Service and the AFS Experimental System, Unix Review, 7 (1989).
[4] Network Programming Guide, Sun Microsystems Inc., (1990). Revision A.
[5] Amira User's Guide and Reference Manual and Amira Programmer's Guide, Konrad-Zuse-Zentrum für Informationstechnik
Berlin (ZIB) and Indeed-Visual Concepts GmbH, Berlin, 2001. http://www.amiravis.com/.
[6] Top500, (2002). http://www.top500.org/list/2002/11/.
[7] VisIt User's Manual, Tech. Report UCRL-MA-152039, Lawrence Livermore National Laboratory, February 2003.
[8] J. Ahrens, C. Law, W. Schroeder, K. Martin, and M. Papka, A Parallel Approach for Efficiently Visualizing Extremely
Large, Time-Varying Datasets, Tech. Report LAUR-00-1620, Los Alamos National Laboratory (LANL), 2000.
[9] W. Allcock, J. Bester, J. Bresnahan, S. Meder, P. Plaszczak, and S. Tuecke, GridFTP:
Protocol extensions to FTP for the Grid, GWD-R (Recommendation), (2002). Revised: Apr 2003,
http://www-isd.fnal.gov/gridftp-wg/draft/GridFTPRev3.htm.
[10] G. Allen, W. Benger, T. Goodale, H.-C. Hege, G. Lanfermann, A. Merzky, T. Radke, E. Seidel, and J. Shalf,
The Cactus Code: A Problem Solving Environment for the Grid, in Proceedings of the Ninth IEEE International
Symposium on High Performance Distributed Computing, 2000, pp. 253-260.
[11] D. Alpert, Scalable MicroSupercomputer, Microprocessor Report, 03/17/03-01 (2003).
[12] W. Benger, I. Foster, J. Novotny, E. Seidel, J. Shalf, W. Smith, and P. Walker, Numerical relativity in a
distributed environment, in Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999.
[13] W. Benger, H.-C. Hege, A. Merzky, T. Radke, and E. Seidel, Efficient Distributed File I/O for Visualization in
Grid Environments, Tech. Report SC-99-43, Zuse Institute Berlin, January 2000.
[14] M. Beynon, R. Ferreira, T. Kurc, and J. Saltz, DataCutter: Middleware for Filtering Very Large Scientific Datasets
on Archival Storage Systems, in The Eighth Goddard Conference on Mass Storage Systems and Technologies/17th IEEE
Symposium on Mass Storage Systems, College Park, Maryland, USA, March 2000.
[15] B. Cabral, N. Cam, and J. Foran, Accelerated volume rendering and tomographic reconstruction using texture mapping
hardware, in 1994 Symposium on Volume Visualization, A. Kaufman and W. Krueger, eds., 1994, pp. 91-98.
[16] T. Cullip and U. Neumann, Accelerating volume reconstruction with 3D texture mapping hardware, Tech. Report
TR93-027, Department of Computer Science at the University of North Carolina, Chapel Hill, 1993.
[17] I. T. Foster, C. Kesselman, G. Tsudik, and S. Tuecke, A security architecture for computational grids, in ACM
Conference on Computer and Communications Security, 1998, pp. 83-92.
[18] L. A. Freitag and R. M. Loy, Adaptive, multiresolution visualization of large data sets using a distributed memory
octree, in Proceedings of SC99: High Performance Networking and Computing, Portland, OR, November 1999, ACM
Press and IEEE Computer Society Press.
[19] H.-C. Hege and A. Merzky, GriKSL: Immersive Überwachung und Steuerung von Simulationen auf entfernten
Supercomputern, DFN-Mitteilungen, 59 (2002), pp. 5-7.
[20] F. Isaila and W. F. Tichy, Mapping functions and data redistribution for parallel files, in Proceedings of IPDPS 2002
Workshop on Parallel and Distributed Scientific and Engineering Computing with Applications, April 2002. Fort Lauderdale.
[21] L. Linsen, J. Gray, V. Pascucci, M. A. Duchaineau, B. Hamann, and K. I. Joy, Hierarchical Large-scale Volume
Representation with '3rd-root-of-2' Subdivision and Trivariate B-spline Wavelets, Mathematics + Visualization, Springer
Verlag, Heidelberg, Germany, 2003.
[22] A. Norton and A. Rockwood, Enabling View-Dependent Progressive Volume Visualization on the Grid, IEEE Computer
Graphics, Graphics Applications for Grid Computing, (2003), pp. 22-31.
[23] C. Nuber, R. W. Bruckschen, B. Hamann, and K. I. Joy, Interactive visualization of very large datasets using an
out-of-core point-based approach, in Proceedings of the High Performance Computing Symposium 2003 (HPC 2003), I. Banicescu,
ed., San Diego, California, March 30-April 2, 2003, The Society for Computer Simulation International. Orlando, FL.
[24] S. Olbrich, T. Weinkauf, A. Merzky, H. Knipp, H.-C. Hege, and H. Pralle, Lösungsansätze zur Visualisierung
im High Performance Computing und Networking Kontext, in Zukunft der Netze - Die Verletzbarkeit meistern., J. von
Knop and W. Haverkamp, eds., vol. 10, Düsseldorf, Germany, May 2002, pp. 269-279. 16. DFN-Arbeitstagung über
Kommunikationsnetze, GI Edition, Lecture Notes in Informatics (LNI).
[25] N. Ramakrishnan and A. Y. Grama, Data Mining Applications in Bioinformatics, in Data Mining for Scientific and
Engineering Applications, Kluwer Academic Publishers, 2001, pp. 125-140.
[26] J. Rumble Jr., Publication and Use of Large Data Sets, Second Joint ICSU Press - UNESCO Expert Conference on
Electronic Publishing in Science, (2001).
http://users.ox.ac.uk/~icsuinfo/rumbleppr.htm.
[27] H. Schumann and W. Müller, Visualisierung, Springer, Berlin, Heidelberg, New York, 2000.
[28] D. Stalling, M. Westerhoff, and H. Hege, Amira: an object oriented system for visual data analysis, in Visualization
Handbook, C. R. Johnson and C. D. Hansen, eds., Academic Press, to appear 2003. http://www.amiravis.com/.
[29] M. Weiler, R. Westermann, C. Hansen, K. Zimmerman, and T. Ertl, Level-of-detail volume rendering via 3D
textures, in IEEE Volume Visualization and Graphics Symposium 2000, 2000, pp. 7-13.
[30] R. Wolski, N. Spring, and J. Hayes, The Network Weather Service: A Distributed Resource Performance Forecasting
Service for Metacomputing, Journal of Future Generation Computing Systems, 15 (1999), pp. 757-768.
Edited by: Wilson Rivera, Jaime Seguel.
Received: July 5, 2003.