• No results found

Progressive Retrieval and Hierarchical Visualization of Large Remote Data

N/A
N/A
Protected

Academic year: 2020

Share "Progressive Retrieval and Hierarchical Visualization of Large Remote Data"

Copied!
10
0
0

Loading.... (view fulltext now)

Full text

(1)

PROGRESSIVERETRIEVAL ANDHIERARCHICAL VISUALIZATION OFLARGE

REMOTEDATA

HANS-CHRISTIAN HEGE

,ANDREI HUTANU

,RALF KÄHLER

, ANDRÉ MERZKY

, THOMAS RADKE

,

EDWARD SEIDEL

, AND BRYGG ULLMER

Abstrat.

Thesizeofdatasetsproduedonremotesuperomputerfailitiesfrequentlyexeedstheproessingapabilitiesofloalvisualization

workstations. This phenomenoninreasingly limitssientists whenanalyzing results of large-sale sienti simulations. That

problemgetsevenmoreprominentinsientiollaborations,spanninglargevirtualorganizations,workingonommonsharedsets

ofdatadistributedinGridenvironments.Inthevisualizationommunity,thisproblemisaddressedbydistributingthevisualization

pipeline.Inpartiular,earlystagesofthepipelineareexeutedonresoureslosertotheinitial(remote)loationsofthedatasets.

Thispaper presentsan eienttehniquefor plaingthe rsttwostages ofthe visualizationpipeline(data aessand data

lter)ontoremoteresoures. ThisisrealizedbyexploitingtheextendedretrievefeatureofGridFTPforexible,highperformane

aesstovery largeHDF5les. Wereduethenumberofnetworktransationsforlteringoperationsbyutilizingaserverside

dataproessingplugin,andhenereduelatenyoverheadomparedtoGridFTPpartialleaess. Thepaperfurtherdesribes

theappliationofhierarhialrenderingtehniquesonremoteuniformdatasets,whihmakeuseoftheremotedatalteringstage.

1. Introdution. The amount of data produed by numerial simulationson superomputing failities

ontinuestoinreaserapidly inparallel withtheinreasingomputepower,mainmemory,storagespae,and

I/Otransferratesavailabletoresearhers. Thesedevelopmentsinsuperomputinghavebeenobservedtoexeed

thegrowthofommoditynetwork bandwithandvisualizationworkstationmemory/performanebyafatorof

4[11℄. Hene, itis inreasingly ritialto use remotedata aess tehniques foranalyzing this data. Among

otherfators,thistendenyisstrengthenedbytheinreasingpromineneoflarge,spatiallydistributedsienti

ollaborationsworkingonommon,sharedsetsofdata. Undertheseonditions,thesimpleapproahof(partial)

datarepliationforloaldataanalysisdoesnotsale.

Thesheersizeofexistingdatasetsreatesademandforexibleandadaptivevisualizationtehniques,suh

ashierarhialrenderingorviewpointdependent resolution. Suh tehniques anredue theinitial amountof

data to bevisualized bymaintaining the overall visual impression of the full data set. This anbeahieved

(e.g.) byretrievingtheportionsofthedatasetwhihareimportanttotheuser;orbyretrievinglowresolution

versions ofthefull dataset rst,and reningthis datalater. Remoteaess topartial interesting portions of

largedatalesansigniantlysupportthesetehniques.

Onemajorproblemofnaiveremotedataaesstehniquesistheinherentdiultyinhandlingmeta data

forlarge data sets. Meta data is thehighly strutured set of informationdesribing the dataset, ontaining

(e.g.) the numberof samples peroordinate axisand thedata volume bounds within physial spae. While

the metadataitself is relatively small, meta data aess is oftenonneted with many small read operations

andmanyseekoperations. However,individuallyrequestingmanyseeksoveraremote,potentiallyhigh-lateny

onnetionisquiteineientforprotoolsthatdonotsupporttransationsoverhigherleveloperations[13,19℄.

Ingeneral,thesedevelopmentsultimatelyrequiredistributingthepipelineusedfordatavisualization. The

presentpaperdesribestehniquesuseablefordistributingearlystagesofthisvisualizationpipeline. Speially,

we enable the appliation to eiently aess portions of remote large data sets present in the HDF5 le

format[2℄. Thisgeneral approah anbe adapted both to other le formatsand other aess patterns. The

paperfurtherpresentshigherlevelvisualizationtehniqueswhihutilizethesedataaessmehanismstoprovide

adaptiveandprogressiverenderingapabilities.

Thepaperisstruturedasfollows. First,wedesribetheproblemspaeourapproahistargetinginmore

detailinset.2. Next,werelateourresearhtootherrelevantresearhativities(seeset.3). Inset.4follows

an overall desription of the tehniqueswe developed. Set. 5and 6 desribe the main omponents in more

tehnialdetail. Thepaperonludes withtwosetionsaboutourresultsandanoutlookforfuture work.

2. Senario. Theinreasinggapbetweenresouresavailableatremotesuperomputingentersandonthe

loalworkstationsofindividualresearhersisoneofthemajormotivationsforourresearh. Inpartiular,weaim

toimprovetheaess toGrandChallengesimulationresultsasprodued bynumerous researhollaborations

ZuseInstituteBerlin(ZIB),http://www.zib.de/,{hege, hutanu, kaehler, merzky, ullmer}zib.de

(2)

aroundtheworld[12,25,26℄. Thesesimulationstendtodrivetheresoureutilizationofsuperomputerresoures

totheavailablemaximum,andoftenprodueimmenseamountsofdataduringsinglesimulationsruns.

Asanexemplaryappliationweonsider numerialrelativitysimulationsperformedintheCatus

simula-tion framework [10℄. Amongother things, this framework provides the simulationode with aneientI/O

infrastruture to write data to HDF5 les. The astrophysial simulationsin questionwrite data for salar,

vetorand tensor elds (as omponents stored in separate data sets or les), and parameters for simulation

runs. Atypialsize foradataleisontheorder oftensofgigabytes 1

.

Visualization ofthis data duringpost simulationanalysis usually doesnotrequireaess tothe omplete

data set. Fortypialprodution runs,where many dierent physial elds arewritten to disk, onlyaouple

ofthese eldsarevisualizedlater. Thedatasetsandsubsetsthat aretobevisualizedarenotinitiallyknown,

but depend oninterativeseletionsbytheuser(timestep,eld, resolution, spatial area,et.). Forourtarget

users,thisexibilityneedstobemaintainedasfaraspossible.

Withintheseonstraints,ourtargetsenarioisthefollowing:

A sientist performs a large sale simulation run, utilizing one or more superomputing

re-soures at dierent loations. The simulation run produes up to TBytes of data, by storing

various salar and vetor elds to HDF5 les. These HDF5 les are reated aording to a

ustompredenedstruture.

Afterthesimulationnishes,membersofthesientists'ollaborationwishtovisualizethedata,

or portions hereof, from remote workstations. They would like to use standard visualization

tehniques from their visualization environment. They also wish to interatively hoose the

data eldsto bevisualized, and tointeratively hange the spatial seletion andresolutionfor

the data.

Ideally, the data transfer and visualizationare adaptive to the available network onnetivity,

andhidesdatadistributiondetails from theuser.

This senario denes the problem spae we are targeting. We expliitely do not expet to nd data on the

remotesystemswhihare,bypre-orpostproessing,speiallypreparedforlatervisualization. Wealsowant

toprovideasolutionforenvironmentswithnotoriousshortsupplyofI/Obandwithandomputeresoures. And

wewanttoenableremotevisualizationforabroadwidth ofendusers,onnetedtotheGridby awiderange

of network types and with varying,potentiallylow end ommodity systems. The ability of the visualization

pipelineto beadaptive tothat rangeofboundaryonditionsisaentral pointofoureortsthefousofthe

paperonprogressivedataretrievalpatternsandonhierarhialrenderingtehniquesemphasizesthis.

3. Related Work. To support the senario we presented, it is ultimately neessary to distribute the

pipelineusedfordatavisualization. Inpriniple,therearemanypossiblewaystodistributethispipeline(g.3.1)

overremoteresoures. Thedistributionshemesusedin realworldsystemsarelimitedbytheommuniation

requirementsfor transferringdata betweenthe stages of thepipeline, and bythe omplexity of theresulting

distributed softwaresystems.

Data

Source

Data Set

Image

filter

map

render

display

access

view

Geometry

user control

Fig.3.1.Mostvisualizationsystemssharethesameunderlyingvisualizationpipeline[27 ℄.

Theomponents of the pipeline an be freely distributed, in priniple,as the ommuniation

elementsbetween theseomponents havedierent demands onlateny and bandwith required.

Allelementsofthepipelineshouldbeontrolled bytheenduserorbytheappliation.

Earlystagesof thepipelineremoteaessand remotelteringpotentiallyneedto transferand proess

largeamountsofdata,butshowonsiderableexibilitywithrespettolateny. Also,bydistributingtheseearly

1

(3)

stages,itispossibletoompletelyhidethedataloalityfromappliationandenduser. Remoteaesssolutions

asNFS[4℄andAFS [3℄ allowtransparentutilizationofstandard(loal)leI/Otehniques. However,systems

like NFS and AFS are problemati in the administrativemaintenane. For widely distributed environments

spanningmultipleadministrativedomainsthese solutionsarenotappliable.

Commonremote data aesstehniquesrossing administrativeboundariesaremarkedby several

limita-tions. Some,likeSCPandFTP,donotsupportaesstopartialles,whihisnotaeptableforourpurposeof

adaptivevisualizing. Othertehniquesfailtodelivertheperformanerequiredforinterativedatavisualization.

Forexample,theGridFTPsupportforaesstoremoteleswiththepartialleaessfeature[9℄isineient

formetadataaess. DuetotheleformathosenbyHDF5,metadataisnotneessarilystoredinaontinuous

lespae, but insteadsatteredin ahierarhialbinarytree. Also,asingleread ontheHDF5 APIlevelmay

betranslatedbythelibraryintomanyindividuallow-levelseek/readoperationsonthevirtualledriverlevel.

Otherprotoolsaresimilarly lakingin supportfortransationsofhigherleveloperations[13,19℄.

Remotelteringtehniquesoftenintegratemodelsofmeta dataanddatastrutures,and anperformthe

dataaesseiently 2

. Also,puttingtheremotelterontheremotesiteansigniantlyreduetheamountof

datatobetransferredoverthenet,andensuresthatonlythedataatuallyneededforthevisualizationproess

isretrievedand transferred. A standardproblemfor remoteltering isthat this proess needsto integratea

model of the datastrutures it is operating upon. It is diultorimpossibleto implementltering without

expliitinformation about what is to be ltered, andthis information is diult to express in ageneralway

thatisappliableoverabroadrangeofdataformatsandmodels. Hene,remotelteringtehniques areoften

limitedto spei leanddatatypes,and tospei lteringoperations.

The Data Cutter projet [14℄ is another well known representative of the remote ltering approah. It

provides the appliation programmer with a exible and extensible lter pipeline to aess portions of the

originaldataset. Comparedtoourapproah,thereareseveralmaindierenes. First,thedatautterrequires

thedata to be stored in hunked data lesin order to benet from its boundarybox indexing sheme, sine

allhunks with abounding boxat least partlyoverlappingwiththe areaof interestare ompletelyread into

memory, and passed to thelter pipeline. Also, sine alllters passdata using network ommuniation, the

total network loadis muh higher than for our approah, where the lterresides at the data soure, and is

tightly oupledto the data aess stage. Further, ourutilization of standardGrid tools(GridFTP andGSI)

seemsmoreappropriate for thetargeted Grid environment. On theother hand, Data Cutters userdenable

lterpipelineismoreexiblethanourapproah.

Onewidelyused ompromiseforremoteltering istheusage ofpreproessed datasets: during the

simu-lationsI/Ostage orduring apostproessing step,lteroperationsareappliedto reatenewdata setsonthe

remoteresoures. These datasets are stored in optimizedform makinglater remoteaess and visualization

very eient. In the future, moreand more simulation frameworks will support suh features, not at least

in order to improve their own I/O harateristis, i.e. due to ompression on the y, but also to enable the

eient handlingof theverylargedata sets, after ompletion of thesoure simulation. Wavelet transformed

data storage is an exellent example of that tehnique [22℄, whih allows lossless ompression, and adaptive,

eient oine aes to optimally resolveddata samples. Other examplelters reate otrees [18℄ or similar

struturedrepresentations[21℄,orprovideprogressivemeshgeneration.

For theproblem spae wedesribed with oursenario, pre applied lters are novalid option,sine they

either need to beintegrated into the simulation I/O ode, what they aren't in our ase; orthey need to be

exeutedviaexternaljobsontheremoteresoure. Thisdupliatesthestorageneededandpotentiallyperforms

exesswork,therebywasting ostlysuperomputingresoures.

Afterltering,visualizationalgorithmsworkonthedataandmapessentialfeaturesintogeometries

(inlud-ingolorandtextureinformation, et.). Thenextstage rendersimagesfrom thesegeometrialrepresentation.

Inthe future, these stages may also be exeuted loseto thedata soure, on the superomputer itself. This

would be the most eient way to handle large simulation data, sine the amount of data to be transfered

duringthelaterstagesofthevisualizationpipelinetypiallydereasessigniantly. Completelyhangedaess

patternstoremotedataansigniantlyreduetheamountofdatatransfered. Visualizationalgorithmsusing

suh patterns[23℄,inpartiularforlargedata,areseenasuseasesforthepresentedwork.

ThebestprospetsofdeployingsuhsenarioshavethoseenvironmentsontainingPC-lusterbased

super-omputers. Here,addingommoditygraphisboardstoallnodesdoesnotinreasethetotalostssigniantly,

(4)

but allowshigh performaneimage rendering. These typesoflustersarebeominginreasinglyommon,but

arestill rarein thetop500 [6℄. Fortheollaborativeandhighly interativevisualizationsenarioweenvision,

the feedbak to theremote and distributed rendering systemgets important,and omplex. Also, in perhaps

the mostimportant point, the eld is urrentlymissing suientlyexible software solutions whih are able

torealizesuhsenarios. Promisingapproahesdoexistthroughwork suh as[8,7,24℄,and weexpetmajor

progressin thateld overthenextdeade.

4. Arhiteture. Our proposed remote data aess sheme builds upon the GridFTP protool [9℄.

GridFTPisaGrid-awareextensiontothestandardFTPprotool. Amongstothers,itprovidesaexibleserver

side proessing feature, and allowsspeiation of ustom operations on remote data. These operations are

performedbyorrespondingustomextensions(plugins) totheGridFTPserver. Thistehniqueisdesribed

inmoredetailinset.5. Weutilizetheseserversidedataproessingapabilitiestoperformdataltering

oper-ationsonthesientidatasets. Asdesribed, thedatasets arestoredremotelyin HDF5format. Ourplugin

totheGridFTPserveraessesthisdataloallyviatheHDF5library,andperformsdatalteringonthey.

HDF5

File

Visualization System

(Amira)

GridFTP Client

Network

HDF5 Plugin

− MetaData

− Data Blocks

MetaData

Cache

Native

HDF5 Calls

GridFTP Protocol (ERET/ESTO)

GridFTP Server

Fig.4.1.TheGridFTPprotooltransportsERETommandsfromthevisualizationsystem

to theGridFTP server,whih forwards them totheHDF5 plugin. This way, theplugin an

performI/OoperationspluslteringanddatatypeonversionontheHDF5le withfullloal

performane. Datais transferred bak via ESTOommands, and iswrittenintothe memory

buerofthevisualizationproess.

An important element forthe arhitetural deisionsis the usageof theHDF5 le format[1℄. Given the

omplexityofthisformatandtheongoingimprovementeortsonerningtheassoiatedAPI,thedeisionwas

tousetheexisting APIandtohavetheremoteaessproedureseitherontopoftheAPIorasalsodesribed

in set. 3 underneath of it. The arhiteturedesribedin this work has theremote operations on topof the

HDF5API,alimitedset ofhigh-leveloperationswashosento beimplementedbymakinguseof theexisting

API,andtheseoperationswereintegrated intheGridFTPservertobeexeutedattheremotesite.

A omplete visualization session is performed asfollows. Theuser selets adata le to be visualizedby

browsing the remote le spae. Next, a onnetion to the remote GridFTP serveris established, using the

usersGSIredential. Theserverpluginisutilizedto performanextrationofthelesmetadata(seeset.5),

whihisthentransferedtothevisualizationhostandahedontheloallesystem. Thevisualizationsystem

aessesthisloalHDF5le,extratsallneededinformation(numberoftimesteps,bounding box,resolution,

...), and reatesan otree hierarhy ttingthe dataset. Theuseraninterativelyspeify thedepth of the

hierarhy. Astheuserthentriggersvariousvisualizationoperationsonthedata(to produeorthoslies,hight

elds, volumetrirenderings),theotreebloksaresheduledin aseparatethreadfor datareading. Theread

requestsare servedaordingtoaprioritytagdened forthevisualization, andeahtriggeraGridFTPdata

aess. This GridFTPdataaessutilizesourremoteGridFTPserversidedataproessingplugin. It extrats

thedatain theblokspeiresolutionandreturnsthisdata. Onarrival,thedataisstoredwithin theotree

hierarhy, and the visualization is triggered to update the rendering by inluding the newly arrived data.

On userrequest (e.g., nexttimestep) ortimeout, allpending blokreads anbe aneled. Our visualization

tehniques(see set. 6) use these features for dynami data aess to optimize visualization performane by

requestingdatablokslosetotheviewpointrst,andbyprogressivelyimprovingdata(andimage)resolution.

(5)

transfer. This approah givesus a numberof advantages if ompared to approahes implemented on top of

ustomorproprietaryprotools.

1. GridFTPallowsforserversidedataproessing,whihweutilizefordataltering.

2. TheGridFTPprotool,asanextensiontothestandardFTPprotool,iswellknownandreliable.

3. Itallowstheinorporationofstandardserversforsolutionswithlimitedfuntionality. 3

4. TheGridFTPinfrastruturetakesareof:

establishingthedataonnetion;

ensuringauthentiationandauthorization;

invokingthedatalterplugin;and

performingthedatatransfer;

Inthis way, thedata transfertask is redued to lling abuer onthe writingand reading it on the

reeivingend.

Thefollowingsubsetionsdesribetheserversideproessinginmoredetail,andspeifythelowleveloperations

weuse.

5.1. Server-Side Proessing. As desribed before, the GridFTP protool enables support for adding

ustomommands for serverside data proessing [9℄. Speially, the pluginsoered bya serverdene sets

of ERETand ESTO parametersthat orrespond to thedata ltermodule implemented bythe plugin 4

. The

extended store (ESTO) and extended retrieve (ERET) ommands of the GridFTP protool are dened as

following:

ESTO <module_name>="<modules_parms>" <filename>

ERET <module_name>="<modules_parms>" <filename>

module_nameis a server-spei stringrepresenting the name of the module to be used. The seond string

(module_parms)ismodulespeianddenestheoperationtobeperformedbythemodule. Thelastparameter

(filename)speiesthele tobeproessed,whihanbeanylethatanbeproessedbythegivenmodule.

Inourase,anyHDF5le.

5.2. Operations. Weusethis ERET/ESTO mehanismtodene twooperationsthatanbeappliedto

HDF5les: oneformetadataltering,andaseond onefordataaess.

MetaDataFiltering. TherstoperationisthelteringofmetadatafromtheHDF5le. Thisisahieved

byreatingalteredopyof theoriginal le. Towardthis end, themodule reads andparses theoriginalle,

andwritesthemetadatainformationtoaopyofthele. However,whenopying(writing)adataset,weuse

theHDF5lterinterfaeandapplyaltertotheoriginallesdataset. Thislterreduesalldatasetstozero

length 5

. Thus, the only resulting dierenes betweenthe generated le and the original one are in the data

arrayand storagelayoutofthedata sets. All otherinformatione.g.,the hierarhy(groups), attributes,and

dataset information (name,data typeanddata spae)ispreserved. Whilethis approah mightseemlikea

signiantoverhead,itisinfatveryfast,dueto thegoodperformaneofHDF5.

ThegeneratedleistransferredtotherequestinglientusingGridFTP.TheERETommandforrequesting

themeta dataleis:

ERET Hdf5="METADATA" <filename>

filenameis thele from whih themeta data will be extrated. Giventhenowdramatiallyredued size of

thele, thetransfertimeis verysmall relativeto thetransfertime oftheoriginal data 6

. Afterthehigh-level

ltering all is exeuted remotely and the transfer is nished, the lient an aess the loal meta data le

using the standardHDF5 API. In this way, weavoid to exeute eah HDF5 APIall remote, and still oer

the userthe exibility of the original APIfor meta data aess. Beause the data set strutures within this

temporary loal le do not ontain atual data, the standard API annot be used for data aess. Forthis

task,weprovideaseondAPIall.

3

bakwardsompatiblewithFTP,byusingnormalFTPweouldtransfertheletoaloaldiskahe;forstandardGridFTP

server(withoutplugins)weusediretpartialleaess(ERET PART,forlteringineient).

4

Notallserversimplementthesamesetofmodules.Intheurrentimplementation,thepluginsareompiledtogetherwiththe

server,andarestatiallylinked.

5

Atually,fortehnialreasonsinternaltoHDF5thelengthis1.

(6)

Data Set Readingand Subsampling. Theseond operationperformsdataseletionandltering. By

knowingthedataset oordinates (dimensions,data type)from thenowloally available metadata,the lient

anhoose to read an entire data set,or aportion of the data set. TheHDF5 data sets logially groupthe

atualdatawithin multidimensionalarraysnamed dataspaes. Themodelweuse tospeifyaportionfrom

a data set is based on the HDF5hyperslab model. A hyperslabdesribes either a ontiguousolletion of

points,oraregularpatternofpointsorbloksinthedataspaes. Ahyperslabisspeiedbyfourparameters:

origin: thestartingloation;

size: thenumberofelements(or bloks)toseletalongeahdimension;

stride: thenumberofelementstoseparateeah element(orblok)tobeseleted;and

blok: thesizeofeahblokseletedfromthedataset.

All ofthese parametersareone-dimensionallists, withlengthsequalto thenumberofdimensionsof thedata

set. Theelementsofthese listsspeify dataarraylengthsorosetsfororrespondingdimensionsofthe data

arrays.Currentlythesizeofelementbloksispredenedtoone,whihisadequateforthetargetedvisualization

senario. Infuturework,wewillextendtheprotooltoaeptvariablebloksizes.

Oururrentmehanismforspeifyingthehyperslaboordinatestakesthefollowingform:

ERET Hdf5="BLOCK:NAME=<datasetname>;\

DIMENSIONS=<dims>;\

ORIGIN=<orig0>,<orig1>,...,<orign>;\

SIZE=<size0>,<size1>,...,<sizen>;\

SAMPLING=<sampling0>,<sampling1>,...,<samplingn>"

<filename>

datasetnameisthe fullyqualied name (inludingthepathto the dataset) of thedata set from whih data

shouldberead;orig0toorign aretheoordinatesoftherstelementtobeseletedfromthedataset;size0

tosizen arethenumberofelementstobeseletedineahdimension;andsampling0tosamplingnrepresent

thedistanebetweentwoseletedelementsforeahdimension.

This requestissentto theserver. The serveropensthele lename, opensthegivendataset,and reads

theportionof thelespeiedbythegivenparameters. ThisproedureisperformedvianativeHDF5library

alls. Next,the retrieveddata is sent viathe GridFTPonnetion to the lient, whih will onvert the data

to the loal byte order ifneeded. Todetermineif onversionis neessary, therst 32bits sentby the server

representanintegerwiththevalueof 1,enodedusingtheserversbyteordering.

TheapproahwehavetakeninreatingthislimitedHDF5APIwrapperdoesreduetheexibilityprovided

bytheoriginal API.Nonetheless,for ourvisualizationsenariothisAPIis appropriate,andmakessigniant

stepstowardmaximizingoverallperformane. ToretaintheexibilityoftheoriginalAPI,oneapproahwould

betoexeuteeahnativeAPIallremotely. Inthisase,theostperallisatleastthatofthenetworklateny.

This,ombinedwith therelativelylargenumberofalls neededfor exampleto gatherthemeta datafromthe

le, signiantly reduestheperformane. This motivatestheusage ofhigherlevelAPIwrappers,astheone

wehaveimplemented. However,suhwrappersneednottobeaslimitedasoururrentversionofourse.

5.3. Seurity. The seurity model used used by the GridFTP serveris GSI (Grid Seurity

Infrastru-ture) [17℄. The lient needs to hold avalid GSI proxyontaininga seurity redentialwith limitedvalidity.

The proxy represents a Distinguished Name (DN) that must be present in the grid-mapfileof the server

mahine inorderfortheservertoaepttheonnetion. Thisproxyisusedto authentiatethelientwithout

using passwords. After the onnetion is established, the server front end starts the MPI-based bak end.

This bak end runs under theloal identityto whih theDN is mapped. Thebak end is responsiblefor all

subsequentoperations,inluding thelteringoperations. Thisensuresthat onlyauthorizedlientsanaess

theinformationfromtheoriginalle.

6. Adaptive Visualization. Weutilizethepreviouslydesribedtehniquesfordata aessandltering

togeneratealevel-of-detailrepresentationoftheremotedataset inthevisualizationphase.

First,themetadatai.e. informationaboutthenumberofdatasamplesperoordinateaxisandthedata

volumeextensioninphysialspaeisretrieved(seeset.5.2). Withhelp ofthisinformation,andaseletable

minimalresolutionofthedata,anotreestrutureisgenerated,whihinitiallyontainsnodataotherthanthe

parent-hildrelationsand position andextensions ofthetreenodes. Therootnodeofthestruture willstore

(7)

Next,the data forthe otreenodes isrequested from the readermodule, startingat theroot node. The

order in whih nodes are rened is determined by the distane from a user-dened point-of-interest, whih

mightbetheameraposition oranarbitrarypointwithinthedatavolume. Subregionsof thedatasetsloser

tothispointarerequestedwithhigherprioritythanthosewhiharefurtheraway. Thepositionandresolution

parametersforeahrequestarespeiedandsenttotheremotemahineasdesribedinset.5.2.

Thereaderrunsinaseparatethread,sothevisualizationroutinesarenotblokedduringtheloadingphase.

Eah timeadatablokhas arrived,thevisualization module isnotied, and thisnew datais reetedin the

nextrenderedframeofthevisualization.

Fig.6.1. Thesequenedepitsthevolumerenderingofa remotedataset. First,a oarse

resolutionrepresentationofthedataisgenerated on-the-yandtransferredtotheloal

visual-izationlient. Next,subregionslosertothepoint-of-interest(inthisase,theameraposition)

arerequestedandintegratedatprogressivelyhigherresolutions.

Besideshierarhialvisualizationmodulesfororthosliingandthedisplayofheightelds,weimplemented

a3Dtexture-basedvolumerenderingmodule forotrees. Theotreeistraversedinaview-onsistent

(bak-to-front)order,startingattherootnode. Anodeisrendered,iftworiteriaarefullled:

Thedataforthisnodeisalreadyloaded(otherwise,thetraversaloftheassoiatedsubtreeisstopped).

The data for the subnodes is not loaded yet (otherwise, the node is skipped and the subnodes are

visited).

Oneanodeisseleted,it isrenderedutilizingthestandardapproahforvolumerenderingwith 3Dtextures,

asproposed in(e.g.) [16,15℄. The3Dtextureis sampledonslies perpendiularto theviewing diretionand

blendedin theframebuer.

In order to take advantage of the multi-resolution struture of the data for fast rendering, the sample

distaneofthesliesissetwithrespettotheresolutionleveloftheatualnode,asproposedin [29℄.

7. Results.

(8)

partof the Globus software distributionasof yet. It supports theaddition of ompile-time plugins (written

in C)forhandlingspeiinarnationsoftheERET/ESTO protoolommands. AlthoughERET andESTO

are speiedin theGridFTP protool version1.0, there is urrently no other implementation of this feature

availableotherthanthebasisupportforpartialleaessandstripeddataaess. Therearegoodprospets

for thisfeature to bepresent in various future implementations of GridFTPservers. Thepluginode willbe

availableviatheGridLabprojetsoftwaredistribution,andwillbepublishedathttp://www.gridlab.org/.

Forbenhmarking thesoftwareweused adual Xeon1.7GHzServerrunning RedHat Linux 8.0 asadata

server. Themahinewasequippedwith1GBofRAMandalogialvolumestorageof320GByte(36.5MByte/se

transferrate). Themeasurementshaveagranularityof1seond.

ThevisualizationmoduleswedesribedhavebeenimplementedintheAmiravisualizationenvironment[28,

5℄,whihisbasedonOpenGLand OpenInventor. TherenderingshavebeenperformedonadualPentiumIV

systemwith 2.6 GHz, 1GByte main memoryand NVidia Quadro4 graphis. The systemranunder RedHat

Linux8.0 withthestandardNVidiavideodriver.

7.2. Benhmark Results. Inorder to evaluate ourapproah, weperformed a numberof performane

measurementsforaessing,loadinganddisplayinglargeremoteHDF5datasets. Weomparetheperformane

obtained using the GridFTP plugin (GridFTP HDF5) with a omparable remote aess tehnique, that is

HDF5overGridFTPpartialle aess(GridFTP PFA).Wealso inlude measurementsofloal (loal aess)

andNetworkFileSystem(NFS aess)timesto seeifweahievedourgoalofhavingaeptablewaitingtimes

beforetherstvisualizationisreated,onsideringtheloalandNFStimesasaeptable.

Theresultsofthesetestsarelistedintable7.2. Thetimeneededtoreatetherstimage(

t

3

)isomposed

of the time needed to gatherand transferthe meta data (

t

1

) and the time needed to lter and transfer the

subsampledrsttimestep(

t

2

).

t

4

givestheaesstimeforafullresolutiontimestep.

Thetests havebeenperformedonaLoal Area Network (LAN)with normalnetwork load(lateny1ms,

measured 32.0MBit/se), and on aWide Area Network onnetion (WAN)between Amsterdam and Berlin

(lateny20ms,measuredbandwith: 24.0MBit/se).

TheWAN measurementshavebeen performed withvarious level settings, that iswith dierent depth of

theotreehierarhyreated.

Table7.1

Thetablelists performane measurements forthe various aess tehniqueswe explored.

Theresultshavebeenobtainedbytimingthevisualizationproessfora 32GBHDF5le,

on-taining500 timesteps,eahtimestepwiththeresolutionof

256

3

datapoints(doublepreision).

AessType Net Level MetaData Root Blok Startup Complete

t

1

t

2

t

3

=

t

1

+

t

2

t

4

loalaess - 2 7se 1se 8se 3se

NFSaess LAN 2 8se 5se 13se 8se

GridFTPHDF5 LAN 2 11se 2se 13se 11se

GridFTPPFA LAN 2 165se 10se 175se 200se

GridFTPHDF5 WAN 3 14se 2se 16se 126se

GridFTPHDF5 WAN 2 14se 3se 17se 68se

GridFTPHDF5 WAN 1 14se 7se 21se 45se

GridFTPHDF5 WAN 0 14se 41se 55se 41se

GridFTPPFA WAN 3 430se 28se 458se 3760se

GridFTPPFA WAN 2 430se 53se 483se 960se

GridFTPPFA WAN 1 430se 110se 560se 477se

GridFTPPFA WAN 0 430se 220se 670se 220se

Thesemeasurementsshowthatthegoalofafastinitialvisualrepresentationofthedatasetwasahieved:

a small startuptime

t

3

anbe ahieved by using the GridFTP HDF5 tehniqueombined with hierarhial

aess(level

2

). Thistimeisofthesameorderofmagnitudeasforloalvisualization.

Speifyingthe hierarhy levelprovides theuserwith aninterativemehanism fortuning response times.

(9)

shorterstartuptimesisthetotaltransfertimeforafullyresolveddataset(allotreelevels) 7

. Theresultsshow

thatrelation(

t

3

/

t

4

)learlyfortheWAN measurementswithdierentlevelsettings.

Also,thelargeoverheadfortheompliated metadata aesswasdramatiallyreduedin omparisonto

GridFTPpartial leaess. Theremainingtime dierenerelativetothe NFSmeta data aessresultsfrom

theappliation ofthezerolterto alldata sets, thetimeneeded towrite themeta datale, and thetimeto

transferit.

8. Conlusions. Withthepresentedshemeforprogressiveremotedataaessanditsuseforhierarhial

rendering, we have suessfully realized the funtionality targeted in our motivating senario (set. 2). In

partiular,thetehniqueswehavedevelopedsupporttheadaptationofremotedata aessto awiderange of

I/Oonnetions, and reat exibly to user and appliation demands. Forexample, ourmehanismssupport

adjustmentofthesystemsreationtimethetimeuntiltherstvisualimpressionforthedatasetappearsby

adaptingdatalterparameters,suhasthehosenotreedepth.

Ourpresentedsolutiondoesnotdepend onserver-sideoinepreproessingoftheomplete dataset. The

aess tothe datasets meta data, when omparedto naiveremoteaess tehniques,oersveryhigh

perfor-mane,assupportedbytheresultsofTable1. Onlyasmallloaldiskstoragespaeisrequiredforahingthe

assoiatedmetadata.

The extensibility of this approah is also notable. This approah supports bothadditional data formats

other thanHDF5, and aess patterns otherthan hyperslab,throughthe provisionof additional plugins.

Si-multaneously,itisimportanttoaknowledgethat thisapproahmaymakeitinreasinglydiulttomaintain

ompatible ongurationson allhosts ofaGrid. The situationmayimprovewithfuture GridFTPserver

im-plementationsallowingdynamilinking andinvoationofplugins. Thusimplementationisoneoftherstfew

existingutilizationsoftheERETapabilitiesprovidedbyGridFTP.Itisexpetedtoseemanymoreinthefuture.

Ourworkfurtherdemonstratestheusabilityofthedataaessshemeforhierarhialrenderingtehniques.

Theimplementedalgorithms(orthoslie,heighteld,volumetrirendering) showverygoodperformane,and

arealsoadaptivetouserspeiationandonnetivityharateristis.

Thepresentedarhitetureenablesustorealizevisualizationsenarioswhihwouldbeimpossibleearlier,by

reduingthetotalamountneededforobtainingavisualdataimpressionbyordersofmagnitudes,ifompared

tonaiveapproahes.

Weareplanningto enhanethedynamiprotoolseletionfeatureofStork,sothat itwillnotonlyselet

any availableprotoolto performthe transfer,but itwill seletthebest one. Therequirementsof `beingthe

best protool'mayvary fromusertouser. Someusersmaybeinterestedinbetterperformane,andothersin

betterseurityorbetterreliability. Eventhedenitionof`betterperformane'mayvaryfromusertouser. We

arelookingintothesemantisofhowtoto dene`thebest' aordingtoeahuser'srequirements.

WearealsoplanningtoaddafeaturetoStorktodynamiallyseletwhihroutetouseinthetransfersand

thendynamiallydeployDiskRoutersatthenodesonthatroute. Thiswillenableustousetheoptimalroutes

inthetransfers,aswellasoptimaluseoftheavailablebandwidththroughoutthatroute.

Aknowledgments. It isapleasureto thankmanyolleaguesandollaboratorswhoontributedtothis

workbothdiretlyandindiretly. AtZIB,thatarenamelyWernerBengerandTinoWeinkauf,whoontributed

tothe overall ideasof ourapproah. Wewish tothanktheGlobus group,inpartiular BillAllokand John

Bresnahan,fortheirsubstantialsupportwiththeGridFTPserverplugininfrastrutureandimplementation

withouttheexperimentalserverprovidedbythem, ourworkwouldhavebeenhardlypossible. Wealsowishto

thankJohnShalfandWerner Bengerformanyinsightfuldisussionsaboutdatahandling. Finally,wewishto

thankthemembersoftheGridLabprojetwhoontributedtotheAdaptiveComponentworkpakageforuseful

disussionsabout(future) semi-automatiadaptivityshemes,andfortheirsupportduringthebenhmarking

Thepresentedwork wasfunded by theGermanResearh Network (the DFN GriKSL projet,grant

TK-602-AN-200),andbytheEuropeanCommunity(the ECGridLabprojet,grantIST-2001-32133).

REFERENCES

[1℄ HDF5 File Format Speiation, National Center for Superomputing Appliations (NCSA).

http://hdf.nsa.uiu.edu/ HDF 5/do /H5 .fo rmat .htm l.

7

(10)

[2℄ IntrodutiontoHDF5,NationalCenterforSuperomputingAppliations(NCSA).

http://hdf.nsa.uiu.e du/H DF5/ do /H5. intr o.h tml.

[3℄ WideAreaFileServieandtheAFSExperimentalSystem,UnixReview,7(1989).

[4℄ NetworkProgramming Guide,SunMirosystemsIn.,(1990).RevisionA.

[5℄ AmiraUser'sGuideandRefereneManualandAmiraProgrammer'sGuide,Konrad-Zuse-ZentrumfürInformationstehnik

Berlin(ZIB)andIndeed-VisualConeptsGmbH,Berlin,2001. http://www.amiravis.om/ .

[6℄ Top500,(2002). http://www.top500.org/lis t/20 02/ 11/.

[7℄ VisItUser'sManual,Teh.ReportUCRL-MA-152039,LawreneLivermooreNationalLaboratory,February2003.

[8℄ J.Ahrens,C.Law,W.Shroeder,K.Martin,andM.Papka,AParallelApproahforEientVisualizingExtremely

Large,Time-VaryingDatasets,Teh.ReportLAUR-00-1620,LoaAlamosNationalLaboratory(LANL),2000.

[9℄ W. Allok, J. Bester, J. Bresnahan, S. Meder, P. Plaszzak, and S. Tueke, Gridftp:

Protool extensions to ftp for the grid, GWD-R (Reommendation), (2002). Revised: Apr 2003,

http://www-isd.fnal.go v/gr idft p-w g/dr aft/ Gri dFTP Rev 3.ht m.

[10℄ G.Allen,W.Benger,T.Goodale,H.-C.Hege,G.Lanfermann,A.Merzky,T.Radke,E.Seidel,andJ.Shalf,

The Catus Code: A Problem Solving Environment for the Grid, in Proeedings of the Ninth IEEE International

SymposiumonHighPerformaneDistributedComputing,2000,pp.253260.

[11℄ D.Alpert,SalableMiroSuperomputer,Miroproessor Report,03/17/03-01(2003).

[12℄ W.Benger, I. Foster,J.Novotny,E.Seidel, J.Shalf,W.Smith,andP.Walker,Numerialrelativityina

dis-tributedenvironment,inProeedingsoftheNinthSIAMConfereneonParallelProessingforSientiComputing,1999.

[13℄ W. Benger, H.-C. Hege, A. Merzky,T. Radke, andE. Seidel,EientDistributed File I/O forVisualization in

GridEnvironments,Teh.ReportSC-99-43,ZuseInstituteBerlin,January2000.

[14℄ M.Beynon,R. Ferreira,T.Kur,andJ.Saltz,DataCutter:MiddlewareforFilteringVeryLargeSientiDatasets

onArhivalStorageSystems,inTheEighthGoddardConfereneonMassStorageSystemsandTehnologies/17thIEEE

SymposiumonMassStorageSystems,CollegePark,Maryland,USA,Marh2000.

[15℄ B.Cabral,N.Cam,andJ.Foran,Aelerated volumerenderingandtomographi reonstrutionusingtexturemapping

hardware,in1994SymposiumonVolumeVisualization,A.KaufmanandW.Krueger,eds.,1994,pp.9198.

[16℄ T. Cullip and U. Neumann, Aelerating volume reonstrution with 3D texture mapping hardware, Teh. Report

TR93-027,DepartmentofComputerSieneattheUniversityofNorthCarolina,ChapelHill,1993.

[17℄ I. T. Foster, C. Kesselman, G. Tsudik, and S. Tueke, Aseurity arhiteture foromputational grids, inACM

ConfereneonComputerandCommuniationsSeurity,1998,pp.8392.

[18℄ L. A. Freitag and R. M. Loy, Adaptive, multiresolution visualization of large data sets using a distributed memory

otree, inProeedingsof SC99: High Performane Networkingand Computing, Portland,OR,November 1999, ACM

PressandIEEEComputerSoietyPress.

[19℄ H.-C. Hege and A. Merzky, GriKSLImmersive Überwahung und Steuerung von Simulationen auf entfernten

Superomputern,DFN-Mitteilungen,59(2002),pp.57.

[20℄ F.IsailaandW.F.Tihy,Mappingfuntionsanddataredistributionforparallelles,inProeedingsofIPDPS2002

Work-shoponParallelandDistributedSientiandEngineeringComputingwithAppliations,April2002.FortLauderdale.

[21℄ L.Linsen, J.Gray,V. Pasui, M. A.Duhaineau,B.Hamann, andK. I. Joy, Hierarhial Large-saleVolume

Representationwith'3rd-root-of-2'SubdivisionandTrivariateB-splineWavelets,Mathematis+Visualization,Springer

Verlag,Heidelberg,Germany,2003.

[22℄ A.NortonandA.Rokwood,EnablingView-DependentProgressiveVolumeVisualizationontheGrid,IEEEComputer

GraphisGraphisAppliationsforGridComputing,(2003),pp.2231.

[23℄ C.Nuber,R.W.Brukshen,B.Hamann,andK.I.Joy,Interativevisualizationofverylargedatasetsusingan

out-of-orepoint-basedapproah,inProeedingsoftheHighPerformaneComputingSymposium2003(HPC2003),I.Baniesu,

ed.,SanDiego,California,Marh30April2,20032003,TheSoietyforComputerSimulationInternational.Orlando,FL.

[24℄ S.Olbrih, T. Weinkauf,A. Merzky,H.Knipp, H.-C.Hege, andH.Pralle,Lösungsans�zezur Visualisierung

imHighPerformaneComputingundNetworkingKontext,inZukunftderNetze-DieVerletzbarkeitmeistern.,J.von

Knopand W.Haverkamp,eds., vol.10, Düsseldorf, Germany, May 2002,pp.269279. 16.DFN-Arbeitstagung über

Kommunikationsnetze,GIEdition,LetureNotesinInformatis(LNI).

[25℄ N. Ramakrishnan and A. Y. Grama,Data Mining Appliations in Bioinformatis, inData Mining for Sienti and

EngineeringAppliations,KluwerAademiPublishers,2001,pp.125140.

[26℄ J. Rumble Jr., Publiation and Use of Large Data Sets, Seond Joint ICSU Press - UNESCO Expert Conferene on

EletroniPublishinginSiene,(2001).

http://users.ox.a.uk/ isuinfo/rumbleppr.htm.

[27℄ H.ShumannandW.Müller,Visualisierung,Springer,Berlin,Heidelberg,NewYork,2000.

[28℄ D.Stalling,M.Westerhoff,andH.Hege,Amiraanobjetorientedsystemforvisualdataanalysis,inVisualization

Handbook,C.R.JohnsonandC.D.Hansen,eds.,AademiPress,toappear2003. http://www.amiravis.om /.

[29℄ M. Weiler, R. Westermann, C. Hansen, K. Zimmerman, and T. Ertl., Level-of-detail volume rendering via 3D

textures,inIEEEVolumeVisualizationandGraphisSymposium2000,1994,pp.713.

[30℄ R. Wolski, N.Spring, andJ.Hayes,TheNetworkWeatherServie: ADistributedResoure Performane Foreasting

ServieforMetaomputing,JournalofFutureGenerationComputingSystems,15(1999),pp.757768.

Editedby: WilsonRivera,JaimeSeguel.

Reeived: July5,2003.

References

Related documents

Adding the differential monogamy terms in Model 3 reverses the predicted network exposure risk differentials for both sexes, producing a pattern that is consistent with the

• Tenancy sustainment services cannot solve or prevent all the problems that lead to homelessness, but can in some cases prevent the problems from reaching crisis point or

Information flow : using a remote cluster Fi rew al l Fi rew al l Cluster node Cluster frontend Client R Client Information Service RGridRPC Invoke Server Information Service

We argue that regions with better death counts coverage present better quality reports on causes of death and, in other regions, as mortality coverage improves, the percentage

This resource provides a list of brain injury rehab facilities, understanding brain injury, a rehab checklist, a lawyer checklist, advocacy, goal setting, problem solving,

If there is a demolition subsidy the investor has lower cost to prepare the property for a new building and the current owner will expect a higher price in case of sale..