• No results found

Parrot: Transparent User-Level Middleware for Data-Intensive Computing

N/A
N/A
Protected

Academic year: 2020

Share "Parrot: Transparent User-Level Middleware for Data-Intensive Computing"

Copied!
10
0
0

Loading.... (view fulltext now)

Full text

(1)

PARROT:AN APPLICATIONENVIRONMENTFOR DATA-INTENSIVE COMPUTING

DOUGLAS THAIN AND MIRON LIVNY

Abstrat. Distributed omputing ontinues to be an alphabet-soup of servies and protools for managing omputation

and storage. To liveinthisenvironment, appliations require middlewarethat an transparently adaptstandard interfaesto

newdistributed systems;suhmiddlewareisknownas an interposition agent. Inthispaper,wepresentseveral lessonslearned

aboutinterpositionagentsviaaprogressivestudyofdesignpossibilities. Althoughperformaneisanimportantonern,wepay

speialattention toless tangible issuessuh asportability,reliability,and ompatibility. We beginwithaomparisonof seven

methodsof interposition and selet onemethod, the debugger trap, thatis the slowest but also the most reliable. Usingthis

method,weimplementa omplete interposition agent, Parrot, that splies existingremote I/Osystemsintothe namespaeof

standardappliations. TheprimarydesignproblemofParrotisthemappingofxedappliationsemantisintothesemantisof

theavailableI/Osystems.Weoeradetaileddisussionofhowerrorsandotherunexpetedonditionsmustbearefullymanaged

inordertokeepthismappingintat. WeonludewithaevaluationoftheperformaneoftheI/OprotoolsemployedbyParrot,

anduseanAndrew-likebenhmarktodemonstratethatsemantidiereneshaveonsequenesinperformane. 1

Keywords. Adaptivemiddleware,errordiagnosis,interpositionagents,virtualmahines.

1. Introdution. Theeldofdistributedomputinghasproduedountlesssystemsforharnessingremote

proessorsand aessingremotedata. Despitetheintentionsoftheirdesigners,nosinglesystemhasahieved

universalaeptaneordeployment. Eaharriesitsownstrengthsandweaknessinperformane,manageability,

and reliability. Renewed interest in world-wideomputational systems is inreasing the number of protools

andinterfaesin play. Aomplexeologyofdistributed systemsishereto stay.

CPU/IO

Interaction

Common I/O Interface

Distributed I/O Services

FTP

Process

Distributed Computing Services

Condor

PBS

NQE

LSF

Leveler

Load

Application

Local Operating System

Common Process Interface

(open, close, read, write, lseek)

(main, exit, abort, kill, sleep)

NeST

Chirp

RFIO

Parrot

DCAP

Fig.1.1. TheHourglassModel

Theresult is anhourglass model of distributed omputing,

showninFigure1.1. Attheenterlieordinaryappliationsbuilt

to standard interfaes suh as POSIX. Above lie a number of

bath systemsthat manage proessors,interat withusers,and

dealwithfailuresofexeution. Abathsysteminteratswithan

appliationthroughsimpleinterfaessuhasmainandexit.

Be-lowlieanumberofI/Oserviesthatorganizeandommuniate

with remote memory, disks, and tapes. An ordinary operating

system(OS)transformsanappliation'sexpliitreadsandwrites

into thelow-levelblokandnetwork operationsthat omposea

loalordistributed lesystem.

However,attahinganewI/OservietoatraditionalOSis

not a trivial task. Although the priniple of an extensible OS

hasreeivedmuhattentionintheresearhommunity[19℄,

pro-dution operating systems have limited failities for extension,

usuallyrequiringkernelmodiationsoradministratorprivileges.

Althoughthismaybeaeptableforapersonalomputer,this

re-quirementmakesitdiultorimpossibletoprovideustomI/O

andnamingserviesforappliationsvisitingaborrowed

omput-ingenvironmentsuh asatimesharedmainframe,aommodity

omputingluster,oranopportunistiworkgroup.

Toremedythissituation,weadvoatetheuseofinterposition

agents [13℄. These devies transform standard interfaes into

remoteI/Oprotoolsnotnormallyfoundinanoperatingsystem.

In eet, an agent allowsan appliation to bring its lesystem

andnamespaealongwithitwhereveritgoes. Thisreleasesthe

dependeneonthedetails oftheexeutionsitewhile preserving

theuseofstandardinterfaes. Inaddition,theagentantapintonamingserviesthattransformprivatenames

intofully-qualiednamesrelevantinthelargersystem.

ComputerSienesDepartment,UniversityofWisonsin

1

(2)

internaltehniques externaltehniques

poly. stati dyn. binary debug remote kernel

exten. link link rewrite trap lesys. allout

sope library stati dynami dynami nosetuid any any

burden rewrite relink identify identify runommand superuser modifyos

layer xed any any any sysall fsopsonly sysall

init/ni hard hard hard hard easy impossible easy

a. linker no no no no yes yes yes

debug yes yes yes yes limited yes yes

seure no no no no yes yes yes

ndholes easy hard hard hard easy easy easy

porting easy hard hard hard medium easy medium

Fig.1.2.PropertiesofInterpositionTehniques

Inthispaper,wepresentpratiallessonslearnedfromseveralyearsofbuildinganddeployinginterposition

agentswithintheCondorprojet.[20,28,21,22℄AlthoughthenotionofsuhagentsisnotuniquetoCondor[13,

2,12℄,theyhaveseenrelativelylittleuseinotherprodutionsystems. Thisisduetoavarietyoftehnialand

semantidiultiesthat ariseinonnetingrealsystemstogether.

Wepresentthispaperasaprogressivedesignstudythatexplorestheseproblemsandexplainsoursolutions.

We begin with adetailed study of seven methods of interposition, veof whih wehave experiene building

and deploying. The remaining two are eetive but impratial beause of the privilege required. We will

ompare the performane and funtionality of these methods, giving partiular attention to intangibles suh

asportability andreliability. Inpartiular, wewillonentrateon onemethod that has notbeen exploredin

detail: thedebuggertrap. Althoughthismethodhasbeenemployedinidealizedoperatingsystems,itrequires

additional tehniques in order to provideaeptable performane on popularoperating systemswith limited

debuggingapabilities, suhas Linux.

Usingthedebuggertrap, wefousonthedesignof Parrot,aninterpositionagentthatspliesremoteI/O

systemsintothelesystemspae ofordinaryappliations. Aentral problemin thedesignofanI/Oagentis

thesemantiproblemofmappingnot-quite-identialinterfaestoeahother. Theoutgoingmappingisusually

quitesimple: read beomesaget,writebeomesaput,andso forth. Therealdiulty liesin interpretingthe

large spae of return valuesfrom remote servies. Many new kinds of failure are introdued: servers rash,

redentials expire, and disks ll. Trivial transformations into the appliation's standard interfae lead to a

brittleandfrustratingexperienefortheuser.

Aorollarytothisobservationisthataesstoomputationandstorageannotbefullydivored. Abstrat

notionsof designoften enouragethe partition ofdistributed systemsinto twoativities: either omputation

orstorage. An interpositionagentservesasaonnetionbetweenthesetwoonerns;likeanoperatingsystem

kernel,itmanagesbothtypesofdeviesandmustmediatetheirinteration,sometimesbypassingtheappliation

itself.

Thispaperisaondensedversionofaworkshoppaper. Duetospaelimitations,wehaveomittedanumber

of setions and details,indiated byfootnotes. Theinterested readermay ndfurther details in the original

paper[23℄orinatehnialreport. [24℄ 2

2. Interposition Tehniques Compared. Thereare manytehniques forinterpositioningservies

be-tween an appliation and the underlying system. Eah has partiular strengths and weaknesses. Figure 1.2

summarizesseveninterpositiontehniques. Theymaybebrokenintotwobroadategories: internaland

exter-nal. Internaltehniquesmodifythememoryspaeofanappliationproessinsomefashion. These tehniques

areexibleandeient,butannotbeappliedtoarbitraryproesses. Externaltehniquesaptureandmodify

operationsthat are visible outsideanappliation's addressspae. These tehniques arelessexible andhave

higher overhead,but an beapplied to nearlyany proess. TheCondor projethas experiene building and

deployingalloftheinternaltehniquesaswelloneexternaltehnique: thedebuggertrap. The remainingtwo

externaltehniqueswedesribefromrelevantpubliations.

Thesimplest tehniqueisthepolymorphiextension. Iftheappliationstrutureisamenabletoextension,

wemaysimplyaddanewimplementationofanexistinginterfae. Theuserthenmustmakesmallodehanges

toinvoketheappropriateonstrutororfatoryin orderto produethenewobjet. This tehniqueisusedin

(3)

Condor'sJavaUniverse[22℄toonnetanordinaryInputStreamorOutputStreamtoaseureremoteproxy. It

isalsofoundingeneralpurposelibrariessuhasSFIO[25℄.

Thestati library tehniqueinvolvesreatingareplaementforanexisting library. Theuserisobligedto

re-linktheappliationwith thenewlibrary. Forexample,Condor's StandardUniverse[20℄providesadrop-in

replaement for the standard C library that provides transparent hekpointing as well as proxying of I/O

baktothesubmissionsite,fully emulatingtheuser'shomeenvironment. Thedynami librarytehniquealso

involvesreatingareplaementforanexistinglibrary. However,throughtheuseoflinkerontrols,theusermay

diret thenewlibraryto beused in plae ofthe oldforanygiven dynamiallylinked library. Thistehnique

is used by DCahe [8℄, some implementations of SOCKS [15℄, as well as our own Bypass [21℄ toolkit. The

binaryrewriting tehniqueinvolvesmodifyingthemahineodeofaproessat runtimetoredirettheowof

ontrol. This requiresverydetailed knowledge oftheCPU arhiteturein use,but thisanbehiddenbehind

anabstrationsuhastheParadyn[17℄toolkit. Thistehniquehasbeenused tohijakanunwittingproess

atruntime[28℄.

Traditional debuggersmake useof aspeialized operating systeminterfae for stopping, examining, and

resuming a proess. The debugger trap tehnique uses this interfae, but instead of merely examining the

proess, the debuggingagent traps eah systemall, provides an implementation, and then plaes the result

bakin thetargetproesswhile nullifyingtheintendedsystemall. An exampleofthistehniqueisUFO[2℄,

whih allowsaess to HTTPand ftpresouresviawhole-lefething. Adiultywith thedebugger trapis

thatmanytoolsompeteforaesstoasingleproess'debuginterfae. TheToolDaemonProtool(TDP)[18℄

providesaninterfaeformanagingsuhtoolsin adistributedsystem.

A remote lesystemmaybeused asan interposition agentbysimply modifying thele server. NFS is a

popular hoie for this tehnique, and is used bythe Legion[27℄ objet-spaetranslator,as well theSlie [4℄

miroproxy. Finally,shortofmodifyingthekernelitself,wemayinstallaone-timekernelalloutwhihpermits

a lesystem to be servied by a user-level proess. This faility an be present from the ground up in a

mirokernel [1℄, but an also be added as an afterthought, whih is the ase for most implementations of

AFS[11℄.

Thefourinternaltehniquesmayonlybeappliedtoertainkindsofprograms. Polymorphiextensionand

statilinking only applyto those programsthat anberebuilt. The dynamilibrarytehniquerequires that

thereplaedlibrarybedynami,whilebinaryrewriting(withtheParadyntoolkit)requiresthepreseneofthe

dynamiloader, although nopartiularlibrarymust be dynami. Thethreeexternal tehniques apply toany

proess,withtheexeptionthatthedebuggingtrappreventsthetraedproessfromelevatingitsprivilegelevel

throughthesetuidfeature.

Theburdenupontheuserforeahofthesetehniquesalsovarieswidely. Forexample,polymorphi

exten-sionrequiressmall odehangeswhilestatilinkingrequiresrebuilding. Thesetehniquesmaynotbepossible

with pakaged ommerial software. Dynami linking and binary rewriting require that the user understand

whih programs are dynamially linked and whih are not. Most standardsystem utilities are dynami, but

many ommerial pakages are stati. Our experiene is that users are surprised and quite frustrated when

an(unexpetedly)statiappliationblithely ignoresaninterposition agent. Theremotelesystemand kernel

allout tehniques impose the smallest userburden, but requirea ooperative systemadministrator to make

the neessary hanges. The debugger trap imposes a small burden on the user to simply invoke the agent

exeutable.

Perhaps the most signiant dierene between the tehniques is the ability to trap dierent layers of

software. Eah oftheinternal tehniquesmaybeappliedat anylayerofode. Forexample,Bypasshasbeen

usedtoinstrumentanappliation'sallstothestandardmemoryalloator,theXWindowSystemlibrary,and

theOpenGLlibrary. Inontrast,theexternaltehniques arexedtopartiularinterfaes. Thedebuggertrap

only operates on physial system alls, while the remote lesystem and kernel allout are limited to ertain

lesystemoperations.

Dierenes in these tehniques aet the design of ode that they attah to. Consider the matter of

implementingadiretorylistingonaremotedevie. Theinternaltehniquesareapableofintereptinglibrary

allssuh asopenand opendir. These areeasily mappedto remoteleaess protools,whihgenerallyhave

separateproeduresforaessinglesanddiretories. However,theUnix interfaeunieslesanddiretories;

both are aessed throughthe systemall open. External tehniques must aeptan open on either ale or

(4)

Theexternaltehniquesalsodierintherangeofoperationsthattheyareabletotrap. Whilethedebugger

trapanmodifyanysystemall,theremotelesystemand kernelallouttehniques arelimitedto lesystem

operations. A partiular remote lesystem may have even further restritions. For example, the stateless

NFS protool has no representation of the system alls open and lose. Without aess to this information,

the interposed servieannot provide semantis signiantlydierent than those provided by NFS. Further,

suh le system interfaes do not express any binding betweenindividual operations and the proesses that

initiate them. That is, a remote lesystem agent sees a read or write but not the proess id that issued it.

Withoutthis information, itisdiult orimpossibleto performingaountingfor thepurposesofseurity or

performane.

Anumberofimportantativitiestakeplaeduringtheinitializationandnalizationofaproess: dynami

librariesare loaded;onstrutors, destrutors,and otherautomatiroutinesare run;I/Ostreams are reated

or ushed. During these transitions, the libraries and other resoures in use by a proess are in a state of

ux. Thisompliatestheimplementationofinternalagentsthat wishto intereptsuhativity. Forexample,

theappliation may perform I/Oin aglobal onstrutorordestrutor. Thus, an internal agentitself annot

relyonglobal onstrutorsordestrutors: there is noordering enforedbetween thoseof theappliation and

thoseoftheagent. Likewise,adynamiallyloadedagentannotinterposeontheationsofthedynamilinker.

Theprogrammerofsuhagentsmustnotonlyexerisearein onstrutingtheagent,butalsoin seletingthe

librariesinvokedby the agent. Suh odeis time onsuming to reate anddebug. These ativitiesare muh

more easily manipulated throughexternal tehniques. Forexample, externaltehniques an easily trap and

modifytheativitiesofthedynamilinker.

Noodeiseveromplete norfullydebugged. Produtiondeploymentofinterposition agentsrequires that

usersbepermittedtodebug bothappliationsand agents. All tehniquesadmitdebuggingof userprograms,

with the only ompliation arising in the debugger trap. For obvious reasons, a single proess annot be

debuggedby twoproessesat one, soa debuggerannot beattahed to aninstrumentedproess. However,

adebuggertrapagentanbeused tomanageanentireproess tree,so insteadtheusermayusetheagentto

invokethedebugger,whih maythen invoketheappliation. Thedebugger's operationsmaybetrappedjust

likeanyothersystemall andpassedalongtotheappliation,allunderthesupervisionoftheagent.

Interposition agents may be used for seurity as well as onveniene. An agent may provide a sandbox

whihprevents anuntrustedappliation from modifying anyexternal datathat it is notpermitted toaess.

Theinternal tehniquesarenotsuitable forthis seuritypurpose, beausetheymayeasily be subverted bya

programthatinvokessystemallsdiretlywithoutpassingthroughlibraries. Theexternaltehniques,however,

annotbefooledin thiswayandarethussuitableforseurity.

Related to seurity is thematter of hole detetion. An interposition agentmay fail to trapan operation

attempted by an appliation. This may simply be a bug in the agent, or it may be that the interfae has

evolved over time, and the appliation is using adepreated or newly added interfae that the agent is not

aware of. Internal agents are espeially sensitive to this bug. As standard libraries develop, interfaes are

addedanddeleted,andmodiedlibraryroutinesmayinvokesystemallsdiretlywithoutpassingthroughthe

orresponding publiinterfaefuntion. Forexample,fopen mayinvoketheopen systemallwithoutpassing

throughtheopenfuntion. Suhaneventausesgeneralhaosinboththeappliationandagent,oftenresulting

inrashesor(worse)silentoutputerrors. Nosuh problemoursinexternalagents. Althoughinterfaesstill

hange, any unexpeted event is deteted as an unknown system all. The agent may then terminate the

appliationandindiate theexatproblem.

The problemofholedetetionmustnotbeunderestimated. Ourexperieneisthatanysigniant

operating system upgrade inludes hanges to the standard libraries, whih in turn requiremodiations to

internal trapping tehniques. Thus, internal agents are rarely forward ompatible. Further, identifying and

xingsuhholesistimeonsuming. Beausethemissedoperationitselfisunknown,onemustspendlonghours

with a debugger to see where the expeted ourse of the appliation diers from the atual behavior. One

disovered,anewentrypointmustbeaddedtotheagent. Thetreatmentissimplebutthediagnosisisdiult.

WehavelearnedthislessonthehardwaybyportingboththeCondorremotesystemalllibraryandtheBypass

toolkittoawidevarietyofUnix-likeplatforms.

Forthese reasons,wehavedesribedporting in Figure1.2 asfollows. Thepolymorphiextension andthe

remote lesystem are quite easy to build on a new system. The debugger trap and the kernel allout have

(5)

getpid stat open/lose read 8KB bandwidth

unmod .18

±

.03

µ

s 1.85

±

.09 3.18

±

.08 3.27

±

.19 282

±

13MB/s rewrite .21

±

.25

µ

s 1.82

±

.02 3.21

±

.05 3.26

±

.03 280

±

7MB/s stati .21

±

.02

µ

s 1.80

±

.17 3.59

±

.05 3.34

±

.02 280

±

17MB/s dynami 1.22

±

.01

µ

s 3.60

±

.10 5.53

±

.06 4.31

±

.09 278

±

4MB/s (

α

unmod) (6.8x) (1.9x) (1.7x) (1.3x) (0.99x)

debug 10.06

±

.21

µ

s 55.41

±

.50 42.09

±

.06 30.99

±

.26 122

±

4MB/s

(

α

unmod) (56x) (30x) (13x) (9x) (0.43x)

Fig.2.1. OverheadofInterpositionTehniques

andbinaryrewritingshouldbeviewedasasigniantportinghallengethatmustberevisitedateveryminor

operatingsystemupgrade.

Figure2.1omparestheperformaneoffourtransparentinterpositiontehniques. Weonstruteda

benh-markCprogramwhihtimed100,000iterationsofvarioussystemallsona1545MHzAthlonXP1800running

Linux 2.4.18. Available bandwidth wasmeasuredbyreading a100MB le sequentiallyin 1MB bloks. The

meanandstandarddeviationof1000ylesofeahbenhmarkareshown. Fileoperationswereperformedonan

existinglein atemporaryle system. Theunmodasegivestheperformaneofthisbenhmarkwithoutany

agent attahed, whilethe remainingveshowthesamebenhmark modied by eah interposition tehnique.

Ineahase,weonstrutedaveryminimalagenttotrapsystemalls andinvokethemwithoutmodiation.

Asanbeseen,thebinaryrewritingand statilinking methodsaddnosigniantosttotheappliation.

Thedynamimethodhasoverheadontheorderofmiroseonds,asitmustmanagethestrutureof(potentially)

multiple agents and invokea funtion pointer. However, these overheads are quikly dominated by the ost

ofmovingdata in andoutof theproess. Thedebuggertraphas thegreatestoverheadof allthetehniques,

rangingfroma56xslowdownforgetpidtoa6xslowdownforwriting8KB.Mostimportantly,thebandwidth

measurement demonstratesthat the debugger trapahieveslessthan half of the unmodiedI/O bandwidth.

Itshould be fairlynotedthat this latenyandbandwidthwillbedominatedbythelatenyand bandwidthof

aessingremoteserviesonommoditynetworks. Seurityandreliabilityomeatameasurableost. 3

Fig.3.1. InterativeBrowsingwithParrot

3. Parrot. TheParrotinterposition agentattahesstandardappliationsto avarietyofdistributed I/O

systemsby wayof thedebuggertrap,desribedabove. EahI/Oprotool ispresentedasanormallesystem

entryunder anewtop-leveldiretorybearingthenameoftheprotool. Inaddition,anoptionalmountlistmay

be given, whih redirets parts ofthe lesystemnamespae toexternal paths. Figure 3.1 showsParrotbeing

usedwithstandardtoolstomanipulatelesstoredattheMassStorageServer(MSS)attheNationalCenterfor

SuperomputingAppliations(NCSA)viatheGridSeurityInfrastruture(GSI)[9℄variantoftheFileTransfer

Protool(FTP).

Parrot is equipped with a variety of drivers for ommuniatingwith external storage systems; eah has

partiular features and limitations. The simplest is the Loal driver, whih simply passes operations on to

the underlying operating system. The Chirp protool was designed by the authors in an earlier work [22℄

(6)

to provide remote I/O with semantis very similar to POSIX. A standalone hirp server is distributed with

Parrot. The venerable File Transfer Protool (FTP) has been in heavy use sine the early days of the

Internet. Itssimpliityallowsfor awidevarietyof ofimplementations, whih, for ourpurposes, resultsin an

unfortunatedegreeofimpreisionwhihwewillexpanduponbelow. ParrotsupportstheseureGSI[3℄variant

offtp. TheNeSTprotoolisthenativelanguageoftheNeSTstorageappliane[6℄,whihprovidesanarrayof

authentiation,alloation,andaountingmehanismsforstoragethatmaybesharedamongmultipletransient

users. TheRFIOandDCAPprotoolsweredesignedinthehigh-energyphysisommunitytoprovideaess

tohierarhialmassstoragedeviessuh asCastor[5℄andDCahe[8℄.

Beause Parrot must preserve POSIX semantis for the sakeof the appliation, our foremost onern is

theabilityofeahofthese protools toprovidetheneessarysemantis. Performaneisaseondaryonern,

althoughit isaetedsigniantlybysemantiissues. A summaryofthesemantisof eahof theseprotools

isgiveninFigure 3.2. 4

namebinding disipline dirs metadata symlinks onnetions

posix open/lose random yes diret yes

-hirp open/lose random yes diret yes perlient

ftp get/put sequential varies indiret no perle

nest get/put random yes indiret yes perlient

ro open/lose random yes diret no perle/op

dap open/lose random no diret no perlient

Fig.3.2. ProtoolCompatibilitywithPOSIX

4. Errors and Boundary Conditions. Error handlinghasnotbeenapervasiveproblemin thedesign

oftraditionaloperatingsystems. Asnewmodelsofleinterationhavedeveloped,attendingerrormodeshave

beenaddedto existing systemsbyexpanding thesoftwareinterfaeat everylevel. Forexample, theaddition

ofdistributed lesystemsto theUnixkernelreatedthenewpossibility ofastale lehandle, representedby

the ESTALEerror. As this errormode wasdisoveredat the verylowest layersof the kernel, the valuewas

addedtothedeviedriverinterfae,thelesysteminterfae,thestandardlibrary,andexpetedtobehandled

diretlybyappliations.

We have no suh luxury in an interposition agent. Appliations use the existing interfae, and wehave

neither the desire nor the ability to hange it. Sometimes, if we are luky, we may re-use an error suh as

ESTALEforananalogous,ifnotidentialpurpose. Yet,theunderlyingdevie driversgenerateerrorsranging

fromthevaguelesystemerror tothemirosopiallypreiseserver'sertiationauthorityisnottrusted.

How should the unlimited spae of errors in the lower layers be transformed into the xed spae of errors

availableto theappliation? 5

Forexample,severaldeviedrivershavetheneessarymahinerytoarryoutallofauser'spossiblerequests,

butprovidevagueerrorswhenasupportedoperationfails. TheFTPdriverallowsanappliationtoreadale

viatheGETommand. However,iftheGET ommandfails, theonlyavailableinformation istheerrorode

550,whihenompassesalmostanysortof lesystemerrorinludingnosuhle, aessdenied, andisa

diretory. ThePOSIXinterfae doesnotpermita ath-all errorvalue; itrequires aspei reason. Whih

errorodeshouldbereturnedto theappliation?

Onetehniquefordealingwiththis problemisto interviewtheserviein orderto narrowdowntheause

oftheerror,in amannersimilarto thatofanexpertsystem. Supposethat weattemptto retrievealeusing

anFTPGEToperation. IftheGETshouldfail,wemayhypothesizethatthenamedleisatuallyadiretory.

The hypothesis may betested with ahange diretory(CWD) ommand. If that sueeds, thehypothesis is

true, and wemayreturn thepreise error notale. If that fails, we must propose anotherhypothesis and

testit. Parrotperformsanumberoftwo-and three-stepinterviewsin responsetoavarietyofFTPerrors.

TheonnetionstrutureofaremoteI/Oprotoolalsohasimpliationsforsemantisaswellasperformane.

Chirp,NeST,andDCAPrequireoneTCPonnetionbetweeneahlientand server. FTPandRFIOrequire

anewonnetion madeforeah leopened. Inaddition, RFIOrequires anewonnetionfor eah operation

performedonanon-openle. Beausemostle systemoperationsaremetadataqueries,this anresultin an

4

Omitted: DetailsofthevariousprotoolssupportedbyParrot.

(7)

extraordinarynumberofonnetionsinashortamountoftime. Ignoringthelatenypenaltiesofthisativity,a

largenumberofTCPonnetionsanonsumeresouresatlients,servers,andnetworkdeviessuhasaddress

translators. 6

5. Performane. We havedeferredadisussion of performane until this pointso that wemaysee the

performaneeetsofsemantionstraints. Althoughitispossibletowriteappliationsexpliitlytouseremote

I/Oprotoolsin themosteientmanner, Parrotmustprovideonservativeandompleteimplementationsof

POSIXoperations. Forexample,anappliationmayonlyneedtoknowthesizeofale,butifitrequeststhis

informationviastat,Parrotisobligedtollthestruturewitheverythingitan,possiblyatgreatost.

0

1

2

3

4

5

6

7

8

64M

16M

4M

1M

256K

64K

16K

4K

Bandwidth (MB/s)

Block Size

ftp

rfio

dcap

nest

chirp

Fig.5.1.Throughputof128MBFile Copy

TheI/Oserviesdisussedhere,withthe

exep-tionofChirp,aredesignedprimarilyforeient

high-volumedatamovement. Thisisdemonstratedby

Fig-ure5.1,whihomparesthethroughputofthe

proto-olsatvariousbloksizes. Thethroughputwas

mea-suredbyopyinga128MBleintotheremotestorage

deviewiththestandardpommandequippedwith

Parrotandavaryingdefaultbloksize,asontrolled

throughthestatemulationdesribedabove.

Of ourse, theabsolute values are an artifat of

oursystem,however,itanbeseenthatallofthe

pro-tools must betuned for optimalperformane. The

exeptionisChirp,whihonlyreahesaboutonehalf

of the available bandwidth. This is beause of the

stritRPCnaturerequiredforPOSIXsemantis;the

Chirp server does not extrat from the underlying

lesystem any more data than neessary to supply

theimmediate read. Although itis tehnially

feasi-blefortheservertoreadaheadinantiipationofthenextoperation,suhdatapulledintotheserver'saddress

spaemightbeinvalidatedbyotheratorsonthele inthemeantimeandisthussemantiallyinorret.

Thehiupin throughputofDCAP at abloksize of64KBis anunintended interationwiththedefault

TCPbuersize of64KB. ThedevelopersofDCAP are awareof theartifatandreommendhangingeither

theblok size orthebuer size to avoid it. This is reasonableadvie, given that all ofthe protoolsrequire

tuningofsomekind.

Figure5.2 benhmarksthelatenyofPOSIX-equivalent operationsineahI/Oprotool. These

measure-ments were obtained in a manner idential to that of Figure 2.1, with the indiated servers residing on the

samesystemasin Figure5.1. Notie thatthelateniesaremeasuredin milliseonds,whereasFigure 2.1gave

miroseonds.

proto stat open/lose read 8KB write8KB bandwidth

hirp .50

±

.14 ms .84

±

.09 2.80

±

.06 2.23

±

.04 4.1 MB/s ftp .87

±

.09 ms 2.82

±

.26 (norandomaess) 7.9 MB/s nest 2.51

±

.05 ms 2.53

±

.17 4.48

±

.14 7.41

±

.32 7.9 MB/s ro 13.41

±

.28 ms 23.11

±

1.29 3.32

±

.14 2.85

±

.18 7.3 MB/s dap 152.53

±

16.68 ms 159.09

±

16.68 3.01

±

0.62 3.14

±

.62 7.5 MB/s

Fig.5.2. PerformaneofI/OProtoolsOnaLoal-Area Network

Wehastentonotethat thisomparison,in aertainsense,isnotfair. These dataserversprovidevastly

dierentservies,sotheperformanedierenesdemonstratetheostof theservie,notthelevernessof the

implementation. For example,Chirp andFTP ahievelow lateniesbeausethey are lightweighttranslation

layersover an ordinaryle system. NeST has somewhat higher lateny beause it provides the abstration

of avirtual le system, usernamespae, aess ontrol lists, and a storagealloation system, allbuilt on an

existinglesystem. Theostisduetotheneessarymetadatalogthatreordsallsuhativitythatannotbe

storeddiretlyintheunderlyinglesystem. BothRFIOandDCAParedesignedtointeratwithmassstorage

(8)

dist. proto opy list san make delete

loal loal .15

±

.02se .09

±

.20 .08

±

.02 65.38

±

3.47 .86

±

.18se loal hirp 1.22

±

.03se .34

±

.02 .40

±

.01 81.02

±

1.46 .79

±

.01se lan hirp 6.16

±

.22se .57

±

.30 1.32

±

.03 144.00

±

1.35 1.26

±

.02se lan hirp 10.67

±

.90se .53

±

.07 4.72

±

.32 95.05

±

2.33 1.24

±

.03se lan ftp 34.88

±

1.72se 1.47

±

.02 17.78

±

1.14 122.54

±

3.14 2.95

±

.15se lan nest 52.35

±

4.18se 12.92

±

4.87 28.14

±

4.52 307.19

±

3.26 31.73

±

4.37se lan ro (overwhelmedbyrepeatedonnetions)

lan dap (doesnotsupportdiretorieswithoutnfs)

Fig.5.3. PerformaneoftheAndrew-LikeBenhmark

systems; singleoperationsmay result in gigabytes of ativitywithin adisk ahe, possibly moving lesto or

fromtape. Inthat ontext,lowlatenyisnotaonern.

That said, several things may be observed from this table. Although FTP has benetted from years of

optimizations,theostofastatisgreaterthanthatofChirpbeauseoftheneedformultipleroundtripstoll

in theneessarydetails. Theadditionallatenyof open/lose is dueto themultiple round trips toname and

establishanewTCPonnetion. BothRFIOandDCAPhavehigherlateniesforsinglebytereadsandwrites

thanfor8KBreadsandwrites. Thisisduetobueringwhihdelayssmalloperationsinantiipationoffurther

data. Mostimportantly,alloftheseremoteoperationsexeedthelatenyofthedebuggertrapitselfbyseveral

ordersof magnitude. Thus, weareomfortablewith thepreviousdeisiontosarieperformanein favorof

reliabilityin theinterpositiontehnique.

WeonludewithamarobenhmarksimilartotheAndrewbenhmark.[11℄ThisAndrew-likebenhmark

onsistsofaseriesofoperationsontheParrotsouretree,whihonsistsof13diretoriesand296lestotaling

955KB.Toprepare,thesouretreeismovedtotheremotedevie. Intheopystage,thetreeisdupliatedon

theremotedevie. Intheliststage,adetailedlist(ls-lR)ofthetreeismade. Inthesanstage,alllesinthe

treeare searhed(grep) foratextstring. Inthemake stage,the software isbuilt. From anI/Operspetive,

thisinvolvesasequentialreadofeverysourele,asequentialwriteofeveryobjetle,andaseriesofrandom

readsandwritesto reatetheexeutables. Inthedeletestage,thetreeisdeleted.

Figure 5.3omparestheperformaneof theAndrew-likebenhmark inavarietyof ongurations. Inthe

three ases abovethe horizontal rule, wemeasure the ost of eah layerof software added: rst with Parrot

only,then withaChirp serveron thesamehost,thenwith aChirpserverarosstheloal areanetwork. Not

surprisingly, the I/Oost of separatingomputation from storageis high. Copying datais muh slowerover

thenetwork,although theslowdownin themakestage isquiteaeptableifweintend toinreasethroughput

viaremoteparallelization.

Inthetwoasesadjaenttotherule,theonlyhangeistheenablingofahing. Asmightbeexpeted,the

ostof unneessarydupliationauses aninreasein opyingthesouretree, although thediereneis easily

madeupinthemakestage,wheretheaheeliminatesthemultiplerandomI/Oneessarytolinkexeutables.

Thelistanddeletestagesonlyinvolvediretorystrutureandmetadataaessandarethusnotaetedbythe

ahe.

Intheveasesbelowthehorizontal rule,weexplorethe useofvarious protools to run thebenhmark.

In all of these ases, ahing is enabled in order to eliminate the ost of random aess as disussed. The

DCAP protool is semantially unable to run the benhmark, as it doesnot provide the neessaryaess to

diretories. TheRFIOprotoolissemantiallyabletorunthebenhmark,butthehighfrequenyoflesystem

operationsresultsinalargenumberofTCPonnetions,whihquiklyexhaustsnetworkingresouresatboth

thelientandtheserver,thuspreventingthebenhmarkfrom running. Chirp,FTP, andNeSTareallableto

omplete thebenhmark. TheNeST resultshaveahigh variane,due to delays inurredwhile themetadata

log is periodially ompressed. The dierene in performane between Chirp, FTP, and NeST is primarily

attributable tothe ostof metadatalookups. All thestages makeheavyuseof stat; themultiple round trips

neessarytoimplementthisompletelyforFTPandNeSThaveastrikingumulativeeet.

6. Conlusions. Interposition agents provide a stable platform for bringing old appliations into new

environments. We have outlined the diulties that we have enountered as well as the solutions we have

(9)

in theuse of virtualmahines in distributed systems[26℄the needfor powerfulbut lowoverhead methods of

interpositiongrows. Theappropriateinterfaeforthistaskisstillanopenresearhtopi.

The notion of virtualizing or multiplexing an existing interfae is a ommon tehnique [14, 7℄, but the

plague of errorsandother boundaryonditionsseems to be suered silentlybypratitioners. Suh problems

arerarelypubliized,however,weareawareoftwoexellentexeptions. C.Metz[16℄desribeshowtheBerkeley

sokets interfaeissurprisinglyhardto multiplex. T.Garnkel[10℄ desribesthesubtlesemantiproblems of

sandboxinguntrustedappliations.

Formoreinformation: http://www.s.wis.edu/thain/ rese arh/ parro t

7. Aknowledgments. Wethank John Bentand SanderKlousfor their help deployingand debugging

Parrot. VitorZandywrotethemehanismforbinaryrewriting. AlainRoygavethoughtfulommentsonearly

draftsofthispaper.

REFERENCES

[1℄ M. Aetta, R. Baron, W. Bolosky, D. Golub, R. Rashid, A. Tevanian, andM. Young,Mah: Anewkernel

foundation forUnixdevelopment,inProeedingsoftheUSENIXSummerTehnialConferene,Atlanta,GA,1986.

[2℄ A. Alexandrov, M. Ibel, K. Shauser, and C. Sheiman, UFO: A personal global le system based on user-level

extensionstotheoperatingsystem,ACMTransationsonComputerSystems,(1998),pp.207233.

[3℄ W.Allok, A. Chervenak, I. Foster,C. Kesselman, and S.Tueke,Protools andserviesfor distributed

data-intensivesiene,inProeedingsofAdvanedComputingandAnalysisTehniquesinPhysisResearh,2000,pp.161163.

[4℄ D.Anderson,J. Chase,andA.Vahdat,Interposed requestroutingforsalablenetworkstorage,inProeedingsofthe

FourthSymposiumonOperatingSystemsDesignandImplementation,2000.

[5℄ O.Barring, J. Baud, andJ.Durand,CASTOR projet status,inProeedingsofComputing inHighEnergy Physis,

Padua,Italy,2000.

[6℄ J. Bent, V. Venkataramani, N. LeRoy, A. Roy, J. Stanley, A. Arpai-Dusseau, R. Arpai-Dusseau, and

M. Livny, Flexibility, manageability, and performane in a grid storage appliane, inProeedings of the Eleventh

IEEESymposiumonHighPerformaneDistributedComputing,Edinburgh,Sotland,July2002.

[7℄ D.Cheriton,UIO: AuniformI/O systeminterfaefordistributedsystems, ACMTransationsonComputerSystems,5

(1987),pp.1246.

[8℄ M.Ernst,P.Fuhrmann,M.Gasthuber,T.Mkrthyan,andC.Waldman,dCahe,adistributedstoragedataahing

system,inProeedingsofComputinginHighEnergyPhysis,Beijing,China,2001.

[9℄ I.Foster,C.Kesselman,G.Tsudik,andS.Tueke,Aseurityarhitetureforomputationalgrids,inProeedingsof

the5thACMConfereneonComputerandCommuniationsSeurityConferene,1998,pp.8392.

[10℄ T.Garfinkel,Trapsand pitfalls: Pratialproblemsininsystemall interpositionbasedseuritytools,inProeedingsof

theNetworkandDistributedSystemsSeuritySymposium,February2003.

[11℄ J.Howard, M. Kazar, S. Menees, D. Nihols, M. Satyanarayanan, R. Sidebotham, and M. West, Saleand

performaneinadistributedlesystem,ACMTransationsonComputerSystems,6(1988),pp.5181.

[12℄ G.HuntandD.Brubaher,Detours: Binaryintereptionof Win32funtions,Teh.ReportMSR-TR-98-33,Mirosoft

Researh,February1999.

[13℄ M.Jones,Interpositionagents:Transparentlyinterposinguserodeatthesysteminterfae,inProeedingsofthe14thACM

SymposiumonOperatingSystemsPriniples,1993.

[14℄ S.Kleiman,Vnodes: Anarhitetureformultiple lesystemtypesinSun Unix,inProeedingsoftheUSENIXTehnial

Conferene,1986,pp.151163.

[15℄ M.Leeh, M.Ganis,Y.Lee,R. Kuris, D.Koblas, andL.Jones,SOCKS protoolversion5. InternetEngineering

TaskFore,RequestforComments1928,Marh1996.

[16℄ C.Metz,ProtoolindependeneusingthesoketsAPI,inProedingsoftheUSENIXTehnialConferene,June2002.

[17℄ B.Miller,M.Callaghan,J.Cargille,J.Hollingsworth,R.B.Irvin,K.Karavani,K.Kunhithapadam,and

T. Newhall,TheParadyn parallelperformanemeasurementtools,IEEEComputer,28(1995),pp.3746.

[18℄ B.Miller,A.Cortes,M.A.Senar,andM.Livny,Thetooldaemonprotool(TDP),inProeedingsofSuperomputing,

Phoenix,AZ,November2003.

[19℄ C.SmallandM.Seltzer,AomparisonofOSextensiontehnologies,inProeedingsoftheUSENIXTehnialConferene,

1996,pp.4154.

[20℄ M.SolomonandM.Litzkow,SupportinghekpointingandproessmigrationoutsidetheUnixkernel,inProeedingsof

theUSENIXWinterTehnialConferene,1992.

[21℄ D.ThainandM.Livny,Multiplebypass: Interpositionagentsfordistributedomputing,JournalofClusterComputing,4

(2001),pp.3947.

[22℄ ,Errorsope ona omputational grid, inProeedingsofthe EleventhIEEE Symposiumon High Performane

Dis-tributedComputing,July2002.

[23℄ ,Parrot: Transparentuser-levelmiddlewarefordata-intensiveomputing,inProeedingsoftheWorkshoponAdaptive

GridMiddleware,September2003.

[24℄ , Parrot: Transparent user-level middleware for data-intensive omputing, Teh. Report 1493, Computer Sienes

Department,UniversityofWisonsin,Deember2003.

(10)

[26℄ A.Whitaker,M.Shaw,andS.D.Gribble,SaleandperformaneintheDenaliisolationkernel,inProeedingsofthe

FifthSymposiumonOperatingSystemDesignandImplementation,Boston,MA,Deember2002.

[27℄ B.White,A.Grimshaw,andA.Nguyen-Tuong,Grid-BasedFileAess: TheLegionI/OModel,inProeedingsofthe

NinthIEEESymposiumonHighPerformaneDistributedComputing,August2000.

[28℄ V.Zandy,B.Miller,andM.Livny,Proess hijaking,inProeedingsoftheEighthIEEEInternational Symposiumon

HighPerformaneDistributedComputing,1999.

Editedby: WilsonRivera,JaimeSeguel.

Reeived: July14,2003.

References

Related documents

Traditional project life-cycle changes product development project closure plan project goals deviations project results project initiation tracking actual plan planning

The present study was based on the effectiveness of reading aloud strategies for inferential reading comprehension skills and text difficulties of Saudi students

The CLA-VAL SERIES 136E/D Solenoid Control Valve is an On/Off Control Valve which either opens or closes upon receiving an electrical signal to the solenoid pilot control..

Traditionalist Salafi bypassing of centuries of Muslim legal scholarship leads Late Sunni Traditionalists to level accusations of arrogance against Traditionalist Salafis... In

All Polling Places subject to change for reasons beyond control of the Election Board... All Polling Places subject to change for reasons beyond control of the

operations, especially for high level military air operations, points up the necessity for the immediate implementation of the required NATO upper air network,c. A study of

ZF 3 and 4-speed automatic transmissions must be filled with approved ATF oils according to ZF list of lubricants TE-ML 11, lubricant class 11A / 11B  or with mineral ATF oils

Figure 4: Horizontal illuminance sensors (left to right): close-up, modified sensor layout for daylight autonomy study, sensors in testing chamber 2 (only one row considered for