DATA MANAGEMENT IN DISTRIBUTED SYSTEMS: A SCALABILITY TAXONOMY
A VIJAY SRINIVAS AND D JANAKIRAM∗
Abstract.
Data management is a key aspect of any distributed system. This paper surveys data management techniques in various distributed systems, starting from Distributed Shared Memory (DSM) systems and ending with Peer-to-Peer (P2P) systems. The central focus is on scalability, an important non-functional property of distributed systems. A scalability taxonomy of data management techniques is presented. The paper provides a detailed discussion of the state of the art and of how data management techniques evolved in the different categories. As a result, several open issues are inferred. These include the use of P2P techniques in data grids and distributed mobile systems, and the use of optimal data placement heuristics from Content Distribution Networks (CDNs) for P2P grids.
1. Introduction. Data management is an important facet of distributed systems. It encompasses the ability to describe data, to handle multiple copies of data objects or files (through replication or caching), to support meta-data, and to query and access data. Different approaches to data management have emphasized different subsets of these aspects, supporting some explicitly and others only implicitly or indirectly. For instance, Distributed Shared Memory (DSM) systems and shared object spaces handle the consistency of replicated data, but they support meta-data only indirectly through object lookups.
Orthogonal to the above-mentioned issues of managing data, the main non-functional challenges are fault-tolerance, scalability and security, as illustrated in [32]. We survey various distributed systems from the perspective of the scalability of their data management solutions and provide a scalability taxonomy. We classify data management approaches into three categories:
• Centralized/Naively Distributed (CND) techniques,
• Sophisticated/Intermediate Data (SID) management techniques, and
• Large Scale Data (LSD) management techniques.
We give a brief view of the evolution of data management in each of these categories.
CND techniques for data management were used by DSM systems such as TreadMarks [10] and Munin [25], and by shared object spaces such as Linda [24], Orca [36] and T Spaces [4]. Many of these systems provide application-transparent replica consistency management, which they achieve using centralized or naively distributed components. For instance, T Spaces uses a centralized server for consistency maintenance and for object lookups, while JavaSpaces [81] uses a centralized transaction coordinator.
SID techniques have been used mainly for data management in grid computing systems. One example is [51], which provides a Replica Management Service (RMS). Some of these systems are characterized by data sharing across autonomous organizations at an intermediate scale (possibly thousands of nodes). These approaches mainly manage replicated data in a grid computing environment. Data grids [27] treat data management as a first-class concern in addition to computation. They are characterized by the size of their data sets, which can be of the order of gigabytes or even terabytes. Data grids for High Energy Physics (HEP) applications, such as GriPhyN [31] and the CERN effort [79], are examples.

Other approaches that use SID techniques include Content Distribution Networks (CDNs) and data management in distributed mobile systems. CDNs such as Akamai [43] have been proposed to deliver web content to users from closer to the edge of the Internet, enabling web servers to scale up. Data management in distributed mobile systems is characterized by data sharing in the presence of mobile nodes, as exemplified by systems such as Coda [74]. What these different systems have in common is their scale of operation (thousands of nodes), and this scale is what distinguishes SID techniques for data management. Many of these systems assume that failures are rare and that reliable servers are available; these servers are distributed, not centralized.
LSD management techniques do not assume reliable servers. Their distinguishing feature is that the execution of services is delegated to the edges of the Internet, which results in high scalability and fault-tolerance. LSD techniques work well over the Internet and can handle millions of nodes or data entities. The LSD category includes:
• P2P file sharing systems such as Napster [57] and Gnutella [33],
• P2P file storage management systems such as PAST [15] and Oceanstore [49], and
• P2P extensions to Distributed DataBase Management Systems (DDBMS) such as PIER [38] and PeerDB [60].
A taxonomy of data grids has been provided in [87], which compares data grids with related data management approaches such as CDNs, DDBMS and P2P systems. [50] gives a functional perspective of data management that focuses on data location, integration, sharing and query processing, along with the different P2P systems that address these functionalities. A survey of P2P content distribution has been provided in [77]. It examines P2P architectures from the perspective of non-functional properties such as performance, security, fairness, fault-tolerance and scalability. Our survey is broader: it attempts to provide an equivalent survey for grids, P2P systems, CDNs and DDBMS. The scalability taxonomy we provide also distinguishes our survey from the others. Further, we discuss the state of the art in several of these areas and how ideas, concepts and techniques from one area can be applied to others. The reader should keep in mind that although the authors have made an effort to be unbiased, the survey has limitations, as it is perceived through their looking glass.

∗Distributed & Object Systems Lab, Dept. of Computer Science & Engg., Indian Institute of Technology, Madras, India, http://dos.iitm.ac.in, {avs, djram}@cs.iitm.ernet.in
The rest of the paper is organized as follows:
• Section 2 discusses CND techniques for data management, including DSMs and shared object spaces.
• Section 3 discusses SID techniques, including data management in grids, CDNs and distributed mobile systems.
• Section 4 discusses P2P data management techniques.
• Section 5 explores state-of-the-art data management techniques in distributed systems.
• Section 6 concludes the paper with a taxonomy figure and directions for future research.
2. CND Techniques: Data Replication in DSMs and Shared Object Spaces. DSM provides the illusion of a globally shared memory in which processors can share data. The application developer does not need to specify explicitly where data is stored or how it should be accessed. The DSM abstraction is particularly useful for parallel computing applications, as demonstrated by TreadMarks [10]. Collaborative applications such as on-line chatting and collaborative browsing would also be easier to develop over a DSM.
Page-based DSMs can be more efficient because hardware support is available for detecting memory accesses. However, the larger granularity of sharing means that page-based DSMs may suffer from false sharing. False sharing arises when variables used independently by different processors reside on the same page. Writes by one processor then trigger consistency traffic for the others even though no data is actually shared. Relaxed consistency models hide false sharing more efficiently than strict consistency models [64]; examples are Release Consistency (RC) and its variants such as lazy RC. Munin [25] was an early DSM system that focused on reducing the communication required for consistency maintenance, and it provides a software implementation of RC. TreadMarks [10] is another DSM system that implements release consistency. Java/DSM [91] provides a Java Virtual Machine (JVM) abstraction over TreadMarks. Like Munin and TreadMarks, it is an example of a page-based DSM.
Release consistency is a widely known relaxed consistency model for DSMs. Memory accesses are divided into synchronization (sync) and non-synchronization (nsync) operations. The nsync operations are either data operations or special operations not used for synchronization. The sync operations are further divided into acquire and release operations:
• An acquire is like a read operation that gains access to a shared location.
• A release is the complementary operation, performed to allow others to access the shared location.
Acquire and release operations can be thought of as conventional operations on locks.

There are two variations of RC. $\mathrm{RC}_{sc}$ realizes sequential consistency, and $\mathrm{RC}_{pc}$ realizes processor consistency. $\mathrm{RC}_{sc}$ maintains program order in three cases: from an acquire to any operation that follows it, from any operation to a release, and between special operations. $\mathrm{RC}_{pc}$ is similar, except that write-to-read program order is not maintained for special operations.

Eager RC, as the original RC subsequently became known [48], delays performing ordinary shared memory accesses (making them visible to other processors) until a subsequent release by the same processor. Lazy RC (LRC) is a variation of RC in which processors delay modifications further, until a subsequent acquire by another processor. The modifications are then propagated only to the acquiring processor. LRC intuitively assumes that competing shared accesses are separated by synchronization operations.
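The difference between eager and lazy RC lies in when buffered writes travel. The minimal sketch below, in Python, makes that difference concrete. It assumes a drastically simplified model: each processor keeps a private dictionary as its copy of shared memory, mutual exclusion on the lock is not enforced, and lazy RC is approximated by attaching released writes to the lock rather than by the write notices and vector timestamps that real LRC systems such as TreadMarks use. The class names are hypothetical.

```python
from collections import defaultdict


class Processor:
    """A processor with a private copy of shared memory and a write buffer."""

    def __init__(self, pid):
        self.pid = pid
        self.memory = {}   # local copy of shared variables
        self.pending = {}  # writes made since this processor's last release

    def write(self, var, value):
        self.memory[var] = value
        self.pending[var] = value

    def read(self, var):
        return self.memory.get(var)


class EagerRC:
    """Eager RC: buffered writes are pushed to every other processor at release."""

    def __init__(self, procs):
        self.procs = procs

    def acquire(self, proc, lock):
        pass  # nothing to fetch: updates were already pushed at release time

    def release(self, proc, lock):
        for other in self.procs:
            if other is not proc:
                other.memory.update(proc.pending)
        proc.pending.clear()


class LazyRC:
    """Lazy RC (simplified): released writes wait at the lock for the next acquirer."""

    def __init__(self, procs):
        self.procs = procs
        self.lock_updates = defaultdict(dict)  # lock -> writes released under it

    def acquire(self, proc, lock):
        proc.memory.update(self.lock_updates[lock])  # only the acquirer sees them

    def release(self, proc, lock):
        self.lock_updates[lock].update(proc.pending)
        proc.pending.clear()


def demo(protocol_cls):
    """P0 writes x under lock L; report what P1 and P2 see before and after P1 acquires L."""
    procs = [Processor(i) for i in range(3)]
    rc = protocol_cls(procs)
    p0, p1, p2 = procs
    rc.acquire(p0, "L")
    p0.write("x", 1)
    rc.release(p0, "L")
    before = (p1.read("x"), p2.read("x"))
    rc.acquire(p1, "L")
    return before, p1.read("x")
```

Under eager RC, both P1 and P2 receive x at P0's release. Under lazy RC, only P1 sees x, and only once it acquires the lock. This is the communication that LRC saves.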
2.1. Shared Object Spaces. Object-based DSMs (also known as shared object spaces) alleviate the false sharing problem by letting applications specify the granularity of sharing. Examples of object-based DSMs include Linda [24], Orca [36], T Spaces [4], JavaSpaces [81] and an object-based DSM in the .NET environment [75].

Orca relies on an update mechanism based on totally ordered group communication to serialize access to replicas. A study has shown that the overhead of totally ordered group communication affects application performance only minimally [37]¹. However, that study was done on a Myrinet cluster, and Orca has not been evaluated at Internet scale.

T Spaces is a shared object space from IBM [4] that adds database functionality to the Linda tuple space [24]. It is implemented in Java to take advantage of Java's wide usability. In addition to the traditional Linda primitives in, out and read, T Spaces supports set-oriented operators and a novel rendezvous operator called rhonda.

Global shared objects [90] allow heap objects in a JVM to be shared across nodes. The approach also proposes consistency mechanisms that can be realized efficiently based on the memory access patterns of applications. However, it keeps replicas consistent using locks and per-object lock managers, and it does not address failures of a lock manager.

The JavaSpaces specification from Sun [81] provides a distributed persistent shared object space using Java RMI and Java serialization. It provides Linda-like operations on the tuple space and uses Jini's transaction specification to achieve serializability of write operations. Like global shared objects, it does not address fault tolerance, which is an important issue for Internet-scale systems.
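The Linda primitives mentioned above operate on tuples matched against templates. The single-process Python sketch below illustrates out, rd and in. Two conventions are assumptions made for illustration only. First, a template field that is a Python type (such as int) plays the role of a Linda formal and matches any value of that type. Second, where real Linda blocks in in/rd until a match appears, this sketch returns None. The class and method names are hypothetical; in_ is used because in is a Python keyword.

```python
class TupleSpace:
    """Single-process sketch of Linda's out / rd / in primitives."""

    def __init__(self):
        self._tuples = []

    def out(self, *fields):
        """Add a tuple to the space."""
        self._tuples.append(tuple(fields))

    @staticmethod
    def _matches(template, tup):
        if len(template) != len(tup):
            return False
        for t, v in zip(template, tup):
            if isinstance(t, type):      # formal: match any value of this type
                if not isinstance(v, t):
                    return False
            elif t != v:                 # actual: must be equal
                return False
        return True

    def rd(self, *template):
        """Return a matching tuple without removing it (None if there is none)."""
        for tup in self._tuples:
            if self._matches(template, tup):
                return tup
        return None

    def in_(self, *template):
        """Remove and return a matching tuple (None if there is none)."""
        for i, tup in enumerate(self._tuples):
            if self._matches(template, tup):
                return self._tuples.pop(i)
        return None
```

For example, after out("temp", "room1", 21), the call rd("temp", str, int) returns that tuple and leaves it in the space, whereas in_ would consume it. Because producers and consumers never name each other, tuple spaces decouple them. This decoupling is also why the lookup component behind rd and in becomes a scalability concern when it is centralized.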
2.1.1. Globe. Globe [3] attempted to address the challenges of building a software infrastructure for developing applications over the Internet. A key design objective of Globe was to provide a uniform model for distributed computing. In other words, Globe provides a uniform way to access common services (such as naming, replication and communication) without sacrificing distribution transparency.

Objects in Globe encapsulate policies for replication, migration and so on. Each object comprises multiple sub-objects, which allows an object to be physically distributed. An object has one sub-object each for semantics (functionality), communication (sending and receiving messages), replication and control flow. This structure helps the programmer separate functionality from orthogonal non-functional properties such as replication. Objects also help realize distribution transparency by hiding implementation details behind well-defined interfaces.

The implementation framework of Globe is flexible, meaning that different implementations of the same interfaces are possible. Globe also provides an efficient mechanism for object lookups by using a tree-based hierarchical naming space. Distributed object middleware such as CORBA [61] also provides similar services, such as naming and trading. However, such middleware cannot provide the object-specific policies that Globe can.
2.2. Software Availability and Usage Summary. To the knowledge of the authors, T Spaces and JavaSpaces are widely used and are available as open source software. Linda is a specification and has been implemented by several groups. Orca and Globe are research prototypes, and information on their deployment and use is not available.
2.3. Observations. In [6], we proposed a generic scalability model for analyzing distributed systems. The model takes the view that the scalability of a distributed system should be analyzed together with related issues such as consistency, synchronization and availability. The essence of the model is as follows.
$$\mathit{scalability} = f(\mathit{avail}, \mathit{sync}, \mathit{consis}, \mathit{workload}, \mathit{faultload})$$
• avail is availability, which can be quantified as the ratio of the number of transactions accepted to the number of transactions submitted.
• consis is consistency, which is itself a function of update ordering and consistency granularity. Update ordering refers to the mechanism that orders updates across the replicas of an object; it can be causal, serializable or PRAM. Consistency granularity refers to the grain size at which consistency needs to be maintained.
• sync refers to synchronization among the replicas. Synchronization has two dimensions: how often the replicas are synchronized, and the mode of synchronization (push or pull).
• workload can be broken down into workload intensity (the number of transactions per second or the number of clients) and workload service demand characterization (the CPU time for operations).
• faultload refers to the failure sequences and to the number and location of the replicas.
The scalability model given above is useful for identifying bottlenecks in distributed systems. By applying it to shared object spaces, we have identified the key bottlenecks that prevent existing shared object spaces (with the exception of Globe) from scaling up to the Internet:
• Centralized components: Many existing DSMs and shared object spaces have centralized components that affect their scalability. For instance, Orca has a sequencer for realizing totally ordered group communication, while others such as T Spaces [4] have a centralized component for object lookups.
• Failures: Existing shared object spaces do not handle failures. For instance, JavaSpaces and global shared objects do not handle failures of the transaction coordinator, and Orca does not handle failure of the sequencer.
• Object lookup: Given an object identifier (id), efficient mechanisms must exist that map the id to a node that stores either a replica or meta-data about the replica. Existing shared object spaces such as T Spaces use centralized lookup mechanisms. The object lookup mechanisms in distributed object middleware such as CORBA and DCOM also have difficulty handling failures and scaling up.
• Consistency: Several existing DSM systems, such as TreadMarks and Munin, and shared object spaces, such as JavaSpaces, provide replica consistency mechanisms. However, these mechanisms have not been evaluated in Internet-scale systems. P2P systems that have been scaled to the Internet, such as Pastry [69] and Tapestry [17], assume that replicas are read-only.
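The availability term of the scalability model in Section 2.3 is defined as a simple ratio, so it can be computed directly. The sketch below shows one hypothetical way to use that ratio: it finds the highest workload intensity at which a system still meets an availability target. The helper max_sustainable_load, the run format, and the threshold are assumptions for illustration and are not part of the model in [6]. The numbers in the usage check are illustrative, not measurements.

```python
from typing import Optional, Sequence, Tuple


def availability(accepted: int, submitted: int) -> float:
    """avail: fraction of submitted transactions that were accepted."""
    if submitted <= 0:
        raise ValueError("submitted must be positive")
    if not 0 <= accepted <= submitted:
        raise ValueError("accepted must lie in [0, submitted]")
    return accepted / submitted


def max_sustainable_load(
    runs: Sequence[Tuple[float, int, int]], threshold: float
) -> Optional[float]:
    """Largest workload intensity whose availability meets `threshold`.

    Each run is (workload_intensity, submitted, accepted). Runs are scanned
    in increasing order of intensity, stopping at the first violation.
    """
    best = None
    for intensity, submitted, accepted in sorted(runs):
        if availability(accepted, submitted) < threshold:
            break
        best = intensity
    return best
```

Holding sync, consis and faultload fixed while varying the workload isolates one argument of f. Repeating the scan with a different replica count or update ordering then shows how each of the other factors shifts the point at which availability degrades.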
3. SID Techniques for Data Management.
3.1. Computing Grids. Globus [39], a de facto standard toolkit for grid computing systems, relies on explicit data transfers between clients and computing servers. It uses the GridFTP protocol [19], which provides an efficient, authenticated data transfer mechanism for large grids. Globus also supports data catalogues, but leaves catalogue consistency to the application.

[51] explores the interfaces required for a Replica Management Service (RMS). The RMS acts as a common entry point for the replica catalogue service, meta-data access and wide-area copy. It does not address consistency issues per se. Further, the RMS is centralized and may not scale up.

The other grid paper that addresses data management issues [29] outlines possible use-cases and gives a higher-level view of the data management requirements in a grid. The quorum scheme it describes for handling read-write data may have to be modified to handle quorum dynamics in an Internet-like environment. Further, it does not address various granularities of replication, and it uses locks for synchronization.

[78] also addresses read-write data consistency in a grid environment, using a lazy update propagation algorithm. The algorithm is based on timestamps and may not scale up to a large-scale grid environment. In addition, update conflicts are handled manually by the application programmer, which is a non-trivial task. Attempts have also been made to extend existing Two-Phase Commit (2PC) based algorithms [82]. These would need global agreement and may be expensive in an Internet setting.
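The quorum approach mentioned above rests on a simple intersection property. The sketch below illustrates it using the classic weighted-voting conditions with one vote per replica. It is a generic textbook rule, not the specific scheme of [29]. A read quorum of size r must overlap every write quorum of size w (r + w > n), and any two write quorums must overlap (2w > n). Given these conditions, a read that returns the highest version in its quorum sees the latest write. Replicas, versions and function names here are hypothetical, and writes are assumed to be issued one at a time.

```python
from dataclasses import dataclass
from typing import List, Tuple


def valid_quorums(n: int, r: int, w: int) -> bool:
    """Voting conditions: every read quorum meets every write quorum
    (r + w > n) and any two write quorums overlap (2w > n)."""
    return 0 < r <= n and 0 < w <= n and r + w > n and 2 * w > n


@dataclass
class Replica:
    version: int = 0
    value: object = None


def quorum_write(replicas: List[Replica], quorum: List[int], value) -> int:
    """Install `value` on the write quorum with a version newer than any it holds.

    Because write quorums overlap, the quorum contains at least one replica
    carrying the latest version, so the new version is globally the newest.
    """
    new_version = max(replicas[i].version for i in quorum) + 1
    for i in quorum:
        replicas[i].version, replicas[i].value = new_version, value
    return new_version


def quorum_read(replicas: List[Replica], quorum: List[int]) -> Tuple[int, object]:
    """Return the (version, value) pair with the highest version in the read quorum."""
    newest = max((replicas[i] for i in quorum), key=lambda rep: rep.version)
    return newest.version, newest.value
```

Quorum dynamics in an Internet setting (replicas joining, leaving or becoming unreachable) break the fixed n assumed here. This is why such a scheme would need modification for wide-area grids.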
3.2. Data Grids. A generic architecture for handling large data sets in grid computing environments has been proposed in [27]. It describes how data grid services such as replication and replica selection can be built over basic services for data and meta-data access. It assumes that replicas (file instances) are read-only.
GriPhyN [31] attempts to support large-scale data management in High Energy Physics (HEP) applications, as well as in astronomy and gravitational wave physics. GriPhyN gives users transparent access to both raw and processed data; the term virtual data refers to both. It can convert raw data into processed data by scheduling the required computations and data transfers. GriPhyN is built on top of Globus. It takes application meta-data and maps it into a Directed Acyclic Graph (DAG), an abstract representation of the required actions on data sets. A request planner takes this DAG and transforms it into a concrete DAG, which a grid scheduling system such as Condor-G [42] can execute.
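The abstract-to-concrete step described above can be pictured as pruning and ordering a dependency graph. The Python sketch below (requires Python 3.9+ for graphlib) makes one assumption for illustration: products that already exist (raw data, or processed data kept from an earlier run) are not recomputed. This is only an illustration of the idea of virtual data, not GriPhyN's actual planner. The function name plan and the product names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List, Set


def plan(abstract_dag: Dict[str, Set[str]], targets: Iterable[str],
         materialized: Set[str]) -> List[str]:
    """Return the derivations to run, in dependency order, to produce `targets`.

    abstract_dag maps each data product to the products it is derived from.
    Products in `materialized` already exist and are not recomputed.
    """
    needed: Set[str] = set()

    def require(product: str) -> None:
        if product in materialized or product in needed:
            return
        needed.add(product)
        for dep in abstract_dag.get(product, ()):
            require(dep)

    for target in targets:
        require(target)

    # Concrete DAG: only the products that must actually be derived.
    concrete = {p: {d for d in abstract_dag.get(p, ()) if d in needed} for p in needed}
    return list(TopologicalSorter(concrete).static_order())
```

For example, with raw -> calibrated -> events -> histogram, asking for histogram when calibrated data already exists yields only ["events", "histogram"]. The scheduler then runs just the missing derivations.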
CERN, the European Organization for Nuclear Research, is also involved in handling computation on large data sets in the HEP area. [79], a CERN effort, explores both object-level and file-level replication for data grids. It also assumes that files are read-only and can therefore be replicated without consistency protocols. It supports replica catalogs to handle meta-data. Actual file and object transfers are performed using GridFTP [19].
Data-related activities on the grid, such as queuing, monitoring and scheduling, need to be carefully managed, because data can become the bottleneck for data-intensive applications. Currently, these tasks are performed manually or by simple scripts. The main goal of Stork [85] was to make data a first-class citizen on the grid. Data placement jobs have different characteristics from compute-intensive jobs, so they may have to be treated differently. Stork is a separate scheduler for scheduling and managing data-intensive jobs on the grid, and it represents data-related activities in the form of a DAG.

Stork can interact with higher-level planners such as the Directed Acyclic Graph Manager (DAGMan), which is part of Condor-G. DAGMan has been enhanced to submit compute-intensive jobs to grid schedulers such as Condor-G and data-intensive jobs to Stork. Stork also supports heterogeneous storage systems and various data transfer protocols. Case studies have demonstrated the use of Stork as a pipeline between two heterogeneous storage systems and for runtime adaptation of data transfers.
3.3. Content Distribution Networks. Web servers have had difficulty handling the flash crowd problem. A flash crowd is a large number of requests arriving suddenly and overwhelming the server's bandwidth, CPU or back-end transaction infrastructure. Requests to web servers are bursty, for instance during a World Cup football match or an election count, which is what produces flash crowds. Content Distribution Networks (CDNs) such as Akamai [43] have been proposed to handle this problem and to enable web servers to scale up. Several companies built separate infrastructures of dedicated servers spread across the Internet. These infrastructures offload content distribution from web servers and deliver content from the edge of the Internet. Akamai's CDN consists of over twelve thousand servers across thousand different
Studies have shown that caching is beneficial in CDNs, because CDNs mainly deliver images or videos, which are static content [44]. Another study, which compared CDNs with P2P file sharing systems for distributing content, found that Akamai CDNs achieved cache hit rates of nearly 88% [76]. This suggests that CDNs are beneficial for content delivery and can reduce response times for clients. However, a further study has shown that employing CDNs does not affect the average response time for clients [44]. CDNs do not route each client request to an optimal CDN server. Their main benefit is that they help clients avoid the worst case of badly performing servers.
Cache consistency becomes a challenging issue when non-static content must be delivered to clients. Traditional caching mechanisms such as leasing [22] may not be directly applicable to CDNs. With leases, the origin server would have to keep track of each CDN proxy that caches an object (a web document) from the server. The server must also manage lease-related issues for that proxy, including notifying it of updates to the object, and the proxy has to renew the lease to receive further notifications. Consistency mechanisms for CDNs must be scalable, which requires the CDN proxies to maintain consistency cooperatively.

Cooperative leases have been proposed as a scalable mechanism for maintaining cache consistency in CDNs [12, 11]. Each object is assigned a parameter $\Delta$, which specifies the interval (equivalently, the rate $1/\Delta$) at which an origin server notifies interested CDN proxies of updates to that object. This relaxes consistency: a CDN proxy can be notified only once every $\Delta$ time units instead of after every update. Leases are cooperative in that, for lease-related interactions with an origin server, one CDN proxy acts as the leader of a group of proxies. The leader is responsible for notifying the other CDN proxies. This reduces both the state maintained at the origin server and the number of update notifications it must send.
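The two ideas in cooperative leases are rate-limited notification (at most once per $\Delta$) and a group leader that fans notifications out. The Python sketch below shows both under simplifying assumptions. Time is a plain number passed in by the caller, lease expiry and renewal are not modelled, and one notification invalidates the whole object. The class names and the messages_sent counter are hypothetical and are not taken from [12, 11].

```python
from collections import defaultdict


class Proxy:
    def __init__(self, name):
        self.name = name
        self.invalidated = set()   # objects this proxy now knows are stale

    def notify(self, obj):
        self.invalidated.add(obj)


class LeaderProxy(Proxy):
    """Group leader: the only proxy in its group that the origin server contacts."""

    def __init__(self, name, members):
        super().__init__(name)
        self.members = members

    def notify(self, obj):
        super().notify(obj)
        for member in self.members:    # fan out inside the proxy group
            member.notify(obj)


class OriginServer:
    def __init__(self):
        self.delta = {}                    # obj -> Delta (time units)
        self.leaders = defaultdict(list)   # obj -> leaders holding a lease on obj
        self.last_sent = {}                # obj -> time of the last notification
        self.dirty = set()                 # objects updated but not yet notified
        self.messages_sent = 0

    def grant_lease(self, obj, leader, delta):
        self.delta[obj] = delta
        self.leaders[obj].append(leader)

    def update(self, obj, now):
        self.dirty.add(obj)
        self.flush(now)

    def flush(self, now):
        """Notify leaders of dirty objects whose Delta interval has elapsed."""
        for obj in list(self.dirty):
            last = self.last_sent.get(obj)
            if last is None or now - last >= self.delta[obj]:
                for leader in self.leaders[obj]:
                    leader.notify(obj)
                    self.messages_sent += 1
                self.last_sent[obj] = now
                self.dirty.discard(obj)
```

For example, four updates within one $\Delta$ window, followed by a flush after the window closes, cost the origin only two messages. Each message still reaches every proxy in the group through the leader.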
3.4. Data Management in Distributed Mobile Systems. Distributed Mobile Systems (DMS) are distributed systems in which some nodes may be mobile and may have constraints, such as limited battery, memory or computing power. Data can either be stored on mobile devices or be accessed from them. [54] identifies different kinds of data management according to the level of transparency they offer to applications. At one extreme, client-transparent adaptation allows applications to access data seamlessly without being aware of mobility, with the system providing complete support. At the other extreme is a laissez-faire model, in which adaptation happens entirely at user level and the system provides no support. Between the two extremes lies a wealth of strategies that let applications be aware of mobility to varying degrees, including application-aware adaptation and extended client-server models.
Coda [74] was one of the early file systems that allow clients to access information seamlessly, and it is an example of client-transparent adaptation. Its main goal was to allow operations on a shared data repository to continue even in the face of disconnections, which may be frequent in a DMS.

Venus is the cache manager on each client. It manages the cache and hides mobility from the application. Venus caches volume mappings, where a volume is a subtree of the Coda namespace. While the client is connected, Coda uses server replication and callback-based cache coherence to ensure session semantics for applications: contents are up to date when a session starts and after it ends. During disconnections, Venus relies on the cache contents, and it reports a failure to the application when a cache miss occurs. When the disconnection ends, Coda returns to server replication by performing log-based reintegration.
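The disconnected-operation cycle described above (serve from the cache, log updates, replay the log on reconnection) can be sketched in a few lines. The Python code below is a much-simplified, hypothetical model in the spirit of Venus. Connected reads go straight to the server because callbacks are not modelled, hoarding is omitted, and reintegration simply replays the log without the conflict detection that Coda performs. The class and method names are assumptions.

```python
class Server:
    def __init__(self):
        self.files = {}   # path -> contents


class CacheManager:
    """Simplified Venus-like client cache manager (illustrative only)."""

    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.connected = True
        self.log = []     # (path, data) updates made while disconnected

    def read(self, path):
        if self.connected:
            if path not in self.server.files:
                raise FileNotFoundError(path)
            self.cache[path] = self.server.files[path]
            return self.cache[path]
        if path in self.cache:          # disconnected: serve from the cache
            return self.cache[path]
        raise FileNotFoundError(f"cache miss while disconnected: {path}")

    def write(self, path, data):
        self.cache[path] = data
        if self.connected:
            self.server.files[path] = data
        else:
            self.log.append((path, data))   # remember for reintegration

    def disconnect(self):
        self.connected = False

    def reconnect(self):
        """Reintegrate by replaying the update log against the server."""
        for path, data in self.log:
            self.server.files[path] = data
        self.log.clear()
        self.connected = True
```

The application calls read and write identically in both states. The only visible difference is a failure on a cache miss during disconnection, which is what makes this style of adaptation client-transparent.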
Application-aware adaptation has been used in the Odyssey system [21]. Odyssey cleanly separates the concerns of the system from those of the application. The system monitors resource dynamics, notifies applications when required, and retains control of the resource allocation mechanism. Applications, in turn, specify how resource levels map to fidelity levels.

Fidelity is defined as the degree to which client data matches the server's. It has multiple dimensions, such as consistency, frame rate and image quality for video data, or resolution for spatial data. A system that allows diverse fidelity levels needs type awareness, meaning that client code is responsible for handling particular data types. Odyssey achieves this through wardens, specialized code components that encapsulate system-level support at the client. Wardens are subordinate to the Viceroy, which is responsible for centralized resource management.
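The division of labour in Odyssey (the system observes resources, the application maps resource levels to fidelity) is sketched below in Python. The bandwidth thresholds and fidelity labels are illustrative assumptions, and the monitor notifies on every observation. Odyssey itself uses tolerance windows and upcalls, which this sketch does not reproduce. The class names are hypothetical.

```python
import bisect


class VideoApp:
    """Application-side policy: maps available bandwidth to a fidelity level."""

    # (minimum bandwidth in kbit/s, fidelity), sorted by bandwidth; illustrative values
    LEVELS = [(0, "black-and-white, 5 fps"),
              (256, "colour, 10 fps"),
              (1024, "colour, 30 fps")]

    def __init__(self):
        self.fidelity = None

    def choose_fidelity(self, bandwidth_kbps):
        thresholds = [low for low, _ in self.LEVELS]
        idx = max(bisect.bisect_right(thresholds, bandwidth_kbps) - 1, 0)
        self.fidelity = self.LEVELS[idx][1]
        return self.fidelity


class ResourceMonitor:
    """System side: observes a resource and notifies registered applications."""

    def __init__(self):
        self.subscribers = []
        self.bandwidth_kbps = None

    def register(self, app):
        self.subscribers.append(app)

    def observe(self, bandwidth_kbps):
        self.bandwidth_kbps = bandwidth_kbps
        for app in self.subscribers:
            app.choose_fidelity(bandwidth_kbps)
```

The monitor never needs to understand video, and the application never measures the network. This mirrors the separation of concerns described above.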
Odyssey is an example of client-based application-aware adaptation. Rover [13] is a system that allows client-server adaptation, meaning that some of the code required for adaptation also resides on the server. Rover uses the concept of Relocatable Dynamic Objects (RDOs) for the data types handled by the application. The application programmer splits a program containing RDOs into those that reside on the client and those that run on servers. This requires the adaptation code to be resident on origin servers. An approach named proxy-based adaptation avoids this requirement: the adaptation is done by a proxy acting on behalf of clients. The Barwan project [30] is an example. The flexible client-server model provides application-aware consistency. This is an unbounded consistency mechanism that allows replicas to diverge, but to become consistent after an unspecified time.
3.5. Software Availability and Usage Summary. Globus is a widely used toolkit and is available as open source software. Stork is a research prototype, while the GriPhyN and CERN data grids have been deployed and used. Akamai's CDNs are widely deployed and used, while cooperative leases [12] exist only as a research prototype. Coda and Odyssey are distributed mobile systems that are widely deployed and used.
4. Large Scale Data Management Techniques.
4.1. P2P Data Management. We first give an overview of P2P file sharing systems. We start from the initial unstructured P2P systems such as Napster, move on to super-peer systems such as Kazaa, and then discuss structured P2P systems. After that, we discuss P2P storage management systems such as Oceanstore.
4.1.1. P2P File Sharing Systems. P2P became popular as an area only after the advent of Napster [57], a file sharing system used for sharing music files. Meta-data about files was stored in a global directory held on a centralized server. The directory held only meta-data; the music files themselves were downloaded directly from peers.

Gnutella [33] introduced a decentralized search protocol for file sharing applications and can be seen as a purely decentralized, unstructured P2P system. The term unstructured refers to the lack of structure in the overlay, which is mostly a random graph. Search was achieved by flooding the network or by using random walks. Freenet added a mechanism that routes requests towards possible content locations, based on best-effort semantics. Freenet also adds a notion of anonymity to the shared data.

The main advantage of unstructured P2P systems was that complex queries could be handled easily. By complex queries, we mean queries such as "get all nodes with processing speed > 3 GHz, RAM > 1 GB and storage > 100 GB". Such queries are easy to support because the query is sent to each node and evaluated there locally. However, these systems find it difficult to provide deterministic guarantees for search.
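The flooding search just described, together with the complex query example, can be sketched as a breadth-first broadcast with a hop limit. The Python code below assumes a Gnutella-style TTL and duplicate suppression. The overlay, the attribute names and the function name are hypothetical. Each reached node evaluates the predicate locally, which is why arbitrary queries are easy to support.

```python
from collections import deque


def flood_query(overlay, attrs, start, predicate, ttl):
    """Flood a query from `start` for at most `ttl` hops over an unstructured overlay.

    overlay:   node -> list of neighbour nodes (an arbitrary graph)
    attrs:     node -> dict of attributes that `predicate` evaluates locally
    Returns the set of reached nodes that satisfy the query.
    """
    seen = {start}
    frontier = deque([(start, 0)])
    hits = set()
    while frontier:
        node, hops = frontier.popleft()
        if predicate(attrs[node]):      # each node evaluates the query itself
            hits.add(node)
        if hops == ttl:
            continue                    # TTL exhausted: do not forward further
        for nbr in overlay[node]:
            if nbr not in seen:         # suppress duplicate deliveries
                seen.add(nbr)
                frontier.append((nbr, hops + 1))
    return hits
```

For the query above, predicate could be lambda a: a["cpu_ghz"] > 3 and a["ram_gb"] > 1 and a["disk_gb"] > 100. A matching node that lies beyond the TTL is simply never found, which illustrates why such systems cannot give deterministic search guarantees.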
Initial attempts at introducing structure into the overlay of P2P systems resulted in super-peer systems, in which some nodes with better capabilities act as super-peers. The other nodes act as clients of the super-peers, and the super-peers form a P2P overlay among themselves. Super-peers made searching more efficient for complex queries by exploiting the heterogeneity of nodes: some nodes have better capabilities and, more importantly, better connectivity than others. A popular example of a super-peer system is Kazaa (http://www.kazaa.com).

However, handling super-peer failures requires replicating super-peers; otherwise, their clients may become disconnected. Creating k replicas in each cluster also reduces the load on each super-peer [93], although it may make the replicas client-aware. Other design issues in super-peer systems include cluster size and dynamic layer management. A large cluster size is good for aggregate bandwidth, but may create bottlenecks. A small cluster size avoids bottlenecks, but may reduce search efficiency. Dynamic layer management lets nodes act as super-peers or as clients adaptively, making the super-peer network more efficient [95].
The third generation of P2P systems introduced structure into the overlay network. The motivation was to provide deterministic search guarantees, partition the load effectively over the available machines, scale to large numbers of nodes and achieve fault-tolerance. The Distributed Hash Table (DHT) was the main structure used for overlay formation, and it was based on the Plaxton data structure [23]. Nodes are given identifiers (ids) from an id space, and application objects are given ids from the same space. The DHT maps an application object id (key) to the id of the node responsible for that key. Each node has a routing table of neighbours and performs routing functions to look up objects.

Various DHTs have been proposed, each with its own routing algorithm and routing table maintenance. [45] gives geometric interpretations of DHTs, although the main focus of that paper was the static resilience of DHTs. Chord [40] is based on a ring, the Content Addressable Network (CAN) on a hypercube, and the Plaxton data structure on a tree. Pastry [69] uses a hybrid geometry that combines the tree and the ring. We discuss some of these structured P2P systems in more detail below.
Chord provides the lookup abstraction of DHTs through the method lookup(key), which maps a key to the node responsible for it. Chord uses consistent hashing to assign m-bit identifiers to both Chord nodes and application objects. The ids are arranged in a ring (modulo $2^m$). A key $k$ maps to the first node whose id is equal to or follows $k$ in the identifier space; this node is known as $\mathit{successor}(k)$. Each node maintains a pointer to its successor in the ring. Routing proceeds along the ring until the key falls between two consecutive node ids, and the second of these nodes is the destination. To speed up routing, each node also maintains information on $O(\log N)$ other nodes in its finger table. If the finger table were to fail, only efficiency would be affected, not correctness. As long as each node can connect to its successor, routing is guaranteed to reach the correct node, and with up-to-date finger tables it completes in $O(\log N)$ hops.

CAN routes over a hypercube. Each CAN node stores a chunk (or zone) of the hash table, together with information on adjacent zones, again to speed up routing. Lookup requests for a particular key are routed towards the CAN node whose zone contains that key. Requests are routed by correcting bits ($n$ bits for an $n$-dimensional hypercube). Generally, tree-based DHTs such as the Plaxton data structure correct bits in order (from the MSB to the LSB of the key), while hypercube-based DHTs allow bits to be corrected in any order. This makes routing in hypercube-based DHTs more resilient to node and link failures.
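Chord's successor mapping and finger-table routing can be illustrated on a static ring. In the Python sketch below there are no joins, departures or failures, and node ids are given directly rather than hashed. The class and helper names are hypothetical. Each finger i of node n is $\mathit{successor}(n + 2^i)$, and a lookup repeatedly forwards to the finger that most closely precedes the key, which at least halves the remaining distance to the key's predecessor.

```python
import bisect


class ChordRing:
    """Static sketch of Chord lookup: no joins, departures or failures."""

    def __init__(self, node_ids, m):
        self.size = 2 ** m
        self.nodes = sorted({n % self.size for n in node_ids})
        # finger[i] of node n is successor(n + 2^i); finger[0] is the ring successor
        self.fingers = {n: [self.successor(n + 2 ** i) for i in range(m)]
                        for n in self.nodes}

    def successor(self, key):
        """First node whose id is equal to or follows key (mod 2^m)."""
        idx = bisect.bisect_left(self.nodes, key % self.size)
        return self.nodes[idx % len(self.nodes)]

    def _dist(self, a, b):
        return (b - a) % self.size          # clockwise distance from a to b

    def _in_half_open(self, x, a, b):
        """x in the ring interval (a, b]; (a, a] is the whole ring."""
        return a == b or 0 < self._dist(a, x) <= self._dist(a, b)

    def _in_open(self, x, a, b):
        """x in the ring interval (a, b); (a, a) is the ring minus a."""
        d = self._dist(a, x)
        return 0 < d and (a == b or d < self._dist(a, b))

    def lookup(self, start, key):
        """Route from node `start` to successor(key); returns (node, hops)."""
        key %= self.size
        node, hops = start, 0
        while True:
            if key == node:
                return node, hops
            succ = self.fingers[node][0]
            if self._in_half_open(key, node, succ):
                return succ, hops + (succ != node)   # final hop to the owner
            # forward to the finger that most closely precedes the key
            node = next(f for f in reversed(self.fingers[node])
                        if self._in_open(f, node, key))
            hops += 1
```

On the ring used in the test (m = 6, ten nodes), every lookup from every node returns successor(key) in at most m + 1 = 7 hops. Even if only finger[0] were available, the loop would still terminate at the correct node, only more slowly, which matches the correctness argument above.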
Pastry can be viewed as having a hybrid geometry, because it uses tree-based routing and ring-like neighbour formation. It provides a route abstraction to applications: route(msg, key) delivers the message to the live node whose id is numerically closest to key. Each node keeps track of its immediate neighbours in the node id space by maintaining leaf sets. Nodes also store information about a few other nodes with prefix-matching ids in a routing table. Pastry takes network locality into account in routing: among the candidate nodes with the required id prefix, routing table entries favour nodes that are close in the network. Routing proceeds by prefix matching, with each hop extending the prefix shared with the key by at least one digit, resulting in $O(\log N)$ hops.

4.1.2. P2P File Storage Systems. Ivy [56] is a read/write P2P file system that provides an NFS-like abstraction for programmers. In a failure-free environment, Ivy provides NFS-like semantics. Under network partitions and failures, Ivy uses logs to allow applications to detect and resolve conflicts. Each participant and host has its own Ivy log. The logs are stored in DHash, the DHT-based P2P block storage system over which Ivy is built. Participants can read the other logs, but write only to their own log when updating the file system. Ivy uses version vectors to detect conflicting updates and passes this information to application-level conflict resolvers. In a WAN testbed, Ivy's performance was within a factor of 2-3 of NFS.
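Conflict detection with version vectors, as used by Ivy, reduces to a component-wise comparison. The Python sketch below is a generic illustration, not Ivy's code. It assumes a vector is a dictionary from participant to the number of that participant's updates already seen. If neither vector dominates the other, the updates were concurrent and must be handed to a conflict resolver.

```python
from typing import Dict

VersionVector = Dict[str, int]   # participant -> number of its updates seen


def dominates(a: VersionVector, b: VersionVector) -> bool:
    """True if `a` has seen every update that `b` has (a >= b component-wise)."""
    return all(a.get(p, 0) >= n for p, n in b.items())


def compare(a: VersionVector, b: VersionVector) -> str:
    if dominates(a, b) and dominates(b, a):
        return "equal"
    if dominates(a, b):
        return "a newer"
    if dominates(b, a):
        return "b newer"
    return "conflict"   # concurrent updates: leave resolution to the application


def record_update(vv: VersionVector, participant: str) -> VersionVector:
    """Return a copy of `vv` reflecting one more update by `participant`."""
    new = dict(vv)
    new[participant] = new.get(participant, 0) + 1
    return new
```

For example, if bob and carol each update a file after both saw alice's version, neither resulting vector dominates the other and compare reports a conflict. Ivy leaves exactly this case to application-level resolvers.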
PAST [15] is an Internet-based P2P storage utility that offers persistent storage services, availability, security and scalability. It provides insert, reclaim and retrieve operations on files. Because a file cannot be inserted multiple times, files are effectively immutable in PAST. PAST extends Pastry to provide a file storage system. When a file is inserted into PAST, Pastry routes it to the k live nodes whose ids are closest to the file id. Because the identifier space is randomized, these k nodes are diverse with respect to location, capabilities and connectivity. A file remains available as long as not all k nodes fail simultaneously. PAST provides security using optional smartcards based on a public-key cryptosystem.
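PAST's replica placement (the k live nodes numerically closest to the file id) can be sketched as follows. Two simplifying assumptions are made, and both differ from PAST itself. First, the file id is derived here from a SHA-1 hash of the name alone and truncated to a small id space, whereas PAST also mixes in the owner's key and a salt. Second, ties in distance are broken by node id. The function names are hypothetical.

```python
import hashlib


def circular_distance(a: int, b: int, bits: int) -> int:
    """Distance between two ids on a circular id space of size 2^bits."""
    d = abs(a - b) % (1 << bits)
    return min(d, (1 << bits) - d)


def file_id(name: str, bits: int = 16) -> int:
    """Hash a file name into the id space (SHA-1 truncated to `bits` bits)."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") >> (160 - bits)


def replica_set(fid: int, live_nodes, k: int, bits: int = 16):
    """The k live node ids numerically closest to fid on the circle."""
    return sorted(live_nodes, key=lambda n: (circular_distance(n, fid, bits), n))[:k]


def retrievable(replicas, failed) -> bool:
    """A file survives as long as at least one of its k replicas is up."""
    return any(r not in failed for r in replicas)
```

Note that closeness wraps around the circle. With nodes at 10, 200 and 65000 in a 16-bit space, all three are among the closest to file id 100.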
Oceanstore [49] is an Internet-based file system that provides persistence and availability of files by using a two-tiered system. The upper tier consists of capable machines with good connectivity, which act as an inner circle of servers that serialize updates. The lower tier consists of less capable machines that only provide storage resources to the system.

Pond [67] is an Oceanstore realization that provides fault-tolerant, durable storage to applications. It uses erasure coding [20] to store data. Erasure coding splits a block into $m$ fragments, which are encoded into $n$ fragments ($n > m$). The key property of erasure coding is that the block can be reconstructed from any $m$ of the $n$ coded fragments. Oceanstore uses Tapestry [17], another DHT, to store the erasure-coded fragments, keyed by fragment number and block id. Oceanstore uses primary copy replication to ensure the consistency of file blocks. It handles read/write data through a versioning mechanism in which every write operation creates a new version of the data. The problem is then reduced to finding the most recent version of the file.
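The "any m of n" property of erasure coding can be shown with the simplest possible code: m data fragments plus one XOR parity fragment (n = m + 1). This Python sketch only illustrates the property. It tolerates the loss of just one fragment, whereas Pond uses more general codes that tolerate many losses. The function names are hypothetical, and fragments are indexed so that they could be stored under fragment number and block id as described above.

```python
from functools import reduce
from typing import Dict, List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(block: bytes, m: int) -> List[bytes]:
    """Split `block` into m equal fragments (zero-padded) plus one XOR parity fragment."""
    frag_len = -(-len(block) // m)                      # ceiling division
    padded = block.ljust(frag_len * m, b"\0")
    data = [padded[i * frag_len:(i + 1) * frag_len] for i in range(m)]
    return data + [reduce(xor_bytes, data)]            # n = m + 1 fragments


def decode(fragments: Dict[int, bytes], m: int, length: int) -> bytes:
    """Rebuild the block from any m of the m + 1 fragments, keyed by index."""
    if len(fragments) < m:
        raise ValueError("need at least m fragments")
    missing = [i for i in range(m) if i not in fragments]
    if missing:
        # exactly one data fragment is lost; XOR of all the others recovers it
        (lost,) = missing
        fragments = dict(fragments)
        fragments[lost] = reduce(xor_bytes, fragments.values())
    return b"".join(fragments[i] for i in range(m))[:length]
```

Dropping any single fragment, whether data or parity, still yields the original block. Tolerating more losses requires codes such as Reed-Solomon, which trade extra computation for this flexibility.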
4.1.3. Observations. Ivy has the disadvantage that it leaves write conflict resolution to the application, which limits its scalability. PAST provides a persistent caching and storage management layer on top of Pastry, with insert, lookup and reclaim operations on files. However, it also assumes that files are immutable, since a file cannot be inserted multiple times with the same id. Oceanstore's versioning mechanism has not been shown to scale. The evaluations of Oceanstore and Pond [67] did not consider conflicting write operations and assumed a single write per data block. Moreover, Oceanstore assumes an inner circle of reliable servers to ensure consistency. Further, all three storage systems (Ivy, PAST and Oceanstore) have been built over DHTs. DHTs support only limited queries (exact-match lookups) and may not allow application-specific criteria for data placement. In the words of [47], virtualization (through DHTs) destroys
4.2. P2P Extensions to DDBMS. A simplistic view of a traditional distributed database management system is that it uses a centralized server to provide a global schema and ACID properties through transactions. Several approaches have extended these techniques to work in a decentralized manner, so that they apply to Internet or P2P systems.

Active XML [9] provides dynamic XML documents over web services for distributed data integration. It is a model for replicating XML documents (whole files) and distributing them (parts of files), achieved by introducing location-aware queries in XPath and XQuery. It also provides a framework in which peers perform decentralized query processing in the presence of distribution and replication. Peers can optimize localized query evaluation costs through a series of replication steps.
Edutella [58] attempts to design and implement a schema-based P2P infrastructure for the semantic web.
It uses the W3C standards RDF and RDF Schema as the schema language to annotate resources on the web. It
uses RDF-QEL as an expressive query exchange language to retrieve the data stored in the P2P network. It
uses super-peer routing indices that include schema and other index information.
Piazza [83] is a peer data management system that facilitates decentralized sharing of heterogeneous data.
Each peer contributes schemas, mappings, data and/or computation. Piazza provides query answering
capabilities over a distributed collection of local schemas and pairwise mappings between them. It essentially provides
a schema mediation mechanism for data integration over a P2P system.
P2P Information Exchange and Retrieval (PIER) [38] is a P2P query engine for query processing in Internet-scale
distributed systems. PIER provides a mechanism for scalable sharing and querying of fingerprint
information, used in network monitoring applications such as intrusion detection. It provides best-effort results, as
achieving ACID properties may be difficult in Internet-scale systems. The query engine does not assume that data
is loaded into databases on all peers, but that it is available in its natural habitat in file systems. PIER is realized
over CAN, the hypercube-based P2P system.
PeerDB [60] is an object management system that provides sophisticated searching capabilities. PeerDB is
realized over BestPeer [59], which provides P2P enabling technologies. PeerDB can be viewed as a network of
local databases on peers. It allows data sharing without a global schema by using meta-data for each relation
and its attributes. A query proceeds in two phases: in the first phase, relations that match the user's search
are returned by searching on neighbours. After the user selects the desired relations, the second phase begins,
in which queries are directed to the nodes containing the selected relations. Mobile agents are dispatched to perform
the queries in both phases.
4.3. Software Availability and Usage Summary. Gnutella and Napster have been widely deployed
and used. Chord is a research prototype that is also available as open source software. Pastry is also available
as open source software and has also been used widely. CAN and Ivy are research prototypes about which
deployment information is not available. PAST and Oceanstore are research prototypes that have been deployed
and used in the PlanetLab testbed.
Edutella is available as open source software. The authors do not have information on the
deployment/availability of the other research prototypes: Piazza, PeerDB and Active XML. PIER has been deployed in
the PlanetLab testbed.
5. State of the Art Data Management.
5.1. SID Techniques: State of the Art.
5.1.1. P2P Techniques in Grids. JuxMem [2] provides a data sharing service for grids by integrating
DSM concepts with P2P systems. It is realized over JXTA (Juxtapose) [34], an emerging framework for
developing P2P applications. JuxMem uses cluster advertisements to advertise the amount of memory each
peer can provide to the global storage. It is organized into a federation of clusters, with each cluster having
a Cluster Manager (CM). The CM is responsible for storing all cluster advertisements in its group. The CMs
across clusters form a DHT. Specifically, the amount of memory provided in a cluster advertisement is hashed,
and the CM with the closest matching id in the DHT stores this advertisement. When a client asks for a block
of memory with a given rounded size (only fixed-size blocks can be supported), the size is hashed and the
cluster advertisement which provides that size is retrieved from the CM with the closest matching id. The
cluster advertisement has the details of the actual storage provider. Recent extensions to JuxMem [14] provide
mechanisms to decouple consistency protocols from fault-tolerance mechanisms. This allows standard
DSM consistency protocols to be used together with fault-tolerance components. In particular, DSM consistency schemes
well as an atomic multicast protocol, which is achieved by using consensus protocols based on Failure Detectors
(FDs) [26]. The data sharing mechanisms of JuxMem have only been evaluated at the cluster level.
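The advertisement lookup described above can be sketched as follows, under stated assumptions: cluster managers are placed on a circular identifier space by hashing their names with SHA-1 (truncated to a hypothetical 32-bit width), "closest matching id" is taken to mean the smallest circular distance, and requested sizes are assumed to be already rounded to one of the fixed block sizes. The class and method names are hypothetical and do not reflect the actual JuxMem or JXTA APIs.

```python
import hashlib

ID_BITS = 32                     # hypothetical identifier width for this sketch
RING = 1 << ID_BITS


def ident(key: str) -> int:
    """Hash a key (a CM name or a memory size) onto the identifier ring."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest(), "big") % RING


def ring_distance(a: int, b: int) -> int:
    """Distance between two ids on the circular identifier space."""
    d = abs(a - b)
    return min(d, RING - d)


class ClusterManagerDHT:
    """Cluster managers (CMs) forming a DHT that stores cluster advertisements."""

    def __init__(self, cm_names):
        self.cms = {ident(name): name for name in cm_names}
        self.adverts = {name: {} for name in cm_names}   # CM -> {size: [providers]}

    def responsible_cm(self, size: int) -> str:
        """CM whose id is closest to the hash of the (rounded) size."""
        key = ident(str(size))
        closest = min(self.cms, key=lambda cm_id: ring_distance(cm_id, key))
        return self.cms[closest]

    def publish(self, size: int, provider: str) -> None:
        """Store an advertisement: `provider` offers blocks of `size` bytes."""
        self.adverts[self.responsible_cm(size)].setdefault(size, []).append(provider)

    def lookup(self, size: int) -> list:
        """Return the providers advertising blocks of `size` bytes, if any."""
        return self.adverts[self.responsible_cm(size)].get(size, [])


dht = ClusterManagerDHT(["cm-a", "cm-b", "cm-c"])
dht.publish(64 * 1024, "peer-17")
print(dht.lookup(64 * 1024))     # ['peer-17']
```

Because publishing and lookup hash the same size to the same CM, a client needs no knowledge of which cluster actually holds the memory; the retrieved advertisement names the provider.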
The replica location problem has been addressed in grids using P2P concepts in [5]. It proposes a P2P
realization of the Replica Location Service (RLS), a key component of data grids. The Logical File Name (LFN)
is hashed to give the identifier for a replica. The node whose id most closely matches the LFN hash holds
the LFN to Physical File Name (PFN) mapping. This is the meta-data stored in RLS for file lookup. It also
proposes an update protocol to handle consistency of meta-data. The RLS realization is based on Kademlia
[63]. Kademlia is a structured P2P system that uses a novel XOR metric for routing: the distance between two
nodes is defined as the exclusive OR (XOR) of their numeric ids. A Kademlia node forms log(n) neighbours,
where neighbour $i$ is at an XOR distance $d$ with $2^i \le d < 2^{i+1}$. The neighbour set is the same as that formed by the tree-based DHT
PRR [23]. Even the failure-free routing in Kademlia is similar to PRR, in that bits are corrected from left to
right. However, in the case of failures, the XOR metric allows bits to be corrected in any order. This implies that
the static resilience² of Kademlia is better than that of PRR [45].
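The two ideas in this paragraph, the XOR distance with its neighbour index $i$, and placing the LFN-to-PFN mapping on the node closest to the hashed LFN, can be sketched as below. This is a simplified illustration, not the implementation of [5] or of Kademlia: it assumes 160-bit SHA-1 identifiers, a global view of all node names (a real Kademlia node would locate the closest node iteratively through its neighbour lists), and hypothetical node names, LFN and PFN.

```python
import hashlib


def kad_id(name: str) -> int:
    """160-bit identifier derived from SHA-1."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia distance: the XOR of two ids, read as an unsigned integer."""
    return a ^ b


def neighbour_index(self_id: int, other_id: int) -> int:
    """Index i such that 2**i <= distance < 2**(i + 1)."""
    d = xor_distance(self_id, other_id)
    if d == 0:
        raise ValueError("a node is not its own neighbour")
    return d.bit_length() - 1


def rls_home(lfn: str, nodes: list) -> str:
    """Node whose id is XOR-closest to hash(LFN); it holds the LFN -> PFN mapping."""
    key = kad_id(lfn)
    return min(nodes, key=lambda n: xor_distance(kad_id(n), key))


nodes = [f"node-{i}" for i in range(8)]               # hypothetical node names
catalogue = {n: {} for n in nodes}                    # per-node LFN -> [PFN] store
lfn = "lfn://experiment/run42.dat"                    # hypothetical LFN
catalogue[rls_home(lfn, nodes)].setdefault(lfn, []).append("gsiftp://site-a/data/run42.dat")
print(neighbour_index(0b1000, 0b1011))                # 1, since 2 <= 3 < 4
```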
5.1.2. Replica Placement in CDNs. Optimal placement of replicas in CDNs is a non-trivial task and
has not been widely addressed. QoS-aware replica placement was proposed in [92] to meet the QoS requirements of
clients with the objective of minimizing the replication cost. The replication cost includes the cost of storage and
consistency management, while QoS is specified in terms of distance metrics such as hop count. Two problems
are formulated: replica-aware and replica-blind. In the replica-aware model, the CDN servers are aware of where
object replicas are stored in the CDN network. This helps the servers to redirect client requests to the nearest
replica. In the replica-blind model, application or network level routing ensures that client requests are routed to
CDN servers, and the servers are unaware of replica locations. Each replica (CDN server) serves the requests
coming to it. Dynamic programming techniques are used to arrive at near-optimal solutions for the optimal
replica placement problem, which is shown to be NP-complete.
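Since the problem is NP-complete, heuristics are the practical route. The sketch below uses a different technique from the dynamic programming of [92]: a greedy, set-cover style heuristic for the replica-aware model, in which a client can be served by any replica within its hop bound. At each step it picks the candidate server with the lowest cost per newly satisfied client. Server names, hop counts and costs are hypothetical inputs, and the sketch ignores server capacity and update traffic.

```python
def greedy_placement(hops: dict, bound: dict, cost: dict) -> set:
    """
    hops[s][c] : hop count from candidate server s to client c
    bound[c]   : maximum hop count client c tolerates (its QoS requirement)
    cost[s]    : cost of placing a replica on s (storage + consistency management)
    Returns a set of servers such that every client has a replica within its bound.
    """
    covers = {s: {c for c in bound if hops[s][c] <= bound[c]} for s in cost}
    uncovered = set(bound)
    chosen = set()
    while uncovered:
        candidates = [s for s in cost if s not in chosen and covers[s] & uncovered]
        if not candidates:
            raise ValueError("some client cannot be served within its hop bound")
        # Cheapest server per newly satisfied client.
        best = min(candidates, key=lambda s: cost[s] / len(covers[s] & uncovered))
        chosen.add(best)
        uncovered -= covers[best]
    return chosen


hops = {"s1": {"c1": 1, "c2": 3, "c3": 4},
        "s2": {"c1": 4, "c2": 1, "c3": 1},
        "s3": {"c1": 1, "c2": 1, "c3": 1}}
bound = {"c1": 2, "c2": 2, "c3": 2}
cost = {"s1": 1.0, "s2": 1.0, "s3": 5.0}
print(sorted(greedy_placement(hops, bound, cost)))   # ['s1', 's2']
```

Here two cheap replicas (total cost 2) are preferred over the single well-connected but expensive server s3 (cost 5); in general, a greedy choice of this kind is not guaranteed to be optimal.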
5.1.3. Distributed Mobile Storage System. Segank [80] provides an abstraction of a shared storage
system for heterogeneous storage elements. The motivation was that traditional mechanisms for managing data
in distributed mobile environments, such as Coda and Bayou, have time-consuming merge operations. In Coda,
updates are released to the server before becoming visible on clients. If servers are physically far away, this
could increase the time after which updates become visible. Bayou uses full replication, leading to potentially
expensive merge operations. Segank handles the data location problem, where data could be located on any subset of
devices, by using a location- and topology-sensitive multicast-like operation (named Segankcast). It allows lazy
P2P propagation of invalidation information to handle consistency of replicated data. It also uses a distributed
snapshot mechanism to ensure a consistent image across all devices for backup. It must be observed that
Segank uses only unstructured P2P system concepts. This implies that Segank cannot provide deterministic
search guarantees.
5.2. Large Scale Data Management: State of the Art. We shall explain the current state of the art
in P2P data management along four directions: integrating structured and unstructured P2P systems, providing
Quality of Service (QoS) guarantees in P2P systems, composable consistency for P2P systems and large-scale
DHT deployment. We also explain the state of the art in P2P DBMS.
5.2.1. Integrating Structured and Unstructured P2P Systems. An attempt has been made in [55]
to improve structured P2P systems along three directions in which they were traditionally known to perform
worse than unstructured P2P systems: handling churn, exploiting heterogeneity and handling complex
queries. In P2P systems, node/network dynamics resulting in routing-table updates and/or data movement are
known as churn. The paper [55] shows that MS Pastry, an implementation of Pastry, can handle churn well
by using a periodic routing table maintenance protocol. This protocol updates failed routing table entries. It
also has a passive routing table repair protocol. The authors demonstrate that by exploiting structure, MS Pastry
can handle churn better than unstructured P2P systems. Heterogeneity is difficult to handle in structured P2P
systems due to constraints on data placement and neighbour selection. MS Pastry handles heterogeneity in
two ways: first, by using super-peer concepts; second, by modifying neighbour selection to account for capacity. MS
Pastry is also extended to handle complex queries by introducing new techniques for flooding or random walks.
Flooding is achieved by sending the message to all nodes in the routing table. A random walk is achieved by using
a tag containing the set of nodes to visit, a queue of nodes in the routing table row and a bound on the number
of rows to traverse. A few other efforts have also been made recently to make structured P2P systems handle
range queries [16] and multi-dimensional queries [65], and to support a query algebra [73]. A Scalable Wide-Area Resource
Discovery (SWORD) system [62] has been built to realize resource discovery over WANs by supporting multi-attribute
range queries over DHTs.
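The flooding technique mentioned above (forward the message to every node in the local routing table) can be sketched as follows. This is a naive illustration under stated assumptions: routing tables are given as plain adjacency lists with hypothetical node names, duplicates are suppressed with a global `delivered` set that a real deployment would replace with per-node message ids, and neither the structure-exploiting details nor the random-walk variant of [55] are modelled.

```python
from collections import deque


def flood(routing_tables: dict, source: str):
    """Forward a message from `source` to every node in each receiver's routing
    table, dropping duplicates. Returns (delivered nodes, messages sent)."""
    delivered = {source}
    queue = deque([source])
    messages = 0
    while queue:
        node = queue.popleft()
        for neighbour in routing_tables[node]:
            messages += 1                      # each forward costs one message
            if neighbour not in delivered:     # first copy: deliver and re-forward
                delivered.add(neighbour)
                queue.append(neighbour)
    return delivered, messages


tables = {"a": ["b", "c"], "b": ["a", "d"], "c": ["d"], "d": []}
reached, sent = flood(tables, "a")
print(sorted(reached), sent)                   # ['a', 'b', 'c', 'd'] 5
```

The message count (5 messages to reach 4 nodes) shows why flooding is reserved for complex queries that exact-match DHT lookups cannot answer.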
Another approach to integrating structured and unstructured P2P systems has been taken in the Vishwa
computing grid middleware [53]. Vishwa uses the task management layer to handle initial task deployment
and load adaptability of the tasks. The task management layer is realized using unstructured P2P concepts
and allows capability-based resource clustering. The reconfiguration layer of Vishwa is realized as a structured
P2P layer and stores the information needed to handle node/network failures. The two-layered architecture has
also been used for data management in Virat [1, 7]. Virat provides a shared object space abstraction over a
wide-area distributed system. Virat has been extended to a replica management middleware for P2P systems
[8]. The unstructured layer forms neighbours based on node capabilities (in terms of processing power, available
memory, storage capacity and load conditions). A structured DHT is built over this unstructured layer by using
the concept of virtual nodes. Virat achieves dynamic replica placement on nodes with given capabilities, which
would be very useful in computing/data grids. A detailed performance comparison is also made with a replica
mechanism realized over OpenDHT [68], a state-of-the-art structured P2P system. It has been demonstrated
that the 99th percentile response time for Virat does not exceed 600 ms, whereas for OpenDHT it goes beyond
2000 ms in an Internet testbed.
5.2.2. Composable Consistency for P2P Systems. A flexible consistency model known as
composable consistency, suitable for a variety of P2P applications, has been proposed in [72]. The authors initially
surveyed consistency requirements for P2P applications such as personal file access, real-time collaboration
and database or directory services. The survey showed that different applications need different semantics
for read/write and for replica divergence. The main contribution of [72] is the classification of consistency
requirements along five orthogonal dimensions: concurrency (degree of conflicting read/write access); replica
synchronization (degree of replica divergence); failure handling (data access semantics in the presence of
inaccessible replicas); update visibility (time after which local updates may be made globally visible); and view isolation
(time after which remote updates must be made locally visible). A rich collection of consistency semantics for
shared data can be composed by combining the above five options. Performance studies have shown that
composable consistency in the Swarm system outperforms Coda [74] in a file sharing scenario, while for a replicated
BerkeleyDB database it provides different consistency mechanisms, ranging from strong to time-based.
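One way to picture how a consistency semantics is composed from the five dimensions is as a record with one field per dimension. The sketch below is illustrative only: the enum values, field names and the two example settings are hypothetical and are not the actual interface of Swarm.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Concurrency(Enum):          # degree of conflicting read/write access
    EXCLUSIVE_WRITE = "exclusive write"
    CONCURRENT = "concurrent"


class FailureHandling(Enum):      # data access when some replicas are inaccessible
    BLOCK = "block until reachable"
    USE_AVAILABLE = "use available replicas"


@dataclass(frozen=True)
class ConsistencySpec:
    concurrency: Concurrency
    max_divergence_s: Optional[float]  # replica synchronization: 0 = strict, None = unbounded
    failure_handling: FailureHandling
    update_visibility_s: float         # delay before local updates become globally visible
    view_isolation_s: float            # delay before remote updates must become locally visible

    def is_strong(self) -> bool:
        """True when every dimension is set to its strictest option."""
        return (self.concurrency is Concurrency.EXCLUSIVE_WRITE
                and self.max_divergence_s == 0
                and self.failure_handling is FailureHandling.BLOCK
                and self.update_visibility_s == 0
                and self.view_isolation_s == 0)


# Two hypothetical points in the space: strong and time-based consistency.
STRONG = ConsistencySpec(Concurrency.EXCLUSIVE_WRITE, 0, FailureHandling.BLOCK, 0, 0)
TIME_BASED = ConsistencySpec(Concurrency.CONCURRENT, 30.0, FailureHandling.USE_AVAILABLE, 5.0, 30.0)
print(STRONG.is_strong(), TIME_BASED.is_strong())   # True False
```

Because the dimensions are orthogonal, each field can be chosen independently; an application picks one point in this space instead of one of a few fixed consistency models.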
5.2.3. Providing QoS Guarantees in P2P Systems. Guaranteeing Quality of Service (QoS)
parameters such as response time or throughput in P2P systems is a challenging task. An initial attempt was made in
[70] at using P2P system concepts for the Domain Name System (DNS), which requires efficient data location. It
showed that though a P2P DNS could provide better fault-tolerance than conventional DNS, the lookup performance
of O(log(N)) provided by DHTs was far worse than that of conventional DNS. Cooperative DNS (CoDoNS) [89]
was proposed to tackle three problems of conventional DNS: susceptibility to Denial of Service (DoS) attacks;
lookup delays, especially for flash crowds; and lack of cache coherency, preventing quick service relocation in
emergencies. CoDoNS has been proposed as a backward-compatible replacement for conventional DNS. It provides
O(1) lookup time by using the proactive caching layer of Beehive [88]. Beehive enables DHTs to achieve O(1)
lookup performance by proactive replication. Traditionally, prefix-matching DHTs store an application object at
the closest matching node, with each routing step successively matching prefixes, resulting in O(log(N)) lookup
performance. By aggressively caching the object all along the lookup path, Beehive achieves O(1) lookup
performance for that object. Since Beehive associates different replication levels with different application objects,
an average lookup performance of O(1) is achieved. CoDoNS builds a DNS based on a self-organizing P2P
overlay formed across organizations (if each organization can provide a server for CoDoNS). CoDoNS associates
a domain name with the node whose id most closely matches the domain name's hashed id. If this home
node fails, the node with the next best matching id takes over as the home node for that particular domain.
Performance studies over the PlanetLab testbed show that CoDoNS achieves lower lookup latencies, can handle
slashdot effects and can quickly disseminate updates. However, the use of DHTs as the basis leaves CoDoNS
vulnerable to network partitions. For example, if an organization is partitioned from the outside world,
conventional DNS would ensure that local lookups still work correctly, whereas with CoDoNS even local lookups may fail
(because of the stretch property of DHTs, a DHT lookup may leave the local network even for a local name). This suggests
that SkipNets [35] may be a better choice for realizing DNS than DHTs. This is because data in SkipNets is
5.2.4. Large Scale Deployment. OpenDHT [68] is a public, large-scale DHT deployment that allows
clients to use DHTs without having to deploy them. It provides a shared storage space abstraction using the
get and put primitives. The main motivation for OpenDHT is that it is hard to deploy long-running distributed
system services, especially in the public domain. OpenDHT is deployed on PlanetLab
(http://www.planet-lab.org/), a global testbed for deploying planetary-scale services. OpenDHT is deployed on infrastructure nodes,
which alone participate in DHT routing and storage. Clients only use the storage space through the get and
put interface on gateway (infrastructure) nodes. OpenDHT allows different, mutually untrusting applications to
share the DHT. It ensures that clients get a fair share of storage resources without imposing arbitrary quotas, a
trade-off between fairness and flexibility. This is achieved by associating a Time-to-Live (TTL) with application
objects and letting them expire if clients do not renew them. OpenDHT provides a storage abstraction of DHTs,
in contrast to the lookup abstraction of Chord or the routing abstraction of Pastry.
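The TTL-based sharing of storage can be sketched as a small put/get store in which every value carries an expiry time and disappears unless the client renews it with another put. This is a single-process sketch under stated assumptions, not the OpenDHT interface: the class name is hypothetical, several values may be stored under one key, re-putting an identical value is treated as a renewal, and the fairness mechanism itself is not modelled. The clock is injectable so that expiry can be demonstrated without waiting.

```python
import time
from typing import Callable, Dict, List, Tuple


class TTLStore:
    """put/get storage in which each value expires unless it is renewed."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._data: Dict[str, List[Tuple[bytes, float]]] = {}   # key -> [(value, expiry)]

    def put(self, key: str, value: bytes, ttl_s: float) -> None:
        """Store `value` under `key` for `ttl_s` seconds (renews an identical value)."""
        entries = [(v, e) for v, e in self._data.get(key, []) if v != value]
        entries.append((value, self._clock() + ttl_s))
        self._data[key] = entries

    def get(self, key: str) -> List[bytes]:
        """Return all unexpired values under `key`, dropping expired ones lazily."""
        now = self._clock()
        live = [(v, e) for v, e in self._data.pop(key, []) if e > now]
        if live:
            self._data[key] = live
        return [v for v, _ in live]


now = [0.0]                                   # fake clock for the example
store = TTLStore(clock=lambda: now[0])
store.put("session", b"peer-42", ttl_s=10)
now[0] = 8
store.put("session", b"peer-42", ttl_s=10)    # renewal: now expires at t = 18
now[0] = 15
print(store.get("session"))                   # [b'peer-42']
now[0] = 20
print(store.get("session"))                   # []
```

Storage held by a client that stops renewing is reclaimed automatically, which is why no fixed per-client quota is needed.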
It is realized over the Bamboo DHT (bamboo-dht.org), which is similar to Pastry but differs in its handling of
node dynamics. OpenDHT is not a shared object space; the level of abstraction provided to the programmer is
different. For instance, the programmer has to take care of object serialization, RTTI (run-time type information)
etc. to realize an object store on top of the byte storage that OpenDHT provides. OpenDHT provides limited
consistency for the shared byte space. Conflict resolution (for concurrent writes) is left to the application,
similar to the Bayou system, which ensures eventual consistency, a very loose form of consistency. But conflict
resolution is a non-trivial task for the application programmer. The performance of OpenDHT (especially worst-case
response time) suffers due to the presence of stragglers, or slow nodes. This has been improved by using
delay-aware and iterative routing in [71].
5.2.5. State of the Art P2P DDBMS. Atlas P2P Architecture (APPA) [86] is the current state-of-the-art
data management solution for large-scale P2P systems. It uses a three-layered architecture, with the P2P
network forming the lowest layer. This layer could be realized using unstructured, structured or super-peer
based P2P concepts. Above this layer, the basic P2P services layer is built. This provides P2P data sharing and
(key-based) retrieval in the P2P network, support for peer communication, support for peer dynamics (join
and leave) and group membership management. Over the basic services layer, advanced P2P data management
services such as schema management, replication, query processing and security are built. The shared data is
in XML format and queries are expressed in XQuery in order to make use of web services. It is realized over
JXTA. It provides replica management by extending traditional centralized log-based reconciliation techniques
to P2P systems. It assumes the existence of a shared storage space for distributed reconciliation by peers.
This requires consensus protocols for its realization and may be expensive. It has not been evaluated in large-scale
systems.
A recent effort has been made to provide a middleware-based data replication scheme in [94] by using
Snapshot Isolation (SI) as the isolation level. In an SI-based DBMS, the read operations of a transaction T are served
from a snapshot of the database (the set of transactions committed when T started). This implies that read operations
never conflict with write operations and only write-write conflicts can occur, resulting in more concurrency and
consequently better performance. It has been proposed at the cluster level and may not be applicable to P2P
systems due to its strong assumption of a totally ordered multicast.
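The core of snapshot isolation, reads served from the snapshot taken at transaction start and a commit-time check for write-write conflicts (the "first committer wins" rule), can be sketched for a single, non-replicated store as below. The class names are hypothetical, and the sketch deliberately leaves out the middleware replication and the totally ordered multicast of [94].

```python
class SnapshotStore:
    """Multi-version store with snapshot isolation (first committer wins)."""

    def __init__(self):
        self.versions = {}      # key -> [(commit_ts, value)], in commit order
        self.clock = 0          # logical timestamp of the latest commit

    def begin(self) -> "Transaction":
        return Transaction(self, start_ts=self.clock)


class Transaction:
    def __init__(self, store: SnapshotStore, start_ts: int):
        self.store, self.start_ts, self.writes = store, start_ts, {}

    def read(self, key):
        """Read own writes first, else the newest version committed by start_ts."""
        if key in self.writes:
            return self.writes[key]
        for ts, value in reversed(self.store.versions.get(key, [])):
            if ts <= self.start_ts:
                return value
        return None

    def write(self, key, value):
        self.writes[key] = value            # buffered until commit

    def commit(self):
        s = self.store
        # Only write-write conflicts abort: another transaction committed a
        # newer version of a key we wrote after our snapshot was taken.
        for key in self.writes:
            history = s.versions.get(key, [])
            if history and history[-1][0] > self.start_ts:
                raise RuntimeError(f"write-write conflict on {key!r}")
        s.clock += 1
        for key, value in self.writes.items():
            s.versions.setdefault(key, []).append((s.clock, value))


db = SnapshotStore()
init = db.begin()
init.write("x", 1)
init.commit()
t1, t2 = db.begin(), db.begin()
t1.write("x", 2)
t1.commit()
print(t2.read("x"))     # 1: t2 still reads its snapshot, without blocking
```

If t2 now writes "x" and tries to commit, it aborts because t1 committed a newer version of "x" after t2's snapshot; a read-only transaction, by contrast, never aborts.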
5.3. Software Availability and Usage Summary. JuxMem and Segank are research prototypes.
Deployment information on Structella is not available. Vishwa and Virat are research prototypes that are available
as open binaries. OpenDHT has been deployed on the PlanetLab testbed and is also available as open source
software. APPA is a research prototype.
6. Conclusions. We have presented a scalability taxonomy of data management solutions in distributed
systems. We group the data management work done in DSMs and shared object spaces in the Centralized/Naively
Distributed (CND) data management category. The Sophisticated/Intermediate Data (SID) management
techniques include data management in grid computing systems and data grids, as well as Content Distribution
Networks (CDNs) and data management in distributed mobile systems. These solutions scale better than CND
techniques by using distributed data management instead of centralized approaches. They, however, assume
an inner set of reliable servers which take care of consistency and reliability issues. In order to take
data management services to the edges of the Internet, Large Scale Data (LSD) management techniques
make use of P2P concepts. They consequently provide better scalability and fault-tolerance, but at the cost of
relaxing consistency (most approaches provide probabilistic guarantees or eventual consistency).
Fig. 6.1. Pictorial Representation of Scalability Taxonomy
It can be observed that LSD techniques such as Virat [8] handle a large number of small data objects. The
case of handling a large number of large data objects arises when existing data grids become purely P2P instead
of using SID techniques. The existing LSD techniques may not work in this case, as the size of the data objects calls
for special mechanisms to handle some operations, including updates. Incremental updates or function shipping
in combination with LSD data management techniques may have to be explored.
Another interesting avenue for exploration is the use of LSD techniques combined with node mobility. The
solutions which have been proposed for handling data management in distributed mobile systems do not use
P2P concepts, but assume the presence of reliable servers that handle mobile client requests. When mobile
nodes form the P2P overlay, churn could be very high due to node mobility. This, coupled with the device
constraints, may open up a wealth of research questions.
Optimal data placement techniques which have been proposed for CDNs [92] can be used in P2P grids.
Existing data management techniques in grids (or even P2P grids such as P-Grid [46]) do not address optimal
replica placement issues. The work in [8] provides heuristics for replica placement in P2P grids, but the resulting placement of
replicas may not be exactly optimal. Thus, we see that techniques for data management in one category can
be applied to the others to open up research in large-scale data management.
REFERENCES
[1] A Vijay Srinivas, M Venkateshwara Reddy, and D Janakiram, Designing a Replication Service for Large Peer-to-Peer
Data Grids, IEEE Distributed Systems Online, 7 (2006).
[2] Gabriel Antoniu, Luc Bougé, and Mathieu Jan, JUXMEM: An Adaptive Supportive Platform for Data Sharing on
the Grid, Scalable Computing: Practice and Experience, 6 (2005), pp. 45-55.
[3] Maarten van Steen, Philip Homburg, and Andrew S. Tanenbaum, Globe: A Wide-Area Distributed System, IEEE
Concurrency, 7 (1999), pp. 70-78.
[4] P Wyckoff, S W McLaughry, T J Lehman, and D A Ford, T Spaces, IBM Systems Journal, 37 (1998), pp. 454-474.
[5] A. Chazapis, A. Zissimos, and N. Koziris, A Peer-to-Peer Replica Management Service for High-Throughput Grids, in
[6] A Vijay Srinivas and D Janakiram, A Model for Characterizing the Scalability of Distributed Systems, ACM SIGOPS
Operating Systems Review, 39 (2005), pp. 64-72.
[7] A Vijay Srinivas and D Janakiram, A Peer-to-Peer Framework for Collaborative Data Sharing Over the Internet, Tech.
Report IITM-CSE-DOS-2005-28, accepted for publication in IEEE International Conference on Collaborative Computing:
Networking, Applications and Worksharing (CollaborateCom 2006), IEEE Computer Society Press.
[8] A Vijay Srinivas and D Janakiram, Node Capability Aware Replica Management for Peer-to-Peer Grids, Technical Report
IITM-CSE-DOS-2006-04, Distributed & Object Systems Lab, Indian Institute of Technology, Communicated to IEEE
Transactions on Software Engineering.
[9] S. Abiteboul, A. Bonifati, G. Cobéna, I. Manolescu, and T. Milo, Dynamic XML documents with distribution and
replication, in SIGMOD '03: Proceedings of the 2003 ACM SIGMOD international conference on Management of data,
New York, NY, USA, 2003, ACM Press, pp. 527-538.
[10] C. Amza, A. Cox, S. Dwarkadas, P. Keleher, H. Lu, R. Rajamony, W. Yu, and W. Zwaenepoel, TreadMarks:
Shared Memory Computing on Networks of Workstations, IEEE Computer, 29 (1996), pp. 18-28.
[11] Anoop George Ninan, Purushottam Kulkarni, Prashant Shenoy, Krithi Ramamritham, and Renu Tewari,
Scalable Consistency Maintenance in Content Distribution Networks Using Cooperative Leases, IEEE Transactions on
Knowledge and Data Engineering, 15 (2003), pp. 813-828.
[12] Anoop Ninan, Purushottam Kulkarni, Prashant Shenoy, Krithi Ramamritham, and Renu Tewari, Cooperative
Leases: Scalable Consistency Maintenance in Content Distribution Networks, in WWW '02: Proceedings of the 11th
international conference on World Wide Web, New York, NY, USA, 2002, ACM Press, pp. 1-12.
[13] Anthony D Joseph, Joshua A Tauber, and M Frans Kaashoek, Mobile Computing with the Rover Toolkit, IEEE
Transactions on Computers, 46 (1997), pp. 337-352.
[14] G. Antoniu, J.-F. Deverge, and S. Monnet, How to Bring Together Fault Tolerance and Data Consistency to Enable
Grid Data Sharing, Concurrency and Computation: Practice and Experience, 17 (2006). To appear.
[15] Antony Rowstron and Peter Druschel, Storage management and caching in PAST, a large-scale, persistent peer-to-peer
storage utility, in SOSP '01: Proceedings of the eighteenth ACM symposium on Operating systems principles, New York,
NY, USA, 2001, ACM Press, pp. 188-201.
[16] Artur Andrzejak and Zhichen Xu, Scalable, Efficient Range Queries for Grid Information Services, in P2P '02:
Proceedings of the Second International Conference on Peer-to-Peer Computing, Washington, DC, USA, 2002, IEEE Computer
Society, pp. 33-40.
[17] B. Y. Zhao, L. Huang, J. Stribling, S. C. Rhea, A. D. Joseph, and J. D. Kubiatowicz, Tapestry: A Resilient
Global-Scale Overlay for Service Deployment, IEEE Journal on Selected Areas in Communications, 22 (2004), pp. 41-53.
[18] D. Bauer, P. Hurley, R. Pletka, and M. Waldvogel, Bringing efficient advanced queries to distributed hash tables,
in LCN '04: Proceedings of the 29th Annual IEEE International Conference on Local Computer Networks (LCN'04),
Washington, DC, USA, 2004, IEEE Computer Society, pp. 6-14.
[19] Bill Allcock, Joe Bester, John Bresnahan, Ann L. Chervenak, Ian Foster, Carl Kesselman, Sam Meder,
Veronika Nefedova, Darcy Quesnel, and Steven Tuecke, Data Management and Transfer in High-Performance
Computational Grid Environments, Parallel Computing, 28 (2002), pp. 749-771.
[20] J. Blomer, M. Kalfane, R. Karp, M. Karpinski, M. Luby, and D. Zuckerman, An XOR-based erasure-resilient coding
scheme, 1995.
[21] Brian D Noble, M Satyanarayanan, Dushyanth Narayanan, James Eric Tilton, Jason Flinn, and Kevin R.
Walker, Agile Application-Aware Adaptation for Mobility, in SOSP '97: Proceedings of the sixteenth ACM symposium
on Operating Systems Principles, New York, NY, USA, 1997, ACM Press, pp. 276-287.
[22] C Gray and D Cheriton, Leases: an Efficient Fault-Tolerant Mechanism for Distributed File Cache Consistency, in SOSP
'89: Proceedings of the twelfth ACM symposium on Operating systems principles, New York, NY, USA, 1989, ACM
Press, pp. 202-210.
[23] C Greg Plaxton, Rajmohan Rajaraman, and Andrea W Richa, Accessing Nearby Copies of Replicated Objects in a
Distributed Environment, in SPAA '97: Proceedings of the ninth annual ACM symposium on Parallel Algorithms and
Architectures, New York, NY, USA, 1997, ACM Press, pp. 311-320.
[24] N. Carriero and D. Gelernter, Linda in Context, Communications of the ACM, 4 (1989), pp. 444-458.
[25] J. B. Carter, Design of the Munin Distributed Shared Memory System, Journal of Parallel and Distributed Computing, 29
(1995), pp. 219-227.
[26] T. D. Chandra and S. Toueg, Unreliable Failure Detectors for Reliable Distributed Systems, Journal of the ACM, 43
(1996), pp. 225-267.
[27] Chervenak, A, Foster, I, Kesselman, C, Salisbury, C, and Tuecke, S, The Data Grid: Towards an Architecture for
the Distributed Management and Analysis of Large Scientific Datasets, Journal of Network and Computer Applications,
23 (2001), pp. 187-200.
[28] U. Dayal, K. Ramamritham, and T. M. Vijayaraman, eds., Proceedings of the 19th International Conference on Data
Engineering, March 5-8, 2003, Bangalore, India, IEEE Computer Society, 2003.
[29] Dirk Düllmann and Ben Segal, Models for Replica Synchronisation and Consistency in a Data Grid, in HPDC '01:
Proceedings of the 10th IEEE International Symposium on High Performance Distributed Computing (HPDC-10'01),
Washington, DC, USA, 2001, IEEE Computer Society, p. 67.
[30] Eric A Brewer, Randy H Katz, Elan Amir, Hari Balakrishnan, Yatin Chawathe, Armando Fox, Steven D
Gribble, Todd Hodes, Giao Nguyen, Venkata N Padmanabhan, Mark Stemm, Srinivasan Seshan, Tom
Henderson, Joshua A Tauber, and M Frans Kaashoek, A Network Architecture for Heterogeneous Mobile Computing,
IEEE Personal Communications, 5 (1998), pp. 8-24.
[31] Ewa Deelman, Carl Kesselman, Gaurang Mehta, Leila Meshkat, Laura Pearlman, Kent Blackburn, Phil
Ehrens, Albert Lazzarini, Roy Williams, and Scott Koranda, GriPhyN and LIGO, Building a Virtual Data
Grid for Gravitational Wave Scientists, in Proceedings of the 11th IEEE International Symposium on High Performance
Journal of Grid Computing, 2 (2004), pp. 207-222.
[33] Gnutella, The Gnutella protocol specification v0.4. http://www9.limewire.com/developer/gnutella_protocol_0.4.pdf,
2000.
[34] L. Gong, JXTA: A Network Programming Environment, IEEE Internet Computing, 5 (2001), pp. 88-95.
[35] Harvey, Nicholas J. A., Jones, Michael B., Saroiu, Stefan, Theimer, Marvin, and Wolman, Alec, SkipNet: A
scalable overlay network with practical locality properties, in Proceedings of the Fourth USENIX Symposium on Internet
Technologies and Systems (USITS '03), Seattle, United States, March 2003, USENIX Association.
[36] Henri E Bal, M Frans Kaashoek, and Andrew S Tanenbaum, Orca: A Language for Parallel Programming of
Distributed Systems, IEEE Transactions on Software Engineering, 18 (1992), pp. 190-205.
[37] Henri E Bal, Raoul Bhoedjang, Rutger Hofman, Ceriel Jacobs, Koen Langendoen, Tim Ruhl, and M Frans
Kaashoek, Performance evaluation of the Orca shared-object system, ACM Transactions on Computer Systems, 16
(1998), pp. 1-40.
[38] R. Huebsch, J. M. Hellerstein, N. Lanham, B. T. Loo, S. Shenker, and I. Stoica, Querying the Internet with
PIER, in VLDB 2003, Proceedings of 29th International Conference on Very Large Data Bases, September 9-12, 2003,
Berlin, Germany, J. C. Freytag, P. C. Lockemann, S. Abiteboul, M. J. Carey, P. G. Selinger, and A. Heuer, eds., Morgan
Kaufmann, 2003, pp. 321-332.
[39] I. Foster and C. Kesselman, Globus: A Metacomputing Infrastructure Toolkit, Intl Journal of Supercomputer
Applications, 11 (1997), pp. 115-128.
[40] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan, Chord: A Scalable Peer-to-Peer Lookup
Service for Internet Applications, IEEE/ACM Transactions on Networking, 11 (2003), pp. 17-32.
[41] L. Iftode, J. P. Singh, and K. Li, Scope Consistency: a Bridge Between Release Consistency and Entry Consistency, in
SPAA '96: Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures, New York, NY,
USA, 1996, ACM Press, pp. 277-287.
[42] J. Frey, T. Tannenbaum, M. Livny, I. Foster, and S. Tuecke, Condor-G: A Computation Management Agent for
Multi-Institutional Grids, in HPDC '01: Proceedings of the 10th IEEE International Symposium on High Performance
Distributed Computing (HPDC-10'01), Washington, DC, USA, 2001, IEEE Computer Society, p. 55.
[43] John Dilley, Bruce Maggs, Jay Parikh, Harald Prokop, Ramesh Sitaraman, and Bill Weihl, Globally Distributed
Content Delivery, IEEE Internet Computing, 06 (2002), pp. 50-58.
[44] K. L. Johnson, J. F. Carr, M. S. Day, and M. F. Kaashoek, The measured performance of content distribution
networks, Computer Communications, 24 (2001), pp. 202-206.
[45] K Gummadi, R Gummadi, S Gribble, S Ratnasamy, S Shenker, and I. Stoica, The Impact of DHT Routing Geometry
on Resilience and Proximity, in SIGCOMM '03: Proceedings of the 2003 conference on Applications, technologies,
architectures, and protocols for computer communications, New York, NY, USA, 2003, ACM Press, pp. 381-394.
[46] Karl Aberer, Philippe Cudre-Mauroux, Anwitaman Datta, Zoran Despotovic, Manfred Hauswirth,
Magdalena Punceva, and Roman Schmidt, P-Grid: a Self-Organizing Structured P2P System, ACM SIGMOD Record,
32 (2003), pp. 29-33.
[47] P. J. Keleher, B. Bhattacharjee, and B. D. Silaghi, Are Virtualized Overlay Networks Too Much of a Good Thing?,
in IPTPS '01: Revised Papers from the First International Workshop on Peer-to-Peer Systems, London, UK, 2002,
Springer-Verlag, pp. 225-231.
[48] Kourosh Gharachorloo, Daniel Lenoski, James Laudon, Phillip Gibbons, Anoop Gupta, and John
Hennessy, Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors, in ISCA '90:
Proceedings of the 17th annual international symposium on Computer Architecture, New York, NY, USA, 1990, ACM Press,
pp. 15-26.
[49] J. Kubiatowicz, D. Bindel, Y. Chen, S. Czerwinski, P. Eaton, D. Geels, R. Gummadi, S. Rhea, H. Weatherspoon,
C. Wells, and B. Zhao, OceanStore: an Architecture for Global-Scale Persistent Storage, SIGARCH Computer
Architecture News, 28 (2000), pp. 190-201.
[50] L G Alex Sung, Nabeel Ahmed, R. and Herman Li, Mohamed Ali Soliman, and David Hadaller, A Survey of Data
Management in Peer-to-Peer Systems. CS856 Web Data Management, 2005. School of Computer Science, University of
Waterloo.
[51] L Guy, P Kunszt, E Laure, H Stockinger, and K Stockinger, Replica Management in Data Grids. Technical Report,
GGF Working Draft, 2002.
[52] M. Ahamad and R. Kordale, Scalable Consistency Protocols for Distributed Services, IEEE Transactions on Parallel and
Distributed Systems, 10 (1999), pp. 888-903.
[53] M. V. Reddy, A. V. Srinivas, T. Gopinath, and D. Janakiram, Vishwa: A Reconfigurable Peer-to-Peer Middleware
for Grid Computing, in 35th International Conference on Parallel Processing, IEEE Computer Society Press, 2006,
pp. 381-390.
[54] Mahadev Satyanarayanan, Accessing Information on Demand at any Location. Mobile Information Access, IEEE Personal
Communications, 3 (1996), pp. 26-33.
[55] Miguel Castro, Manuel Costa, and Antony Rowstron, Debunking Some Myths About Structured and Unstructured
Overlays, in Proceedings of the 2nd Usenix Symposium on Networked System Design and Implementation, Boston, MA,
May 2005.
[56] A. Muthitacharoen, R. Morris, T. M. Gil, and B. Chen, Ivy: a Read/Write Peer-to-Peer File System, SIGOPS
Operating Systems Review, 36 (2002), pp. 31-44.
[57] NAPSTER, Napster media sharing system. http://www.napster.com
[58] W. Nejdl, W. Siberski, and M. Sintek, Design issues and challenges for RDF- and schema-based peer-to-peer systems,
SIGMOD Record, 32 (2003), pp. 41-46.
[59] W. S. Ng, B. C. Ooi, and K.-L. Tan, BestPeer: A Self-Configurable Peer-to-Peer System, in Proceedings of the 18th
International Conference on Data Engineering, 26 February-1 March 2002, San Jose, CA, IEEE Computer Society, 2002,
et al. [28], pp. 633-644.
[61℄ ObjetManagementGroup,TheCommonObjetRequestBroker:ArhitetureandSpeiation. 2.3.1,Otober1999.
[62℄ Oppenheimer,D.,Albreht,J.,Patterson,D.,andVahdat,A.,DesignandImplementationTradeosforWide-area
ResoureDisovery,in Proeedings.14thIEEEInternationalSymposiumonHighPerformaneDistributedComputing,
2005.HPDC-14,Washington,DC,USA,July2005,IEEEComputerSoiety,pp.113124.
[63℄ Petar Maymounkovand David Mazires,Kademlia: APeer-to-Peer Information System Based onthe XOR Metri,
inIPTPS '01: Revised Papers fromthe First International Workshop on Peer-to-Peer Systems, London, UK, 2002,
Springer-Verlag,pp.5365.
[64℄ PeterJKeleher,TheRelativeImportaneofConurrentWritersandWeakConsistenyModels,inICDCS'96:
Proeed-ingsofthe16thInternationalConfereneonDistributedComputingSystems(ICDCS'96),Washington,DC,USA,1996,
IEEEComputerSoiety,p.91.
[65℄ PrasannaGanesan,BeverlyYang, andHetor Garia-Molina,One TorustoRuleThem All: Multi-Dimensional
QueriesinP2PSystems,inWebDB'04:Proeedingsofthe7thInternationalWorkshopontheWebandDatabases,New
York,NY,USA,2004,ACMPress,pp.1924.
[66℄ M.Raynal,G.Rhia-kime,andM.Ahamad,SerializabletoCausalTransationsforCollaborativeAppliations,in
Pro-eedingsofthe23rdEuromiroConferene,Budapest,Hungary,September1997.
[67℄ S.Rhea,P.Eaton,D.Geels,H.Weatherspoon,B.Zhao,andJ.Kubiatowiz,Pond: TheOeanStorePrototype,
inProeedingsoftheConfereneonFileandStorageTehnologies,USENIXAssoiation,2003.
[68] S. Rhea, B. Godfrey, B. Karp, J. Kubiatowicz, S. Ratnasamy, S. Shenker, I. Stoica, and H. Yu, OpenDHT: a
public DHT service and its uses, in SIGCOMM '05: Proceedings of the 2005 conference on Applications, technologies,
architectures, and protocols for computer communications, New York, NY, USA, 2005, ACM Press, pp. 73-84.
[69] A. Rowstron and P. Druschel, Pastry: Scalable, Distributed Object Location and Routing for Large-Scale Peer-to-Peer
Systems, in Proceedings of the 18th IFIP/ACM International Conference on Distributed Systems Platforms (Middleware
2001), Heidelberg, Germany, November 2001, pp. 329-350.
[70] Russ Cox, Athicha Muthitacharoen, and Robert Morris, Serving DNS Using a Peer-to-Peer Lookup Service, in IPTPS
'01: Revised Papers from the First International Workshop on Peer-to-Peer Systems, London, UK, 2002, Springer-Verlag,
pp. 155-165.
[71] S. Rhea, B. G. Chun, J. Kubiatowicz, and S. Shenker, Fixing the Embarrassing Slowness of OpenDHT on PlanetLab,
in Proceedings of USENIX WORLDS 2005, USENIX Association, 2005.
[72] Sai Susarla and John Carter, Flexible Consistency for Wide-area Peer Replication, in Proceedings of the 25th
International Conference on Distributed Computing Systems (ICDCS), Washington, DC, USA, 2005, IEEE Computer Society.
[73] K.-U. Sattler, P. Rösch, E. Buchmann, and K. Böhm, A Physical Query Algebra for DHT-based P2P Systems, in
Proceedings of the 6th Workshop on Distributed Data and Structures, Lausanne, Switzerland, July 2004.
[74] M. Satyanarayanan, J. J. Kistler, P. Kumar, M. E. Okasaki, E. H. Siegel, and D. C. Steere, Coda: A Highly
Available File System for a Distributed Workstation Environment, IEEE Transactions on Computers, 39 (1990), pp. 447-459.
[75] T. Seidmann, Replicated Distributed Shared Memory for the .NET Framework, in Proceedings of the 1st Int. Workshop on C#
and .NET Technologies on Algorithms, Computer Graphics, Visualization, Computer Vision and Distributed Computing,
Plzen, Czech Republic, February 2003.
[76] Stefan Saroiu, Krishna P. Gummadi, Richard J. Dunn, Steven D. Gribble, and Henry M. Levy, An Analysis of
Internet Content Delivery Systems, SIGOPS Operating Systems Review, 36 (2002), pp. 315-327.
[77] Stephanos Androutsellis-Theotokis and Diomidis Spinellis, A Survey of Peer-to-Peer Content Distribution
Technologies, ACM Computing Surveys, 36 (2004), pp. 335-371.
[78] H. Stockinger, Distributed Database Management Systems and the Data Grid, in MSS '01: Proceedings of the Eighteenth
IEEE Symposium on Mass Storage Systems and Technologies, Washington, DC, USA, 2001, IEEE Computer Society, p. 1.
[79] H. Stockinger, A. Samar, K. Holtman, W. E. Allcock, I. Foster, and B. Tierney, File and Object Replication in
Data Grids, Cluster Computing, 5 (2002), pp. 305-314.
[80] Sumeet Sobti, Nitin Garg, Fengzhou Zheng, Junwen Lai, Yilei Shao, Chi Zhang, Elisha Ziskind, Arvind
Krishnamurthy, and Randolph Y. Wang, Segank: A Distributed Mobile Storage System, in FAST '04: Proceedings of the
3rd USENIX Conference on File and Storage Technologies, Berkeley, CA, USA, 2004, USENIX Association, pp. 239-252.
[81] Sun Microsystems, JS JavaSpaces Service Specification.
http://java.sun.com/products/jini/2.0/doc/specs/html/js-spec.html, 2001.
[82] Sushant Goel, Hema Sharda, and David Taniar, Atomic Commitment and Resilience in Grid Database Systems,
International Journal of Grid and Utility Computing, 1 (2005), pp. 46-60.
[83] I. Tatarinov, Z. Ives, J. Madhavan, A. Halevy, D. Suciu, N. Dalvi, X. Dong, Y. Kadiyska, G. Miklau, and
P. Mork, The Piazza Peer Data Management Project, SIGMOD Record, 32 (2003).
[84] D. B. Terry, K. Petersen, M. Spreitzer, and M. Theimer, The Case for Non-transparent Replication: Examples
from Bayou, IEEE Data Engineering Bulletin, 21 (1998), pp. 12-20.
[85] Tevfik Kosar and Miron Livny, Stork: Making Data Placement a First Class Citizen in the Grid, in ICDCS '04:
Proceedings of the 24th International Conference on Distributed Computing Systems (ICDCS '04), Washington, DC,
USA, 2004, IEEE Computer Society, pp. 342-349.
[86] P. Valduriez and E. Pacitti, Data Management in Large-Scale P2P Systems, in VECPAR, M. J. Daydé, J. Dongarra,
V. Hernández, and J. M. L. M. Palma, eds., vol. 3402 of Lecture Notes in Computer Science, Springer, 2004, pp. 104-118.
[87] S. Venugopal, R. Buyya, and K. Ramamohanarao, A Taxonomy of Data Grids for Distributed Data Sharing,
Management and Processing, ACM Computing Surveys, (2006). To appear.
[88] Venugopalan Ramasubramanian and Emin G. Sirer, Exploiting Power Law Query Distributions for O(1) Lookup
Performance in Peer to Peer Overlays, in Proceedings of the First Symposium on Networked Systems Design and
of the 2004 conference on Applications, technologies, architectures, and protocols for computer communications, New
York, NY, USA, 2004, ACM Press, pp. 331-342.
[90] Weijian Fang, Cho-Li Wang, and Francis C. M. Lau, On the Design of Global Object Space for Efficient Multi-threading
Java Computing on Clusters, Parallel Computing, 29 (2003), pp. 1563-1587.
[91] Weimin Yu and Alan Cox, Java/DSM: A Platform for Heterogeneous Computing, in ACM 1997 Workshop on Java for
Science and Engineering Computation, June 1997.
[92] Xueyan Tang and Jianliang Xu, QoS-Aware Replica Placement for Content Distribution, IEEE Transactions on Parallel
and Distributed Systems, 16 (2005), pp. 921-932.
[93] B. Yang and H. Garcia-Molina, Designing a super-peer network, in Dayal et al. [28], pp. 49-62.
[94] Yi Lin, Bettina Kemme, Marta Patino-Martinez, and Ricardo Jimenez-Peris, Middleware Based Data Replication
Providing Snapshot Isolation, in SIGMOD '05: Proceedings of the 2005 ACM SIGMOD International Conference on
Management of Data, New York, NY, USA, 2005, ACM Press, pp. 419-430.
[95] L. Zhenyun Zhuang and Member-Yunhao Liu, Dynamic Layer Management in Superpeer Architectures, IEEE
Transactions on Parallel and Distributed Systems, 16 (2005), pp. 1078-1091.
Edited by: Thomas Ludwig
Received: May 25, 2006