Using the Domain Name System
Jussi Kangasharju, Keith W. Ross
Institut Eurecom
Sophia Antipolis, France
fkangasha,[email protected]
James W. Roberts
France Telecom { CNET
Issy les Moulineaux, France
Abstract
Inordertoreduceaveragedelayand
band-width usage in theWeb, geographically
dis-persed servers often store copies of popular
objects. Forexample,with network caching,
the origin server stores a master copy of
theobjectandgeographicallydispersedcache
servers pull and store copies of the object.
With sitereplication, objectsstored at
mas-terarereplicatedintosecondarysites. Inthis
paperweproposeanewnetworkapplication,
LocationDataSystem(LDS),that allowsan
arbitrary hostto obtainthe IP addresses of
the servers that store a specied URL. Our
networkingapplicationisanextensiontothe
Domain Name System (DNS), requires only
small changes to the domain name servers,
and can be deployed incrementally. For the
case of network Web caching, we elaborate
on our proposal to allow a cache to (i)
up-dateadistributeddatabasewhenitstoresor
evictsobjects,and(ii)pushobjectstoparent
caches in order to improve delay and
band-widthusage. Forthecaseofmirroredservers,
we showhow aclient canobtainalistof all
servers mirroring all or part of the desired
site. LDSapplied to partially mirroredsites
generatessubstantiallylessDNStraÆc than
LDS applied to caching. Finally, we discuss
howahostcanusethelocationdatainorder
to make intelligentdecisionsaboutwhere to
retrievedesiredobjects.
1 Introduction
Networkcachingofdocumentshasbecome
a standard way of reducing network traÆc
and latency in the Web. Caches are
cur-rently employed in institutional, local,
re-gional and national ISPs. Cache
hierar-chies,createdwhencachesinlower-levelISPs
pointto cachesin higher-levelISPs,are
cur-rentlyprevalentintheInternet[8,9]. Today's
cache hierarchies use static, manually
con-guredpointersto dene the hierarchytree.
Cache hierarchies operate asfollows. When
abrowserrequestsadocument,itsendsa
re-questto aleafcache. Thiscachetheneither
servesthe document (if it is cached) or
for-wardsthedocumenttoitsparentinthe
hier-archy. Theprocessisrepeatedalongastatic
chainofcachesuntiltherootofthehierarchy
isreached. Ifthereisalsoacachemissatthe
root, the root forwards the request directly
to the originserver. A response is returned
alongthecachechaininthereversedirection.
Cooperatingcachesincachehierarchiesoften
useICP(InternetCacheProtocol)toimprove
andenlargethescopeofthesearch[15].
Cachinghierarchieshaveseveralproblems.
First,requestsforlesspopulardocumentswill
experience misses at all caches in the cache
chain. Fordeephierarchies,thesemisseslead
to poorlatency performance [13]; moreover,
ICP can further degrade performance, since
thecache mustwait forareplyfrom all
|theydonotpermitabrowseroracacheto
choosethesubsequentcacheaccordingto
cur-rent topology or traÆc conditions. Manual
optimizations, such as sending requests for
certaintop level domainsto designated
par-ent caches,are possible, butevenwiththese
optimizations,thehierarchyforagivenURL
remainsstatic. Third,caching hierarchiesdo
not allow the chain of caches to extend
be-yond theroot cache;if thereisamissat the
rootserver,therequestisforwardeddirectly
totheoriginserver,andnevertoacachethat
liessomewhereinbetweentherootserverand
theoriginserver. This isa problembecause
a cache nearby the origin might be able to
serve an object much faster than the origin
server,especiallywhentheoriginserverruns
on a slow machine or has a low bandwidth
connection.
Inthispaperweproposeanewcooperative
caching scheme that has the following
fea-tures. (1)Atmosttwoservers(includingthe
originserver)arevisitedintherequestchain;
(2) The chain of caches depends on the
re-questedURLandcanchangedynamicallyas
a function of current network topology and
traÆc conditions; (3) An arbitrary cache in
theInternetcanbequeried,includingacache
that isfarfrom thebrowserbut closeto the
origin server. (4) The scheme can be
in-crementallydeployed with minor changesto
DNSservers. Furthermore, althougha
thor-oughperformance study is still required, we
feel the scheme should leadto a substantial
reductionindelayandnetworktraÆcas
com-paredto traditionalhierarchicalcaching.
Our caching scheme makes use of a new
networkapplication, theLocationData
Sys-tem(LDS),whichwealsodeneinthispaper.
TheLDS,denedasanextensionofDNS,
al-lows an arbitrary host to obtain the IP
ad-dresses of the servers that store a specied
URL.TheLDSisofindependentinterest,and
canbeused forotherapplications, including
choosing the best mirrored site for a given
URL. For the case of network Web caching,
wespecifyhowacacheupdatestheLDS
dis-tributed database when it stores and evicts
objects, and how a cache pushes objects to
parent cachesin orderto improvedelay and
bandwidthusage.
caching signicantlyincreasesthenumberof
DNS messages. In this paper we also show
how LDS can be applied to replicated and
partially replicatedservers. Inthiscase, the
amountofDNSmessagesdoesnotincrease.
Thispaperisorganizedasfollows. In
Sec-tion 2 we dene the LDS. In particular, we
show how DNS can be extended to provide
the LDSservice. In Section 3weshowhow
network cachescanexploit the LDS; in
par-ticular,wediscusshowthecachesupdatethe
LDSdistributeddatabase,andhowcachesat
higherlevelsin thecaching hierarchycanbe
populated. In Section 4we show how
docu-mentandsitereplicationcanexploittheLDS.
InSection 5wediscusshowahostcanmake
routingdecisions based on the results of an
LDS query. In Section 6 we discuss related
research on cooperative caching. Section 7
presents directions for future work and
Sec-tion8concludesthepaper.
2 Location Data System (LDS)
Copies of an object, with each object
ref-erenced by the same URL, are often
avail-able from dierent servers in the
Inter-net. These servers include origin servers,
replicated servers (also called mirrored
sites) and cache servers(also called proxy
servers). The origin server for an object is
the serverat which the objectoriginates; it
alwayscontainsanup-to-datecopyofthe
ob-ject. A replicated server contains copies of
objects that have been placed into it
(typi-cally manually or by pulling); typically,
ob-jectsin replicated serversare up-to-date. A
replicated server may replicate entire Web
sites or may replicate only portions of
vari-ousWebsites. Cacheserversobtaincopiesof
objectsby pullingthem ondemand. In
par-ticular,whenacacheserverreceivesarequest
foranobject,andiftheobjectisnotcached,
thecacheserverretrievestheobjectfrom
an-otherserver(whichmaybetheobject'sorigin
server, areplicated server, oranother cache
server), storesacopy ofthe object,and
thispaperwerefertoallthreetypesofservers
asobject servers.
Itishighlydesirableforahostinthe
Inter-nettobeabletodeterminethelocations(i.e.,
theIPaddresses)ofalltheobjectserversthat
containcopiesofaspeciedURL.In
particu-lar,ahostwouldliketobeabletogivea
net-workapplicationaURLandreceivefromthe
applicationalistofalltheobjectserversthat
containtheURL.Inadditiontoreceivingthe
IPaddressesofalltheserversthatcontainthe
object, it is desirable to receive information
about thefreshness of each copy, e.g., when
each copy was last modied orthe \age" of
theobject(asdened intheHTTP/1.1[2]).
In this section weoutline a new
network-ing application that provides the service of
mapping a URL to a list of object servers
that contain the URL, with each server on
thelist having associated freshness
informa-tion. We refer to this system as the
Loca-tion Data System (LDS). Wehavetaken
aratherpragmaticapproachindesigningthe
LDS. Our principlegoal has been to design
a systemthat can be rapidly deployed with
incremental changes to the existing Internet
infrastructure. A second goal is scalability,
i.e., asystem that is decentralized and
hier-archical. A third goal is performance, that
is,asystemthat quicklyreturnsthelocation
data while injectinga minimumof overhead
traÆcintothenetwork.
Figure 1 shows the basicmapping service
providedbyLDS.Inthegure,theclienton
the left has a URL, it gives it to the LDS
systemwhich returnsalist of objectservers
thatcontaintheobject. Insteadofthewhole
URL,theclientcanaswellgiveonlyaprex
oftheURLtotheLDSblackboxwhichthen
returns a list of object servers that contain
URLs matching the prex. In the simplest
case, the prex is only the hostname of the
URL;inthiscase,theLDSserviceisidentical
to theDomainName System,operationalin
theInternetforover20years.
Given the similarities in the service
pro-vided by both DNS and LDS, we have
de-cided to base the design of LDS on DNS.
In fact, LDS can be implemented by
mak-ing minor extensions to BINDname servers
Black
LDS
Box
List of object
servers that
contain object
LDS
URL
Client
Figure1: LDSserviceand to the DNS protocol. Recall that DNS
provides a mapping of hostnames to IP
ad-dresses. (DNScan returnmorethan one IP
address for a hostname for mirrored sites.)
DNSusesahierarchyofnameserversto
im-plement a distributed database of resource
records, whereby a resource record contains
amapping.
We now present an overview of the LDS
design;weprovidemoredetailsin the
subse-quent subsections. As does DNS, LDS uses
a hierarchy of servers to implement a
dis-tributeddatabaseofresourcerecords. We
re-ferto theLDS serversaslocationservers.
EachLDSresourcerecordmapsaURLtoan
objectserver,withassociatedfreshness
infor-mation. EachURLhasone(ormore)
author-itativelocation servers(To conveythe main
idea,we initially assumethat each URLhas
exactly one authoritative server.) The
au-thoritative location server contains a list of
resourcerecords for the URL, that is, a list
ofobjectserversthatcontaintheURL(along
withthefreshness information). Other
loca-tionserversmaycontaincachedcopiesofthe
listof resourcerecords. In ourbasic design,
hostsquerylocationserversinamannerthat
isfully analogousto theDNS protocol. The
sequence of query and reply events is
illus-tratedin Figure2.
In Figure 2, C is the querying host (e.g.,
a browser), O
0
is the origin server for the
queriedobject,machinesO
1 {O
4
areother
ob-jectservers(e.g., caches), andmachinesL
1 {
L
4
arelocationservers.
Thequerying host sends a location query
to its local location server (L
1 ). If L
L
2
C
O
0
L
1
O
1
O
3
O
2
O
4
L
3
L
4
1
8
2
7
9
3
9
6
9
5
4
Figure 2: Sequence of Location Queries and
Replies
sends a query to a root location server
(L
2
). If L
2
does not have the location
in-formation cached, it returns the address of
alocation server responsible for the domain
of the origin server in the query (L
3 ). L
1
sends a new location query to L
3
and if L
3
doesnothavethelocationinformation,it
re-turns the address of the authoritative
loca-tionserver(L
4
). Finally, L
1
queriesthe
au-thoritativelocationserverandreceivesthe
lo-cationinformation. LocationserverL
1 then
sends the information to the querying host
andalsocachestheinformation. (In this
ex-ample we assumed that all queries between
LDSserversareiterative;recursiveand
com-binations of recursiveand iterative can also
be used. Like DNS,LDS isnotbased on
ei-therquerytypebeingused. Wealsoassumed
thatthereisoneintermediatelocationserver
betweentherootandauthoritativeservers;in
practicethere canmoreorless.)
Suppose that thereply indicated that
ob-ject servers O 0 , O 1 , and O 3 contain copies
oftheURL.Thequeryinghostthenchooses
toretrievetheURLfrom oneoftheseobject
servers. Ideally, the hostchooses theobject
serverthatwilldelivertheobjectthefastest.
(We will discuss how this choice might be
madeinSection5.)
Giventhesimilaritiesbetweenourbasic
de-sign of the LDS and the DNS, we now
ad-dress whether the LDS can implemented in
the existing DNS, or within a slightly
ex-leadto rapiddeploymentoftheLDS.
2.1 Resource Records, Query
Messages and Reply Messages
Forlocation requestsanew DNSresource
record type is needed. It is similar to the
standardA-typeresourcerecord,whichmaps
a hostname to an IP address. We now
de-scribethisnewDNSresourcerecordtype.
Each resource record of this type must
contain an object identier. In order to
be compatible with standard DNS queries,
object identiers must be encoded in a
standardized way. Using this encoding,
the query appears like a normal DNS
A-query and can be resolved without
changing any parsing routines in DNS
servers. Figure 3 shows how the URL
http://www.eurecom.fr/~bob/index.html
wouldbeencoded.
t
p
3
w
w
w
7
e
u
r
e
c
o
m
2
f
r
0
4
h
t
m l
5
i
n
d
e
x
4
~
b
o
b
4
h
t
Figure3: RepresentationofaURL
ThisencodingallowsforeÆcient
compres-sionofidentierssharingacommonpart,just
aswhen queryingthe addressesof all of the
serversinagivendomain. Sincetheidentier
canbeviewedasageneralizedhostname,all
thenormalfacilitiesofDNScanbeused.
We mentionthat there is a 255character
limitonhostnamesinDNSquerieswhich
im-plies that URLs longer than 255 characters
will have to be truncated. Also, the length
ofasinglelabel(e.g.,onecomponentof
host-nameorURL path)is limited to 63
charac-ters. We do not expect these limits to be
aproblem. Wealso mention that URLs are
case-sensitive but DNS does not guarantee
case-sensitivetreatment. Nonetheless,wedo
notexpectthistobeamajorissuesincemost
URLs cannot be confused with other URLs
evenifcaseisnotpreserved.
ceededthese limits. Evenifthese 122URLs
were truncated to conform to DNS
limita-tions, we did not observe any collisions
be-tween two dierent URLs. Neither did we
ndanyURLswherecase-insensitivitywould
havepresentedanyproblems. Therefore, we
donotproposeanymethodsforhandlingsuch
URLsbeyondsimpletruncation.
The reply will include several resource
records, one for each object server holding
a copy of the URL. Each of these resource
recordshas atime-to-live(TTL) eld which
species how long that resource record can
be cached at a DNS server. This
TTL-informationcaneitherbespeciedbythe
au-thoritativelocationserveror,forabetter
es-timate, by the object server. If the location
serversetsthisTTL-eld,itwillbethesame
for allresource recordsfrom that server. If,
ontheotherhand,thiseld issetbythe
ob-jectserver,itallowstheobjectserverto
com-municateinformationonhowlongtheobject
islikelyto beavailableat thatserver.
The actual data section of the resource
record (RDATA) contains the IP-address of
theobjectserverandsomefreshness
informa-tionabouttheobject.Thisinformationcould
be for examplethe last modication date of
theobject,whichprovidesaneasywayof
dis-tinguishing between possible stale copies of
theobject. Inthiscase,theauthoritative
lo-cationserverwouldhavetoperiodicallyquery
theoriginservertogetanup-to-date
modi-cationdatefortheoriginalobject.
Anotherpossiblepieceofinformationthat
couldbeincluded in theresourcerecord,
in-stead of the last modication date, is the
\age"oftheobject,as denedinHTTP/1.1.
Anobjectservercoulddeterminethislocally,
and a small age would refer to a relatively
freshcopy. Thismethodwouldnot,however,
oerthesameguaranteesonobjectfreshness
aswouldthelast modicationdate.
Yetanotherpossibilityistoincludesimple
TTL-information, but this would be
redun-dant, sincethe resourcerecord by denition
includesaTTL-eld. ATTL-estimateis
use-fulwhenthequeryinghostdecideswhich
ob-jectserverto use. Objectserverswithshort
cisionmakingprocess.
Because all location servers are allowed
to cache LDS replies, some location servers
may havestale location information in their
caches. Whenthestatusofanobjectchanges
atanobjectserver,theobjectservernoties
the authoritative location server. This
up-dateis not,however,re ectedonthecached
copies of the location information. If stale,
cached locationinformationis deliveredto a
queryinghost,the queryinghostmaydecide
to forwardthe request to a Web cache that
hasevictedtheobject.
Onesolutionistousetheonly-if-cached
directive of HTTP/1.1 when requesting
ob-jectsfrom Web caches. If the distant cache
hasthe objectcached, itwill returnthe
ob-jectfrom the cache; otherwiseit will return
anerrortothequeryinghostindicatingthat
theobjectisnolongercached. Thequerying
host will then choose another object server
from the list. We are currently looking into
waysofreducingtheamountofstalelocation
informationinLDS.
2.2 UpdatingtheLocationServers
Theobjectserversneedto keepthe
infor-mation at the authoritative location server
up-to-date. Forthis we needanewprotocol
that is used to exchange information about
newcopies of objects. Forexample, when a
cache caches a new object, it must send a
messageto the authoritative location server
responsible for this object, indicating that
the object has been cached. This message
shouldalsoincludetheinformationwhichwill
be present in the resource record (e.g., last
modicationdate)aswellasaTTL-estimate.
Similarly,whenthecacheremovesanobject,
itmustsendamessagetothelocationserver
indicatingthattheobjectisnolongercached.
Since the location data is managed over
DNS,thedynamicupdatefacilityofDNS[14]
can be used to dynamically update the
lo-cation information. With this method, it is
desired.
UsingthedynamicupdatefacilityofDNS,
anobjectserversendsanupdatemessage
ev-erytimethestatusoftheobjectchanges. For
a Web cache this change of status could be
thecaching ofanewobject,aremovalofan
objectorcachinganewversionoftheobject
aftertheobjecthasbeenmodiedatthe
ori-ginserver. Inorder toreduce theamountof
update traÆc, object servers can send their
reports in batches. This isespecially
bene-cialfor Web caches that can send the
infor-mationaboutanHTMLpageandallinlined
imagesin onemessage,therebyreducingthe
overheadconsiderably.
2.3 Implementation
Implementing the LDS is relatively
straight-forwardand can be done
incremen-tally. Since the system is an extension to
DNS,nonewserversneedtobecreatedand
introduced.
For a DNS server to be LDS-capable,
it needs to be able to interpret a new
querytypecodeand the associatedresource
records. DNSalreadyhandlesnumerous
dif-ferent query and resource record types, so
addingonemoreissimple. Furthermore,the
authoritative serverhas to beable to
main-tain the location information database and
constructresourcerecordsfromthisdatabase
toincludeinrepliestoqueries.Althoughthis
URLdatabasecontainsmoredatathana
nor-mal DNS database, it can be maintained in
asimilar way. For aWeb cache to be
LDS-capable,itsimplyhasto beableto sendthe
updatemessagesto theauthoritativeserver.
Thearchitectureallowsforincremental
de-ployment,sinceifacontentproviderdoesnot
operate a location server, no LDS resource
recordswillexist. IfnoLDSresourcerecords
are available, the querying host would
for-wardallrequestsusingastaticpolicy,for
ex-ample,forwardingthemdirectlytotheorigin
server or a parent in a cache hierarchy. If
LDS recordsexist,then LDS-enabledclients
normal, static methods for nding objects.
Intheearlyphasesofdeployment,the
tradi-tional hierarchyshould bekeptasafallback
forclientsandserversthat donotimplement
thenewprotocols.
3 Network Web Caching
In this section we propose a new
cooper-ativecachingschemethat exploits theLDS.
We assume that browsers forward their
re-quests through aproxy cache in addition to
thebrowser cache. Thismethod is currently
widely used by ISPs and other institutions,
suchascompaniesanduniversities,tooera
betterservicetotheirclientsaswellasto
re-ducethe out-goingtraÆc. Also, ifa rewall
isused, thenall requestsmust go througha
proxy at the rewall in any case, so adding
a cache there does not present a signicant
overhead.
We now describe our cooperative caching
scheme. A browser rst sends anHTTP
re-questforanobjecttoitsproxycache. Ifthe
proxy cache does not have the object, the
proxy cache invokes LDS to obtain alist of
allthe object servers(i.e., originserverand
someothercachesin the network) that
con-taintheobject. Theproxycachethenchooses
the\best"objectserverfromthelistand
for-wardstheHTTPrequesttothisobjectserver
(seeSection5). Theproxyreceivestheobject
fromthebestobjectserverandforwardsthe
objecttothebrowser.
Ourcachingschemehas several appealing
features. First,atmosttwoserversarevisited
to retrieve an object. Standard hierarchical
cachingschemes(suchasNLANR[8]and
Re-nater[9])cancauserequeststopassthrough
alargenumberofobjectserversbeforeacopy
oftheobjectisfound,which canseverely
in-creaseobjectretrievaltime[13]. Second,the
schemeallowstheproxyservertochoosefrom
alltheobjectserversthatcontaintheobject.
Letuslookatacoupleofscenarios:
andalltheother objectserversonmore
distant ISPs. In this case, the proxy
server would choose the cache in the
neighboringISP.
LDSreturnsalistofobjectservers,with
onecacheserveronthesamecontinentas
theproxyserver,andalltheotherobject
serversontheother sideoftransoceanic
links. In this case, the proxy server
choosesthecacheinitscontinent.
LDSreturnsalistofobjectservers,with
alltheobjectserversfarfrom theproxy
and near or in the origin server's ISP.
In this case, the proxy may still prefer
tochooseoneofthecachesoverthe
ori-ginserver: Theoriginservermayrunon
a slow machine orhave a slow network
connection.
Itisimportanttonotethattraditional
hierar-chicalcachingdoesnotpermitthelastoption.
Withhierarchicalcaching,ifthereisamissat
therootcache,therootcacheforwardsthe
re-questdirectlytotheoriginserver. Thus our
scheme enables the proxy to select from all
caches that are on the \route" between the
proxyandtheoriginserver.
Ourcachingscheme doeshaveanunusual
peculiarity. Intheschemejustdescribed,the
proxy server will always retrieve an object
either from the origin server or from some
other proxy server. Because all the proxy
servers are near the bottom of the Internet
hierarchy(in institutionalISPsorlocal
resi-dentialISPs),withtheexceptionoftheorigin
server,all copiesof an object arestored,
es-sentially,intheleavesoftheInternet. There
are, however, several compelling reasons to
also store copies of objects higher upin the
Internet hierarchy, in particular in regional
and national ISPcaches. First,the
request-ing proxy may be ableto retrieve anobject
morequicklyifitisavailablefromacommon
parent(orgrandparent)ofsomeother proxy
cache that has the object. Second,
hierar-chicalcachingprovidesanasynchronous
mul-ticast infrastructure, which can signicantly
reducebandwidthusageintheInternet[10].
Giventhatcacheserversarepresentat
re-gional and national caches into hierarchies,
wenowmodifyourcachingschemetoexploit
thehigher-levelcaches.
3.1 Populating Caches
Inordertoexploit thehigher-levelcaches,
we must make sure that the higher level
caches obtain copies of the objects cached
at lowerlevels. We propose that lower level
cachespushobjectstotheirhigherlevel
par-ents. This way the objectsare easily
avail-ableforsiblings under the sameparent,and
requestsfromdistanthostscanbesatisedat
ahigherlevelinthenetwork.
The details of our pushing strategy are
as follows. A low level cache periodically
sends a list of new objects (with last
modi-cationdates) to itsparent. Theparent
de-cideswhichoftheobjectsinthelisttocache
andretrievesthesefromthechildcache. This
is done at every level in the hierarchy and,
in the end, the caches are lled up as they
would have been using traditional
hierarchi-calcaching. Cachesatthelowestlevelshould
send informationaboutevery object, but at
higherlevelsacache mightuse somelocally
denedpolicy(e.g.,objectsthatarecachedin
multiplechildcaches)todecidewhichobjects
toreportto itsparent.
3.2 Network TraÆc
Since LDS increases the number of
mes-sagessentoverthe network,itmayoverload
somelinksorserversandinfactdegrade
per-formance. We are currently performing an
analysis of how much LDS increases current
networktraÆc. Thefollowingarethekey
fac-tors:
1. LDS queries and replies: Although
thesearenormalDNSmessages,anLDS
query is sent every time a proxy needs
toretrieveaURLthatitisdoesn'thave
cached. Caching LDS recordsat the
ordinaryDNS,since aURLreference is
morespecicthanahostnamereference.
2. Update traÆc: Whenthestatusofan
objectchangesinacache,thecachemust
send an update messageto theobject's
authoritativelocationserver. For
popu-lar, widelycached objects, this may
re-sult in too much traÆc directed at the
authoritativeserver.
In order to reduce the amount of LDS
lookup traÆc in the Internet, we propose a
variation to our basic LDS scheme. In this
variation, once we getthe location
informa-tion for an HTML-page we assume that all
theinlinedobjectsonthepagearealso
avail-ablefromthesamesetofobjectservers. One
advantage of this scheme is that it heavily
cutsdownontheamountofLDStraÆc
com-paredtothatbasicschemesincewenowonly
send one LDS query for each HTML-page
instead of each URL. The disadvantage of
thevariationis that there arenoguarantees
thattheinlinedobjectsareavailablefromthe
sameobjectservers. Originserversand
repli-cated servers should have the objectsbut a
cachemayhavealreadypurgedsomeorallof
them. Ifthe objectserverdoesnothavethe
requested object, we will do an LDS query
forthat objecttoobtainup-to-date location
information.
InSection4wewillshowhowtheamount
of LDS traÆc can be signicantly reduced
when the LDS system is restricted to
repli-catedservers.
To address the issue of update traÆc, we
proposethat anycachethat hasaparent
re-frain from sending update information.
In-stead, the update information is forwarded
totheparentduringthepushingphase. The
highestparentthat doesnot havea copy of
the object sends to the authoritative server
an update messagethat containsupdate
in-formationforitself anditsupdatedchildren.
Thebenetisthattherearesignicantlyless
higherlevelcachesthanlowlevelcaches,thus
much lessupdate traÆc.
Intheabovescheme,thehighestparentin
the hierarchy sends update messages for all
itschildrenwhicharethusallincludedinthe
databaseatthe locationserver. As aresult,
institutional caches would also be included
in the list returned by the location server,
whichmeansthattheycouldreceiverequests
for cached objects from hosts outside their
own network. It is desirableto keepthe
in-stitutional caches othe list oflocations for
tworeasons. First,institutions likely donot
wantoutsidehosts using theirbandwidthto
retrieve objects. Second, objectsat
institu-tional caches are likely to be cached in the
parentcachesand outsidehosts canretrieve
objectsfasterfromtheparentcache.
Cache digests [11] are used in traditional
hierarchies to represent cache contents in a
compressedform. Acachefetchesthedigests
ofitsneighborcachesandwhenarequest
can-notbesatisedlocally, the cache checks the
digestsoftheneighborcachestoseewhether
anyofthemhavetheobjectcached. Ifso,the
objectisretrievedfromthatcache;otherwise
therequestisforwardedtotheparentcache.
In LDS-based caching, having the digest of
the parent cache is useful because this way
weavoidtheLDS-lookupforobjectsthatare
cached at theparentand would in any case
beretrievedfrom there.
4 Replicated Servers
TheLDSschemeisnotlimitedtocaching.
Itcanalsobeusedtodisseminateinformation
aboutreplicated ormirroredobjects. When
an object is replicated at another server,
this information is added into the location
databaseattheauthoritativelocationserver.
Whenaclientrequeststhisreplicatedobject,
it doesan LDS lookup and gets a list of all
serversholdingacopyoftheobject. Because
theinformationisnotdynamicallyreplicated
asincaching,butratherplacedatwell-chosen
locations,thereisrarelyneedtonotifythe
line.
With LDS, a user addresses a replicated
objectbythe object'suniqueURL,LDS
re-turns the list of servers that have the
ob-ject. Thebrowserthentransparentlychooses
one of these (the best in some sense). The
user would not have to know about the
ac-tualphysicallocationoftheobject.
4.1 Reducing LDS Query TraÆc
The scheme just described for replicated
serversstillrequiresthatclientssendanLDS
lookupforeachURL.Inorder toreduce the
numberofLDSmessagesweproposethe
fol-lowingmethodofreplicatingserversand
stor-ing information about the replicated URLs.
In this scheme, the LDS no longer keeps
track ofcachedobjects. It onlytracks
repli-catedandpartiallyreplicatedservers. We
re-quire that replicated serversreplicate either
whole sites orcomplete sub-treesof servers.
An example sub-tree of an origin server is
all the objects on the origin server below
a given URL prex, e.g., all objects under
http://cnn.com/sports/.
Suppose that a client wants to retrieve a
URL. Normally, a client would send a DNS
lookuptoobtaintheIPaddressoftheorigin
serverandretrievetheobjectfromthere.
Us-ingLDStheclientproceedsasfollows. First,
the client sends an LDS queryfor the
host-nameintheURL.TheLDSsystemreturnsa
listofallserversthatmirroreitherthewhole
siteoracompletesub-treefromthesite. We
need to extend the resource record dened
in Section 2.1 to include a prex indicating
the sub-tree that is mirrored by that
par-ticular server. The client then chooses the
\best" serveramong those serversthat
mir-rorthedesiredsub-treeandforwardsthe
ac-tual HTTP-request to this server. A server
may mirror multiple sub-trees from a single
originserver; in this case LDS would return
oneresourcerecordforeachsub-tree.
Comparedtothecachingschemediscussed
in Sections 2 and 3, this scheme has
sev-onlyone LDS lookupfor each server;in the
cachingscheme,theclientmustsendanLDS
lookup for each URL. The caching scheme
drastically increases DNS traÆc in the
net-work whilethemirroring schemekeepsDNS
traÆc at its current level. (Recall that a
client would in any case have to do a DNS
lookup to get the IP address of the
ori-gin server.) Second, objects at mirrored
serverstendtostayatthemirroredserversfor
longperiods of time while objects in caches
are cached and purged dynamically. Hence,
LDS information for a mirrored server can
be cached longer at location servers which
greatly reduces lookup traÆc. When the
authoritative LDS server sends information
about a mirrored site, it should rst send
outinformationaboutmirrorsthatmirrorthe
mostofthesite. ThisisbecauseDNSreplies
are by default sent over UDP and the
mes-sagesize is severely limited (512 bytes). By
sendinginformation aboutmirrorsthat
mir-rorlargesubtrees,wemaximizethelikelihood
thattheclienthasthenecessaryinformation
inthereply.
ComparedtothestandardDNS-wayof
ac-cessing objects, our mirroring scheme does
notincreasethenumberof DNSmessagesin
thenetwork. Thereplymessagesareslightly
larger,however,sinceweneedtoindicatethe
prexin theresourcerecord. We donot
ex-pectthisincreaseinsizetobesignicantsince
URLprexescanbeeÆcientlyencodedusing
theencodingspeciedinSection2.1andDNS
resourcerecordpointers.
5 Routing Decision
Whenthecachehasreceivedareplyto its
LDS-lookupcontainingseveralobjectservers
(cachesandoriginservers),thenextquestion
is: \Whichoneofthepossibilitiestochoose?"
Determiningthebestalternativeisan
impor-tanttopicofourongoingresearch.
Possiblesolutionsinclude:
PassQoSinformationalong in LDS
up-datesandreplies.
Use explicit routing information from
BGP.
We will now provide some details on how
thesemethodscouldbeused.
Intherstoption, allqueryinghosts(e.g.
low level Web caches) measure the time it
takestofetchobjectsfromotherservers(Web
serversorothercaches). Thisdownloadtime
gives a crude estimate of the actual
band-width between the two hosts. The
query-inghostkeepsalistof serverswithwhich it
has had good connections and prefers those
serversto otherswhen bothtypes ofservers
arepresentin theLDSreply. Ofcourse, the
actualnetworkconditionschangeallthetime,
but the results presented in [7] on
perfor-mance characteristics of mirrorservers
indi-cate,that from alarge setof mirrorservers,
onlyasmallnumberneedtobeconsideredas
candidatesfordownload. Althoughthestudy
in[7]usesaxednumberofmirrorsites,we
believethattheresultscanbeappliedto
sit-uationswhere thenumberofserverschanges
dynamically.
Thesecond optionistopasssomeQoS
in-formationin the LDS repliesalong with the
age information about the object. For
ex-ample, a Web server can measure the
con-nection times of incoming connections,
esti-mate theconnectionbandwidthand takean
averageoftheestimatedbandwidthsoverall
connections. Thisaveragebandwidthcanbe
seenas thebandwidth that a randomclient
somewhere in the network could expect to
getwhenrequestingobjectsfromthisserver.
Likewise,cachescanmeasurethebandwidths
ofallout-goingconnectionsandcalculatethe
averagebandwidth.
The third option uses explicit routing
in-formation obtained overBGP by talking to
localrouters. Thisinformationgivesthehost
atopological map of the network which can
beusedtondouthowfareachoftheservers
intheLDSreplyare. Unfortunately,routing
informationgivesonlyreachability,not
qual-al.[4]havestudiedusingroutinginformation
in request routingand have decided against
usingBGPinformationbecauseof
congura-tionandsecurityissues,amountofdataand
diÆculty of obtaining it. Their approach is
basedonusingwhois-services.
Regardless of the approach chosen, any
queryinghostcan implement any local
poli-cies necessary when deciding where to
for-ward arequest. For example, acache
oper-ator may want to forward requests to other
caches based on the domain of the origin
server.
6 Related Research
In[3]Gadde etal. comparetraditional
hi-erarchicalcachingwithanarchitecturewhere
asingle centralized serverholds information
aboutwhereeachdocumentiscached. When
a low level cache wants to nd out where
an object is cached, it sends a message to
this central mapping server which responds
by either redirecting the request to a peer
serverorbyforwardingittotheoriginserver.
This scheme resultsin allof the objects
be-ing cached at institutional caches. The
au-thors alsoperformsimulationsusingasmall
numberofcachesand nd outthat the
per-formanceofthecentralizedsolutionison
av-erage better than that of the traditional
hi-erarchy. The numberof caches in the
simu-lations is low, only 8 and 32 cache
congu-rations are simulated. The authors express
theirdoubtsaboutthescalabilityoftheir
so-lution,butpresentsomeideasforreplicating
anddistributingthelocationinformation.
Inour approach, the queryinghostgets a
listofallservers holdingacopyoftheobject
insteadofbeingredirectedtooneofthembya
mappingserver. Thisputs thequeryinghost
incontrol overwheretherequestwillbe
for-warded. This decisionmight greatlydepend
onlocal conditionsand policies unknown to
a central server. Another dierence in our
approach is that the location information is
where data is cached near the browser and
location hints are passed through a
meta-data hierarchy. Their architecture arranges
thecachesinvirtualhierarchies,oneforeach
object, for distributing the metadata
infor-mation. In their architecture, when acache
cachesacopyofanobject,itsendsouta
loca-tionhintindicatingtheURLanditsaddress.
This hint ispropagated in the virtual
meta-datahierarchyandcouldeventuallyreachall
caches in the system. Caches keep track of
allthehintstheyhavereceivedandusethem
toforwardrequests. If acachereceivesa
re-questforanobjectwhichisnotcachedlocally
but thehintsindicate another cache holding
the object, the request is forwarded to this
cache. Thevirtual metadatahierarchies are
constructedusing IP-addressesandURLs in
away which tries to minimize the distances
betweenparentsandchildren. Also,the
hier-archyguaranteesthatacachecanreceiveonly
onehintforanyobject. Thishintislikelyto
befromthenearestcacheholdingtheobject,
butthisisnotguaranteed.
One dierence between their and our
ap-proaches is the way caches are placed. In
their architecture only institutional caches
holdobjectsandeveryrequestforwarded
us-ingahintwouldbeforwardedtoanother
in-stitutionalcache. Thisbehaviormightbe
un-acceptable to some people who do notwant
more external traÆc on their networks. In
our approach, objects are pushed to higher
levels andthis eliminates theneed for going
to other institutionalcaches. Second
impor-tantdierenceisthatusingLDSforlocating
objects,aqueryinghostgetsalistofall
pos-sible locations. It can then choose from the
list accordingto alocal policy orby
adapt-ing to current network conditions. In [13] a
cache receivesonly one location hint. Since
the virtual metadatatrees are based on
IP-addressesandURLs,therearenoguarantees
thatthishintindicatesagoodserver. Weare
currentlyperformingacomparisonofthetwo
schemes.
Amiret al.[1]study three dierent
meth-odsforredirectingrequeststomirroredsites{
HTTPredirection,DNSroundtriptime
mea-surements,and sharedIPaddresses. Intheir
DNS-basedsolutiontheyuseDNSroundtrip
replicated server for a client. They place
the replicated servers near an authoritative
DNS server which gives the address of the
replicatedserverwhenqueried fortheorigin
server.Thisresultsinclientsbeingeventually
redirected to their closest replicated server.
Therearetwomajordierencesbetweentheir
approachand ourreplicated serverapproach
(Section4.1). First,intheirsystemreplicated
serversmust always replicate the entire site
while in our system a server may replicate
onlysub-trees oftheoriginalsite. Second,in
their system the replicated servers must be
placedincloseproximityoftheauthoritative
DNSserver. Ourarchitectureplacesno
con-straintsontheplacementoftheservers.
7 Future Work
WewillcontinueourworkonLDSby
per-forming quantitativecomparisons of the
dif-ferent architectures presented in this paper.
In particular we will concentrate on
evalu-ating the amount of new traÆc in the
net-work causedby thelookupand update
mes-sages. We will also compare the traditional
cachinghierarchywith ourdierentLDS
ar-chitectures to nd out how much object
ac-cesslatencycanbereducedthroughLDS.We
willalsocloselystudytherequestrouting
is-sueinordertondouthowmuchinformation
isneededto makegoodroutingdecisions.
8 Conclusion
Inthispaperwehavepresentedthe
Loca-tionDataSystem,anewnetworkapplication
that can be used to locate where copies of
objectsarestoredontheWeb. LDSprovides
a service similar to DNS and can be
imple-mentedasanextensiontoDNSanddeployed
incrementally. Wehavepresentedtwo
appli-cationsof LDS, locating objectsin mirrored
servers and locating objects in Web caches.
We have also discussed how clients can
fromthenetwork.
Acknowledgments
Wewould liketothankNeilze Dortafrom
INRIA forthediscussionswehadon
access-ingmirroredsites and herinput onthe
sub-ject. TheloglesusedinSection2.1were
pro-videdbyNationalScienceFoundation(grants
NCR-9616602 and NCR-9521745), and the
NationalLaboratoryforAppliedNetwork
Re-search.
References
[1] Y. Amir, A. Peterson, and D. Shaw,
\Seamlessly Selecting the Best Copy
from Internet-Wide Replicated Web
Servers", in Proc. 12th International
Symposium on Distributed Computing
(DISC'98), Andros, Greece, September
1998.
[2] R. Fielding, J. Gettys, J. Mogul, H.
Frystyk,andT.Berners-Lee,\Hypertext
Transfer Protocol { HTTP/1.1", RFC
2068,January1997.
[3] S.Gadde,J.Chase,andM.Rabinovich,
\DirectoryStructuresforScalable
Inter-netCaches",TechnicalReport
CS-1997-18, Department of Computer Science,
DukeUniversity,1997.
[4] C. Grimm, J.-S. Vockler, H. Pralle,
\Request Routing in Cache Meshes",
in Proc. 3rd Web Caching Workshop,
Manchester,UK, June1998.
[5] P.Mockapetris,\DomainNames{
Con-ceptsandFacilities",RFC1034,
Novem-ber1987.
[6] P. Mockapetris, \Domain Names {
Im-plementation and Specication", RFC
1035,November1987.
[7] A. Myers, P. Dinda, and H. Zhang,
port CMU-CS-98-157, School of
Com-puter Science, CarnegieMellon
Univer-sity,1997.
[8] A Distributed Testbed for
Na-tional Information Provisioning,
<URL:http://ircache.nlanr.net>.
[9] Reseau National de telecommuncations
pour la Technologie,
l'Enseignement et la recherche,
<URL:http://www.renater.fr>.
[10] P. Rodriguez, K. W. Ross,E. W.
Bier-sack,\ImprovingtheWWW:Cachingor
Multicast?", in Proc. 3rd Web Caching
Workshop,Manchester,UK,June1998.
[11] A. Rousskov, D. Wessels, \Cache
Di-gests",inProc.3rdWebCaching
Work-shop, Manchester, UK,June1998.
[12] Squid Internet Object Cache,
<URL:http://squid.nlanr.net>.
[13] R. Tewari, M. Dahlin, H. M. Vin,
and J. S. Kay, \Beyond Hierarchies:
Design Considerations for Distributed
CachingontheInternet",Technical
Re-portTR98-04,DepartmentofComputer
Sciences, UniversityofTexasatAustin.
[14] P. Vixie, S. Thomson, Y. Rekhter,and
J.Bound,\DynamicUpdatesinthe
Do-main Name System (DNS UPDATE)",
RFC2136, April1997.
[15] D. Wessels, K. Clay, \Internet Cache
Protocol (ICP), version 2", RFC 2186,