CERIAS Tech Report 2001-02
PROTOCOLS FOR SECURE REMOTE
DATABASE ACCESS WITH
APPROXIMATE MATCHING
Wenliang Du, Mikhail J. Atallah
Center for Education and Research in
Information Assurance and Security
Wenliang Du
CERIAS and
Department of Computer Sciences
Purdue University
West Lafayette, IN 47907

Mikhail J. Atallah
CERIAS and
Department of Computer Sciences
Purdue University
West Lafayette, IN 47907
Keywords: Privacy, security, secure multi-party computation, pattern matching, approximate pattern matching
Abstract Suppose that Bob has a database D and that Alice wants to perform a search query q on D (e.g., "is q in D?"). Since Alice is concerned about her privacy, she does not want Bob to know the query q or the response to the query. How could this be done? There are elegant cryptographic techniques for solving this problem under various constraints (such as "Bob should know neither q nor the answer to the query" and "Alice should learn nothing about D other than the answer to the query"), while optimizing various performance criteria (e.g., amount of communication).

We consider the version of this problem where the query is of the type "is q approximately in D?" for a number of different notions of "approximate", some of which arise in image processing and template matching, while others are of the string-edit type that arise in biological sequence comparisons. New techniques are needed in this framework of approximate searching, because each notion of "approximate equality" introduces its own set of difficulties; using encryption is more problematic in this framework because the items that are approximately equal cease to be so after encryption or cryptographic hashing. Practical protocols for solving such problems make possible new forms of e-commerce between proprietary database owners and customers who seek to query the database, with privacy.

We first present four secure remote database access models that are used in e-commerce, each of which has different privacy requirements. We then present our solutions for achieving privacy in each of these four models.

1 Portions of this work were supported by Grant EIA-9903545 from the National Science Foundation, and by sponsors of the Center for Education and Research in Information Assurance and Security.
1. INTRODUCTION
Consider the following real-life scenario: Alice thinks that she may have some genetic disease, so she wants to investigate it further. She also knows that Bob has a database containing known DNA patterns about various diseases. After Alice gets a sample of her DNA sequence, she sends it to Bob, who will then tell Alice the diagnosis. However, if Alice is concerned about her privacy, the above process is not acceptable because it does not prevent Bob from knowing Alice's private information: both the query and the result.

This kind of situation, which is likely to arise as e-commerce develops, motivates the following general problem formulation:
Secure Database Access (SDA) Problem: Alice has a string q, and Bob has a database of strings $T = \{t_1, \ldots, t_N\}$; Alice wants to know whether there exists a string $t_i$ in Bob's database that "matches" q. The "match" could be an exact match or an approximate (closest) match. The problem is how to design a protocol that accomplishes this task without revealing to Bob Alice's secret query q or the response to that query.
Because of its practical importance and also because not much work has been done for approximate pattern matching in the SDA context, our work particularly focuses on approximate pattern matching.

The exact matching problem has been extensively considered in the literature [19, 4, 16, 15, 20, 22, 21, 11], even though it can theoretically be solved using the general techniques of secure multi-party computation [8]. The motivation for giving these specialized solutions to it is that they are more efficient than those that follow from the above-mentioned general techniques. This is also our motivation in considering approximate pattern matching even though it too is a special case of the general
measures the difference between the two targets, and produces a score to indicate how different the two targets are. The metrics used to measure the difference usually are heuristic and are application-dependent. For example, in image template matching [12, 17], $\sum_{i=1}^{n} (a_i - b_i)^2$ and $\sum_{i=1}^{n} |a_i - b_i|$ are often used to measure the difference between two sequences a and b. In DNA sequence matching [13], edit distance [1, 5] makes more sense than the above measurements; edit distance measures the cost of transforming one given sequence to another given sequence, and its special case, longest common subsequence, is used to measure how similar two sequences are.
Solving approximate pattern matching problems within the SDA framework is quite a nontrivial task. Consider the $\sum_{i=1}^{n} |a_i - b_i|$ metric as an example. The known PIR (private information retrieval) techniques [19, 4, 16, 15, 20, 22, 21, 11] can be used by Alice to efficiently access each individual $b_i$ without revealing to Bob anything about which $b_i$ (or even which b) Alice accessed (more on this later), but doing this for each individual $b_i$ and then calculating $\sum_{i=1}^{n} |a_i - b_i|$ violates the requirement that Alice should know the total score $\sum_{i=1}^{n} |a_i - b_i|$ without knowing anything other than that score, i.e., without learning anything about the individual $b_i$ values. Using a general secure multi-party computation protocol typically does not lead to an efficient solution. The goals of our research, and the results presented in this paper, are finding efficient ways to do such approximate pattern matchings without disclosing private information.
The practical motivations of remote database access do not all point to the model we described in the above SDA formulation. For example, in some situations, Bob's database could be proprietary whereas in some others it could be public (in either case the protocol should reveal nothing to Bob about Alice's query). The "proprietary" nature of a database might make the solution more difficult because Alice should not be able to know more information than the response to her query. There is also another practical framework, within which Alice uses Bob to store a (suitably disguised) version of her private database (a form of outsourcing), and for such a framework the solutions could be quite different. Based on these variants of the problem, we have investigated four SDA models, and defined a class of SDA problems for each model according to the metrics we use for approximate pattern matching. Of course the difficulties of the problems are not the same for the different metrics, and so far we have solved a subset of those problems. A summary of our results is listed below (the results are stated more precisely
- For the Private Information Matching Model, we have a solution to approximate pattern matching based on the $\sum_{i=1}^{n} (a_i - b_i)^2$ metric with $O(nN)$ communication cost, where n is the length of each string and N is the number of strings in the database.
- For the Private Information Matching Model, we also have a solution to approximate pattern matching based on the $\sum_{i=1}^{n} |a_i - b_i|$ metric using a Monte Carlo technique; the solution gives an estimated result, and it has $O(nWN)$ communication cost, where W is a parameter that affects the accuracy of the estimate.
- For the Private Information Matching Model, if we assume that the alphabet is known to the involved parties and its size is finite, we have a solution to approximate pattern matching based on general $\sum_{i=1}^{n} f(a_i, b_i)$ metrics, and hence solutions for the special cases of $\sum_{i=1}^{n} |a_i - b_i|$, $\sum_{i=1}^{n} (a_i - b_i)^2$, and $\sum_{i=1}^{n} \delta(a_i, b_i)$ (where $\delta(x, y)$ is 1 if $x \neq y$ and 0 otherwise). These solutions have $O(mnN)$ communication cost, where m is the number of symbols in the alphabet. In many cases, m is small. For instance, m is four in DNA databases.
- For the Secure Storage Outsourcing Model, we have a practical solution to approximate pattern matching based on the $\sum_{i=1}^{n} (a_i - b_i)^2$ metric. The solution is practical because its $O(n)$ communication cost does not depend on N.
- For the Secure Storage and Computing Outsourcing Model, we also have a practical solution to approximate pattern matching based on the $\sum_{i=1}^{n} (a_i - b_i)^2$ metric. This solution is practical because its communication cost is $O(n^2)$.
Motivation
Why do we care about the privacy of a database query? In the example used earlier in this section, if a match is found in the database, Bob immediately knows that Alice has such a disease; even worse, after receiving Alice's DNA sequence, Bob can derive much else about Alice, such as other health problems that Alice might have. If Bob is not trustworthy, Bob could disclose the information about Alice to other parties, and Alice might have difficulty getting employment, insurance, credit, etc. But even if Alice trusts Bob, and Bob has no intention of disclosing Alice's private information, Bob himself might prefer that Alice's query in-
disgruntled employee of Bob's, or after a system break-in), Bob might face an expensive lawsuit from Alice. From this perspective, a trusted Bob will actually prefer not to know either Alice's query or its response.
With the growth of the Internet, more and more e-commerce transactions like the above will take place. There are already DNA pattern databases, public databases about diseases, patent databases, and in the future we may see many more commercial databases and the related database access services, such as fingerprint databases, signature databases, medical record databases, and many more. Privacy will be a major issue, and assuming the trustworthiness of the service providers, as is done today, is risky; therefore protocols that can support remote access operations while protecting the client's privacy are of growing importance.

One of the fundamental operations behind the queries described in the examples above is pattern matching. Therefore, the basic problem that we face is how to conduct pattern matching operations at the server side while the server has no knowledge of the client's actual query (or the response to it). In some database access situations, exact pattern matching is used, such as query by name, query by social security number, etc. However, in many other situations, exact pattern matching is unrealistic. For instance, in fingerprint matching, even if two fingerprints come from the same finger, they are unlikely to be exactly the same because there is some information loss in the process of deriving an electronic form (usually a complex data structure of features) from a raw fingerprint image. The same holds for voice, face, and DNA matching; in these and many other situations, exact matching is not expected and some form of approximate pattern matching is more useful.
Background Information on Secure Multi-party Computation
The above problem is a special case of the general secure multi-party computation problem [28]. Generally speaking, a multi-party computation problem deals with computing any probabilistic function on any input, in a distributed network where each participant holds one of the inputs, ensuring independence of the inputs, correctness of the computation, and that no more information is revealed to a participant in the computation than can be computed from that participant's input and output [10]. Other examples of such computations include: elections over the Internet, electronic bidding, joint signatures, and joint
and Wigderson [23], and by many others: Goldwasser [10] predicts that "the field of multi-party computations is today where public-key cryptography was ten years ago, namely an extremely powerful tool and rich theory whose real-life usage is at this time only beginning but will become in the future an integral part of our computing reality".

Goldreich states in [8] that the general secure multi-party computation problem is solvable in theory. However, he also points out that using the solutions derived by these general results for special cases of multi-party computation can be impractical; special solutions should be developed for special cases for efficiency reasons.
One of the well-known special cases of multi-party computation is the Private Information Retrieval (PIR) problem, which involves a client and a server. The client needs to get the ith bit of a binary sequence from the server without letting the server know i; the server does not want the client to know the binary sequence either. A solution for this problem is not difficult; however, an efficient solution, in particular a solution with small communication cost, is not easy. Studies [19, 4, 16, 15, 20, 22, 21, 11] have shown that one can design a protocol to solve the PIR problem with much better communication complexity than by using the general theoretical solutions. Pattern matching is another such specific computation, and the recent progress in the PIR problem motivated us to speculate that there exist efficient solutions for this particular kind of secure multi-party computation as well.
Secure Multi-party Protocol vs. Anonymous Communication Protocol
Anonymous communication protocols [24, 9] were designed to achieve somewhat related goals, so why not use them? Anonymity techniques help to hide the identity of the information sender, rather than the information being sent. For example, when people browse the web, they can use anonymous communication techniques to keep their identities secret, but the web query usually is not secret because the web server has to know the query in order to send a reply back. In situations where the identity of the information sender needs to be protected, anonymous communication protocols are appropriate. However, there are situations where anonymous communication protocols cannot replace secure multi-party computation protocols. First, certain types of information intrinsically reveal the identity of someone related to the information (e.g., social security number). Secondly, in some situations, it is the
an invention is new before she files for a patent. When conducting the query, Alice may want to keep the query private (perhaps to avoid part of her idea being stolen by people who have access to her query); she does not care whether her identity is revealed. Thirdly, in certain situations, one has to be a registered member in order to use the database access service; this makes hiding a user's identity difficult because the user has to register and log in first, which might already disclose her identity.

Furthermore, most of the known practical anonymous protocols, such as Crowds [24], Onion routing [9] and anonymizer.com, use one or several trusted third parties. In our secure multi-party computation protocols, we do not use a trusted third party; when a third party is used, we generally assume that the third party is not trusted, and should learn nothing about either Alice's query, or Bob's data, or the response to the query.

Therefore anonymity does not totally solve our problems, and cannot replace secure multi-party computation. Rather, by combining anonymity techniques with secure multi-party computation techniques, one can achieve better overall privacy more efficiently.
2. RELATED WORK
As Goldwasser points out in [10], in the 1980's the focus of research was to show the most general result possible, yielding multi-party protocol solutions for any probabilistic function. Much of the current work focuses on efficient and non-interactive solutions to special important problems such as joint signatures, joint decryption, and secure and private database access.
Among various multi-party computation problems, the Private Information Retrieval (PIR) problem has been widely studied; it is also the problem most related to what we present in this paper (although here we use none of the elegant techniques for PIR that are found in the literature, for reasons we explained earlier in this paper). The PIR problem consists of devising a protocol involving a user and a database server, each having a secret input. The database's secret input is called the data string, an N-bit string $B = b_1 b_2 \ldots b_N$. The user's secret input is an integer i between 1 and N. The protocol should enable the user to learn $b_i$ in a communication-efficient way and at the same time hide i from the database. The trivial solution is having the database send an encryption of the entire string B to the user, with an $O(nN)$ communication complexity. Much work has been done for reducing this communication
Chor et al. point out that a major drawback of all known PIR schemes is the assumption that the user knows the physical address of the sought item [7], whereas in the current database query scenario the user typically holds a keyword and the database internally converts this keyword into a physical address. To solve this problem, Chor et al. propose a scheme to privately access data by keywords [7]. The difference between the problem studied in Chor's paper and the problems in our paper is that we extend the problem to cover approximate pattern matching.

Song et al. propose a scheme to conduct searches on encrypted data [27]. In that framework, Alice has a database, and she has to store the database in a server controlled by Bob; how could Alice query her database without letting Bob know the contents of the database or the query? Here we primarily focus on extending the problem to also cover approximate pattern matching.
3. FRAMEWORK
3.1. MODELS
Remote database access has many variants. In some e-commerce models, Bob's database is private while in some other models, it is public. In the latter case, there is no requirement to keep the database secret from Alice; however, the privacy of Alice's query still needs to be preserved. In other e-commerce models, Bob hosts Alice's (encrypted/disguised) database while supporting queries from Alice and other customers, in which case Bob should know neither the database nor the queries.
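[Figure 1. The four models. (a) PIM Model: Alice sends a query to Bob, who holds his own private database, and receives a reply. (b) PIMPD Model: Alice sends a query to Bob, who holds a public database, and receives a reply. (c) SSO Model: Bob holds Alice's private database; Alice sends a query and receives a reply. (d) SSCO Model: Alice outsources her private database to Bob; Carl sends a query, receives a reply, and pays for the service.]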
From the various ways that remote database access is conducted, we distinguish four different e-commerce models, all of which require customers' privacy:

- PIM: Private Information Matching Model (Figure 1.a).
- PIMPD: Private Information Matching from Public Database Model (Figure 1.b).
- SSO: Secure Storage Outsourcing Model (Figure 1.c).
- SSCO: Secure Storage and Computing Outsourcing Model (Figure 1.d).

For the sake of convenience, we will use Match() to represent the pattern matching function, which includes both exact pattern matching and approximate pattern matching.
Private Information Matching Problem (PIM) Alice has a string x, and Bob has a database of strings $T = \{t_1, \ldots, t_N\}$; Alice wants to know the result of Match(x, T). Because of the privacy concern, Alice does not want Bob to know the query x or the response to the query; Bob does not want Alice to know any string in the database except for what can be derived from the reply. Furthermore, Bob wants to make money from providing such a service, therefore Alice should not be able to conduct the querying by herself; in other words, every time Alice wants to perform such a query, she has to contact Bob, otherwise she cannot get the correct answer.
Private Information Matching from Public Database Problem (PIMPD) Bob has a database of strings $T = \{t_1, \ldots, t_N\}$, whose contents are public knowledge. Alice has a query x, and she wants to know the result of Match(x, T) without disclosing to Bob either her query x or the response to it.

This problem is different from the PIM problem: in the PIM problem, Bob does not allow Alice to know any information about the database except for what can be derived from the reply. In the PIMPD problem, since the database contains only public knowledge, there is no need to prevent Bob from letting Alice know more about the contents of the database than the strict answer to her query (although Bob's doing so may result in unnecessary communication).
Secure Storage Outsourcing Problem (SSO) Alice has a database
the large database, so she outsources her database (suitably disguised; more on this later) to Bob, who provides enough storage for Alice. Furthermore, from time to time, Alice needs to query her database and retrieve the information that matches her query, i.e., Alice wants to know Match(x, T) for her query x. As usual, Alice wants to keep the contents of both the database and the query secret from Bob.
Secure Storage and Computing Outsourcing Problem (SSCO) The SSCO problem is an extension of the SSO problem. Whereas only Alice queries her database in the SSO problem, in the SSCO model the database will also be queried by other clients of Alice. More specifically, in the SSCO model, Alice outsources her database to Bob, and she wants the database to be available to anyone who is willing to pay her for the database access service. When a client accesses the database, neither Alice nor Bob should know the contents of the query. Moreover, Alice wants to charge the clients for each query they have submitted, so the client should not be able to get the correct query result if Alice is not aware of the query's existence.

Since Bob can pretend to be a client, the solutions of the SSCO problem should be secure even if Bob can collude against Alice with any client. However, the SSO problem does not have such a concern because the only client is Alice herself.
3.2. NOTATION
For each model, there is a family of problems. We will use the following notation to represent each specific problem:

- M/Exact: Exact Pattern Matching problem in model M.
- M/Approx: Approximate Pattern Matching problem in model M.
  - M/Approx/f: use the $\sum_{k=1}^{n} f(a_k, b_k)$ metric to measure the distance between two strings, where f is a general function.
  - M/Approx/δ: use the $\sum_{k=1}^{n} \delta(a_k, b_k)$ metric to measure the distance between two strings, where δ is the Kronecker symbol: $\delta(x, y) = 0$ if and only if $x = y$, and 1 otherwise.
  - M/Approx/Abs: use the $\sum_{k=1}^{n} |a_k - b_k|$ metric to measure the distance between two strings.
  - M/Approx/Squ: use the $\sum_{k=1}^{n} (a_k - b_k)^2$ metric to measure the distance between two strings.
  - M/Approx/Edit/String: use the string editing criterion [5] to measure the distance between two strings.
  - M/Approx/Edit/Tree: use the tree editing criterion to measure the distance between two trees.

The M/Exact problem has been studied extensively in certain models, such as PIM and SSO, but the M/Approx problem has not. Our results deal mostly with the M/Approx problem.
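As an illustration of the notation above, the following minimal Python sketch evaluates each metric on plain (non-private) inputs. The function names are hypothetical, and the unit-cost edit distance is an assumption made for this sketch: the string editing criterion of [5] may use different operation costs.

```python
from typing import Callable, Sequence


def sum_metric(a: Sequence, b: Sequence, f: Callable) -> float:
    """M/Approx/f: sum of f(a_k, b_k) over two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(f(ak, bk) for ak, bk in zip(a, b))


def delta(x, y) -> int:
    """0 if and only if x == y, 1 otherwise (the convention of this section)."""
    return 0 if x == y else 1


def dist_delta(a, b):
    """M/Approx/delta: number of positions where the strings differ."""
    return sum_metric(a, b, delta)


def dist_abs(a, b):
    """M/Approx/Abs: sum of absolute differences."""
    return sum_metric(a, b, lambda x, y: abs(x - y))


def dist_squ(a, b):
    """M/Approx/Squ: sum of squared differences."""
    return sum_metric(a, b, lambda x, y: (x - y) ** 2)


def edit_distance(a, b) -> int:
    """Unit-cost edit distance (insert, delete, substitute); an assumed cost model."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,            # delete ca
                           cur[j - 1] + 1,         # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]
```

Note that the edit distance is computed by dynamic programming over pairs of prefixes, so it is not a sum of per-position terms, unlike the first three metrics.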
4. OUR RESULTS
4.1. PRELIMINARY
Protocol for scalar (and other) products Recall that the scalar product of two vectors $\vec{x} = (x_1, \ldots, x_n)$ and $\vec{y} = (y_1, \ldots, y_n)$ is $\vec{x} \cdot \vec{y} = \sum_{k=1}^{n} x_k y_k$.

We describe a protocol for Alice and Bob to compute the scalar product of Alice's vector $\vec{x}$ and Bob's vector $\vec{y}$ using an untrusted third party Ursula. Neither Alice nor Bob should learn anything about the other party's input (other than what can be derived from knowing $\vec{x} \cdot \vec{y}$), and Ursula should learn nothing about either $\vec{x}$ or $\vec{y}$. This protocol will later serve as a building block for other protocols. Essentially the same protocol also solves the asymmetric version of the problem, in which only Alice is to know $\vec{x} \cdot \vec{y}$.

1 Alice and Bob jointly generate two random numbers $r$ and $r'$.
2 Alice and Bob jointly generate two random vectors $\vec{R}$, $\vec{R}'$ (of size n).
3 Alice sends $\vec{w}_1 = \vec{x} + \vec{R}$ and $s_1 = \vec{x} \cdot \vec{R}' + r$ to Ursula.
4 Bob sends $\vec{w}_2 = \vec{y} + \vec{R}'$ and $s_2 = \vec{R} \cdot (\vec{y} + \vec{R}') + r'$ to Ursula.
5 Ursula computes $v = \vec{w}_1 \cdot \vec{w}_2 - s_1 - s_2$ and gets $v = \vec{x} \cdot \vec{y} - (r + r')$; she then sends the result to Alice and Bob. Note. In the asymmetric version of the problem (in which only Alice is to know $\vec{x} \cdot \vec{y}$) Ursula does not send anything to Bob in this step.
6 Alice and (in the symmetric version of the problem) Bob then get $\vec{x} \cdot \vec{y} = v + (r + r')$.

Note that the communication complexity of the above is linear in the size of the inputs, i.e., it is $O(n)$.

Although only the scalar product case is needed later in this paper,
car- include matrix product, in which case Alice and Bob have matrices and $R, R', r, r'$ are random matrices. They also include the convolution product of two vectors, in which case $R, R', r, r'$ are random vectors. This could potentially be useful in other contexts.
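The message flow of the scalar product protocol can be sketched as a single-process simulation. This is an illustration under stated assumptions, not a secure implementation: the three parties are ordinary local variables, the function name is hypothetical, and the masks are drawn as bounded integers (the `bound` parameter), a choice the paper does not specify.

```python
import random


def dot(u, v):
    """Plain scalar product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def scalar_product_protocol(x, y, rng=None, bound=10**6):
    """Simulate steps 1-6; returns (x . y as recovered in step 6, Ursula's v, r + r')."""
    rng = rng or random.Random()
    n = len(x)
    # Steps 1-2: randomness shared by Alice and Bob (assumed bounded integers).
    r = rng.randint(-bound, bound)
    r_prime = rng.randint(-bound, bound)
    R = [rng.randint(-bound, bound) for _ in range(n)]
    R_prime = [rng.randint(-bound, bound) for _ in range(n)]
    # Step 3: Alice -> Ursula.
    w1 = [xk + Rk for xk, Rk in zip(x, R)]
    s1 = dot(x, R_prime) + r
    # Step 4: Bob -> Ursula.
    w2 = [yk + Rk for yk, Rk in zip(y, R_prime)]
    s2 = dot(R, w2) + r_prime
    # Step 5: Ursula sees only w1, w2, s1, s2 and computes the masked product.
    v = dot(w1, w2) - s1 - s2
    # Step 6: Alice (and Bob, in the symmetric version) removes the mask.
    return v + (r + r_prime), v, r + r_prime
```

With integer inputs the arithmetic is exact, so the recovered value equals the true scalar product, while the value Ursula holds differs from it by the shared mask `r + r'`.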
4.2. PIM/APPROX
Except for the research on the general secure multi-party computation problem, this specific problem has not been studied in the literature. Unless otherwise specified, we assume the alphabet used in the following solution to be predefined and its size to be finite. This assumption is quite reasonable in many situations; for instance, DNA sequences use a fixed alphabet of four symbols. Under this assumption, we can solve the PIM/Approx/f problem. However, because the way to calculate edit distance cannot be represented in the form $\sum_{k=1}^{n} f(a_k, b_k)$, the PIM/Approx/Edit problem is not a special case of the PIM/Approx/f problem. We also have a solution for the PIM/Approx/Edit/String problem, but because of its complexity and space limitation, we will leave the solution to the journal version of this paper.

In some other situations, the above finite alphabet assumption does not apply. For instance, fingerprint, image and voice patterns use real numbers instead of characters from a known finite alphabet. The above-mentioned solution for the PIM/Approx/f problem cannot be used anymore; however, by exploiting the mathematical property of $\sum_{i=1}^{n} (a_i - b_i)^2$, we have come up with a solution for the PIM/Approx/Squ problem for an infinite alphabet after introducing an untrusted third party who does not know the inputs from either of the two parties and learns nothing about them (or about the query, or the answer to it). We also have a solution to the PIM/Approx/Abs problem using a Monte Carlo technique. All of these are given below.
4.2.1 PIM/Approx/Squ Protocol. Suppose that Bob has a database $T = \{t_1, \ldots, t_N\}$, and assume the length of each string $t_i$ is n; Alice wants to know the $t_i \in T$ that most closely matches a query $x = x_1 \ldots x_n$ based on the PIM/Approx/Squ metric. The requirement is that Bob should not know x or the result, and Alice should not be able to learn more information than the reply from Bob.

We propose a protocol to compute the matching score using an untrusted third party, Ursula. Our assumption here is that Ursula will not conspire with either Alice or Bob. However, the third party is not fully trusted: Ursula should not be able to deduce either x or T, or the
Let $\vec{x} = (-2x_1, \ldots, -2x_n, 1)$; for each $t_i = y_{i,1} \ldots y_{i,n}$, let $\vec{z}_i = (y_{i,1}, \ldots, y_{i,n}, \sum_{k=1}^{n} y_{i,k}^2)$. Observe that:

$\sum_{k=1}^{n} (x_k - y_{i,k})^2 = \vec{x} \cdot \vec{z}_i + \sum_{k=1}^{n} x_k^2.$

Since $\sum_{k=1}^{n} x_k^2$ is a constant, we can use $\vec{x} \cdot \vec{z}_i$ instead of $\sum_{k=1}^{n} (x_k - y_{i,k})^2$ to find the closest match. After we get the closest match, Alice can calculate the actual score by adding $\sum_{k=1}^{n} x_k^2$.

Protocol
1 Alice and Bob jointly generate two random numbers $r$ and $r'$.
2 For each $t_i \in T$, repeat the next five sub-steps, in which $t_i = y_{i,1} \ldots y_{i,n}$ and $\vec{x} = (-2x_1, \ldots, -2x_n, 1)$.
(a) Bob constructs $\vec{z}_i = (y_{i,1}, \ldots, y_{i,n}, \sum_{k=1}^{n} y_{i,k}^2)$.
(b) Alice and Bob jointly generate two random vectors $\vec{R}$, $\vec{R}'$ (of size n + 1).
(c) Alice sends $\vec{w}_1 = \vec{x} + \vec{R}$ and $s_1 = \vec{x} \cdot \vec{R}' + r$ to Ursula.
(d) Bob sends $\vec{w}_2 = \vec{z}_i + \vec{R}'$ and $s_2 = \vec{R} \cdot (\vec{z}_i + \vec{R}') + r'$ to Ursula.
(e) Ursula computes $v_i = \vec{w}_1 \cdot \vec{w}_2 - s_1 - s_2$ and gets the resulting $v_i = \vec{x} \cdot \vec{z}_i - (r + r')$.
3 Ursula computes $score' = \min_{i=1}^{N} v_i$, and sends the resulting $score'$ to Alice.
4 Alice computes $score = score' + \sum_{k=1}^{n} x_k^2 + (r + r')$, which is the score of the closest match between x and any $t_i \in T$.

The random vectors $\vec{R}$ and $\vec{R}'$ are used to disguise Alice's and Bob's data; the random numbers $r$ and $r'$ are used to disguise the query results and the intermediate results. The communication cost is $O(nN)$.
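The augmented-vector identity and the four protocol steps can be checked with a small single-process simulation. It is only a sketch: the function name is hypothetical, all three parties run in one process, and bounded integer masks are an assumption of the sketch rather than something the paper prescribes.

```python
import random


def dot(u, v):
    """Plain scalar product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def closest_match_squ(x, T, rng=None, bound=10**6):
    """Simulate the PIM/Approx/Squ protocol; returns the smallest squared distance."""
    rng = rng or random.Random()
    n = len(x)
    # Step 1: r and r' are shared by Alice and Bob and reused for every t_i.
    r = rng.randint(-bound, bound)
    r_prime = rng.randint(-bound, bound)
    x_vec = [-2 * xk for xk in x] + [1]          # Alice's augmented vector
    masked_scores = []
    for t in T:                                   # Step 2
        z = list(t) + [sum(y * y for y in t)]     # (a) Bob's augmented vector
        R = [rng.randint(-bound, bound) for _ in range(n + 1)]        # (b)
        R_prime = [rng.randint(-bound, bound) for _ in range(n + 1)]
        w1 = [a + b for a, b in zip(x_vec, R)]    # (c) Alice -> Ursula
        s1 = dot(x_vec, R_prime) + r
        w2 = [a + b for a, b in zip(z, R_prime)]  # (d) Bob -> Ursula
        s2 = dot(R, w2) + r_prime
        masked_scores.append(dot(w1, w2) - s1 - s2)  # (e) v_i = x.z_i - (r + r')
    score_prime = min(masked_scores)              # Step 3: Ursula -> Alice
    # Step 4: Alice removes the mask and adds the constant sum of x_k^2.
    return score_prime + sum(xk * xk for xk in x) + (r + r_prime)
```

Because the same mask `r + r'` is subtracted from every `v_i`, the ordering of the candidates is preserved, which is what allows Ursula to take the minimum without learning any actual score.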
4.2.2 PIM/Approx/Abs Protocol. First, we will present a Monte Carlo technique for Alice and Bob to calculate $|x_k - y_k|$ ($x_k$ is Alice's secret input and $y_k$ is Bob's), and then use it as a building block to compute $\sum_{k=1}^{n} |x_k - y_k|$. The protocol involves an untrusted third party, Ursula, who learns nothing. The protocol works for both finite and infinite alphabets. Assume that $0 < x_k \le U$ and $0 < y_k \le U$ for
a parameter that affects the accuracy of the estimate, and counter = 0 initially):

1 Alice generates a random number $R_k$, and then generates a sequence of $W - R_k$ random i.i.d. numbers, each uniformly over $(0..U]$.
2 Alice randomly replaces half of these $W - R_k$ numbers with their negative values.
3 Alice "splices" $R_k$ zeroes into random positions of the above sequence of $W - R_k$ numbers, resulting in a new sequence S of W numbers.
4 Alice then sends S to Bob.
5 For each number s from S, if $s = 0$, Alice sends 1 to Ursula; if $s > 0$ then Alice sends 1 to Ursula if $|s| \le x_k$ and sends 0 otherwise; if $s < 0$ then Alice sends 0 to Ursula if $|s| \le x_k$ and sends 1 otherwise.
6 For each number s from S, if $s = 0$, Bob sends 0 to Ursula; if $s > 0$ then Bob sends 1 to Ursula if $|s| \le y_k$ and sends 0 otherwise; if $s < 0$ then Bob sends 0 to Ursula if $|s| \le y_k$ and sends 1 otherwise.
7 Ursula increases counter by 1 if the values she receives from Alice and Bob are different.
8 Ursula computes $score = counter \cdot \frac{U}{W}$, which is an unbiased estimate of $|x_k - y_k| + R_k \cdot \frac{U}{W}$.

Because of $R_k$, Ursula does not know the actual distance between $x_k$ and $y_k$, and because of the negative numbers among those W random numbers, Ursula cannot figure out whether $x_k > y_k$ or $x_k < y_k$.
Now, let us see how to use the above protocol to compute $\sum_{k=1}^{n} |x_k - y_{i,k}|$, where $x = x_1 \ldots x_n$ and $t_i = y_{i,1} \ldots y_{i,n}$:

1 Alice generates a random number $R$.
2 For each $t_i \in T$, suppose $t_i = y_{i,1} \ldots y_{i,n}$ and repeat the next three sub-steps:
(a) counter = 0.
(b) For each $k = 1, \ldots, n$, Alice, Bob and Ursula use the above protocol to compute $|x_k - y_{i,k}|$. The random numbers $R_{i,1}, \ldots, R_{i,n}$ used in the above protocol are generated by Alice, such that $\sum_{k=1}^{n} R_{i,k} = R$.
(c) Ursula computes $score_i = counter \cdot \frac{U}{W}$, which is an unbiased estimate of $\sum_{k=1}^{n} |x_k - y_{i,k}| + \sum_{k=1}^{n} R_{i,k} \cdot \frac{U}{W} = \sum_{k=1}^{n} |x_k - y_{i,k}| + R \cdot \frac{U}{W}$.
3 Ursula computes $score' = \min_{i=1}^{N} score_i$, and sends $score'$ to Alice.
4 Alice computes $score = score' - R \cdot \frac{U}{W}$ and gets the score of the closest match between x and any $t_i \in T$.

The communication complexity is $O(nWN)$. The analysis will be given in the full version of this paper.
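A condensed simulation of the full PIM/Approx/Abs protocol is sketched below. It is an illustration under several assumptions: the names are hypothetical, $R$ is an integer no larger than W, the per-position values $R_{i,k}$ are obtained by a random split of $R$ into non-negative integers, and the per-element sub-protocol is collapsed into a disagreement count (a spliced zero always yields different bits, and negating a sample flips both parties' bits, so only whether the sample falls between $x_k$ and $y_{i,k}$ matters).

```python
import random


def split_total(R, n, rng):
    """Split integer R into n non-negative integers that sum to R."""
    cuts = sorted(rng.randint(0, R) for _ in range(n - 1))
    bounds = [0] + cuts + [R]
    return [bounds[j + 1] - bounds[j] for j in range(n)]


def count_disagreements(x_k, y_k, U, W, R_k, rng):
    """Number of positions where Alice's and Bob's bits differ (condensed model)."""
    count = R_k                                  # each zero gives bits 1 vs 0
    for _ in range(W - R_k):
        s = U * (1.0 - rng.random())             # |s| uniform over (0, U]
        count += (s <= x_k) != (s <= y_k)        # differs iff s lies between them
    return count


def closest_match_abs(x, T, U, W, R, rng=None):
    """Simulate steps 1-4; returns Alice's estimate of the smallest distance."""
    rng = rng or random.Random()
    scores = []
    for t in T:                                  # Step 2
        R_parts = split_total(R, len(x), rng)    # (b) R_{i,1..n} summing to R
        counter = sum(count_disagreements(xk, yk, U, W, Rk, rng)
                      for xk, yk, Rk in zip(x, t, R_parts))
        scores.append(counter * U / W)           # (c) Ursula's score_i
    score_prime = min(scores)                    # Step 3
    return score_prime - R * U / W               # Step 4
```

Because the same offset $R \cdot U / W$ is added to every candidate's score, Ursula can take the minimum without knowing $R$, and only Alice can remove the offset.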
4.2.3 PIM/Approx/f protocol. If the alphabet is predefined and its size is finite, we can solve a general problem: computing $f(x_k, y_k)$. However, we cannot directly use this protocol n times to compute $\sum_{k=1}^{n} f(x_k, y_k)$ because that would reveal each individual $f(x_k, y_k)$ result. We will present the protocol for computing $f(x_k, y_k)$ here, and then in the following sub-section, we will discuss how to use it as a building block to compute $\sum_{k=1}^{n} f(x_k, y_k)$ without revealing any individual $f(x_k, y_k)$.

Suppose Alice has an input $x_k$; Bob has an input $y_k$; Alice wants to know the result of $f(x_k, y_k)$ without revealing $x_k$ and the result to Bob, and Bob does not want to reveal his $y_k$ to Alice. After presenting a solution to this problem, we later use it as a building block to construct solutions to other problems.

f-function Protocol We assume the encryption methods used below are commutative.

1 Bob computes $f(\alpha_i, y_k)$ for each $\alpha_i \in X$, where X is the finite (known) alphabet. Let m be the size of X.
2 Bob chooses a secret key $k$, computes $E_k(f(\alpha_i, y_k))$ for each $\alpha_i \in X$, and sends to Alice the m results.
3 Alice chooses one from $E_k(f(\alpha_i, y_k))$, $i = 1 \ldots m$, such that $\alpha_i = x_k$. This can be done because Bob sent the m encrypted results in order.
4 Alice chooses a secret key $k'$, computes $E_{k'}(E_k(f(x_k, y_k)))$, and sends it back to Bob.
5 Because of the commutative properties of $E_{k'}$ and $E_k$, $E_{k'}(E_k(f(x_k, y_k)))$ is equivalent to $E_k(E_{k'}(f(x_k, y_k)))$, which can be decrypted to $E_{k'}(f(x_k, y_k))$ by Bob. Bob sends the result $E_{k'}(f(x_k, y_k))$ to Alice.
6 Alice gets $f(x_k, y_k)$ by decrypting $E_{k'}(f(x_k, y_k))$.

The technique used above is similar to the standard oblivious transfer protocol; it protects the privacy of the inputs from both parties without introducing a third party. The communication cost is $O(m)$, where m is the size of the alphabet.
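The six steps can be traced with a toy commutative cipher. The paper does not name a cipher; the sketch below assumes exponentiation modulo a fixed prime ($E_k(m) = m^k \bmod p$, with $k$ coprime to $p - 1$), which commutes because $(m^k)^{k'} = (m^{k'})^k$. The names, the prime, and the `offset` encoding are assumptions of this sketch. It illustrates the message flow only: this deterministic cipher maps equal values of $f$ to equal ciphertexts, so it should not be taken as a secure instantiation.

```python
import math
import random

P = 2**127 - 1  # a Mersenne prime, used here as a toy modulus (assumption)


def keygen(rng):
    """Pick an exponent coprime to P - 1 so that it has an inverse."""
    while True:
        k = rng.randrange(3, P - 1)
        if math.gcd(k, P - 1) == 1:
            return k


def enc(k, m):
    """Toy commutative encryption: m^k mod P."""
    return pow(m, k, P)


def dec(k, c):
    """Invert enc by raising to k^{-1} mod (P - 1)."""
    return pow(c, pow(k, -1, P - 1), P)


def f_function_protocol(x_k, y_k, alphabet, f, rng=None, offset=2):
    """Simulate steps 1-6; f must return non-negative integers below P - offset."""
    rng = rng or random.Random()
    # Steps 1-2: Bob encrypts f(alpha_i, y_k) for every symbol, in alphabet order.
    # The offset keeps plaintexts away from 0 and 1, which this cipher leaves fixed.
    k_bob = keygen(rng)
    table = [enc(k_bob, f(a, y_k) + offset) for a in alphabet]
    # Step 3: Alice picks the entry at the position of her own symbol.
    chosen = table[alphabet.index(x_k)]
    # Step 4: Alice adds her own layer and returns it to Bob.
    k_alice = keygen(rng)
    double = enc(k_alice, chosen)
    # Step 5: by commutativity, Bob can strip his layer from the doubly encrypted value.
    single = dec(k_bob, double)
    # Step 6: Alice removes her layer and the offset.
    return dec(k_alice, single) - offset
```

For example, with the alphabet `"ACGT"` and the δ function of Section 3.2, Alice holding `"G"` and Bob holding `"G"` yields 0.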
PIM/Approx/f Protocol Now, letusseehowto securelycompute
min N i=1 ( P n k=1 f(x k ;y i;k
)). As we discussed above, we cannot run the
above f-function protocol n times to get P n k=1 f(x k ;y i;k ). In the
fol-lowingprotocol, we willuseadisguisetechniqueto hideeachindividual
resultof f(x k ;y i;k ). For each t i = y i;1 :::y i;n
, and for each k = 1;:::;n, let f i;k (x k ;y i;k ) = f(x k ;y i;k )+R i;k ,whereR i;k
isarandomnumber,thefollowingprotocol
shows how Aand Bcalculatemin
N i=1 P n k=1 f(x k ;y i;k ),
1 Bobgenerates arandom numberR thensendsR to Alice.
2 Foreach t i =y i;1 ;:::;y i;n
,repeat thenext vesub-steps:
(a) Bobconstructsf i;k (x k ;y i;k )=f(x k ;y i;k )+R i;k fork =1;:::;n, whereR i;1 ;:::;R i;n
arenrandomnumbers.
(b) AliceandBobusethef-functionprotocoltocomputef i;k (x k ; y i;k ),foreach k =1;:::;n. (c) Alicesends P n k=1 f i;k (x k ;y i;k ) to Ursula. (d) Bobsends P n k=1 R i;k R to Ursula.
(e) Ursula computes score
i = P n k=1 f i;k (x k ;y i;k ) ( P n k=1 R i;k R )= P n k=1 f(x k ;y i;k )+R .
3 Ursulacomputesscore
0 =min N i=1 score i
,andsendsscore 0
toAlice.
4 Alicecomputescore=score
0
R ,thusgettingtheactualdistance
betweenxand the closest t i
inthedatabase T.
Although Alice knows each individual $f_{i,k}(x_k, y_{i,k})$, she does not know
the actual value of $f(x_k, y_{i,k})$ because of $R_{i,k}$. Similarly, because of
$R$, Ursula does not know the actual score of the closest match. The
communication cost of the protocol is $O(mnN)$, where $m$ is the size
of the alphabet, $n$ is the length of each pattern, and $N$ is the size of the
database.
Because the $|x_k - y_k|$, $(x_k - y_k)^2$ and $\delta(x_k, y_k)$ functions are special cases of
$f(x_k, y_k)$, the PIM/Approx/(Abs, Squ, $\delta$) problems can all be solved using
the above protocol.
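The message flow of the PIM/Approx/f protocol can be traced with a short Python simulation. This is only a sketch under stated assumptions: the f-function protocol of sub-step (b) is replaced by a direct computation of the masked value, all three parties run in one process, and the mask range `big`, the function name `pim_approx_f`, and the sample data are hypothetical.

```python
import random


def pim_approx_f(x, database, f, rng):
    """Simulate the protocol; returns (Alice's final score, scores Ursula saw)."""
    big = 10**6                                   # illustrative mask range
    r = rng.randrange(big)                        # step 1: Bob's R, sent to Alice
    ursula_scores = []
    for t in database:                            # step 2, once per t_i
        masks = [rng.randrange(big) for _ in t]   # (a) R_{i,1}, ..., R_{i,n}
        # (b) stand-in for the f-function protocol: Alice sees masked values only
        alice_vals = [f(xk, yk) + m for xk, yk, m in zip(x, t, masks)]
        alice_msg = sum(alice_vals)               # (c) Alice -> Ursula
        bob_msg = sum(masks) - r                  # (d) Bob -> Ursula
        ursula_scores.append(alice_msg - bob_msg) # (e) equals sum of f plus R
    score_prime = min(ursula_scores)              # step 3: Ursula -> Alice
    return score_prime - r, ursula_scores         # step 4: Alice removes R


def delta(a, b):
    """The delta distance: 0 for equal symbols, 1 otherwise."""
    return 0 if a == b else 1


x = "abcd"
db = ["axcf", "zzzz", "abcf"]
score, seen_by_ursula = pim_approx_f(x, db, delta, random.Random(1))
```

Every value Ursula sees is offset by the same $R$, so she can still order the candidates and pick the minimum without learning any actual distance.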
4.3. PIMPD/APPROX
The only difference between the PIM model and the PIMPD model is
that, in the latter, Bob does not need to keep the database secret from
Alice. Therefore, all solutions in the PIM model can be applied to the
PIMPD model as well. Whether the "public" feature of the database
can result in more efficient solutions is an interesting question. Although
we do not yet have an answer to it, we observed the following:
Observation 4.1. There is no secure two-party non-interactive
solution for the PIMPD/Approx problem.
Proof. A two-party non-interactive protocol means Bob, by himself, is
able to find the item in the database that has minimal distance from the
query.
Assume there is a two-party non-interactive protocol $A$ which solves
any of the PIMPD/Approx problems; in other words, given an
encrypted/disguised form $q'$ of a query $q$, and the database $T$ that Bob
knows, Bob can find the item in the database that has minimal distance
from $q$. We use $A(T, q')$ to represent the algorithm on input
$T$ and $q'$.
Since Bob can use any database he wants, he can use a database like
this: $T' = \{$"axxxxxx", "bxxxxxx", ..., "zxxxxxx"$\}$, supposing that the
alphabet is the set from 'a' to 'z'. After applying $A(T', q')$, Bob will get the one
that has the minimal distance from $q$. For instance, if "mxxxxxx" is the
result, Bob knows that 'm' is the first character in $q$. Since $A$ is a
non-interactive protocol, Bob can reuse it on another database constructed
for the purpose of exposing the second character in $q$; he can keep doing
this and figure out the rest of the characters in $q$.
Therefore, if such a protocol existed, the query $q$ would not be kept
secret from Bob.
The above observation does not rule out the existence of an efficient
interactive protocol or a multi-party protocol.
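The probing argument in the proof can be made concrete with a small Python sketch. The oracle below is a hypothetical stand-in for the assumed protocol $A$: it closes over the secret query only to model the fact that Bob could evaluate $A(T, q')$ on any database he likes; the names `make_oracle` and `recover_query`, the Hamming distance, and the sample query are all illustrative assumptions.

```python
import string


def make_oracle(secret_query):
    """Stand-in for A(T, q'): returns the database item closest to the query.

    Bob never reads secret_query directly; he only calls the oracle."""
    def oracle(database):
        def hamming(t):
            return sum(a != b for a, b in zip(secret_query, t))
        return min(database, key=hamming)
    return oracle


def recover_query(oracle, length, alphabet=string.ascii_lowercase, filler="x"):
    """Bob's attack: one crafted database per position exposes one character."""
    recovered = []
    for pos in range(length):
        # Items differ only at `pos`, so the closest one matches the query there.
        probe_db = [filler * pos + c + filler * (length - pos - 1)
                    for c in alphabet]
        best = oracle(probe_db)
        recovered.append(best[pos])
    return "".join(recovered)


oracle = make_oracle("matcher")
leaked = recover_query(oracle, 7)
```

With an alphabet of size $m$ and a query of length $n$, Bob needs only $n$ probe databases of $m$ items each to recover the whole query.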
4.4. SSO/APPROX
In this model, Bob is a service provider who provides storage and
[...] Alice, nor should he know the query. So Bob has to conduct a disguised
database query based on the encrypted or disguised data of Alice.
The requirement that Bob should not know the query result, as in
the PIM and PIMPD problems, is no longer needed in the SSO problem.
The reason is that Bob does not know the contents of the database; he
does not even know what the database is for, so knowing whether
Alice's query is in the database does not disclose any secret information
to Bob.
Intuitively, the SSO/Approx problem might appear
more difficult than the PIM/Approx problem because in the latter Bob
at least knows the contents of the database whereas in the former he
knows nothing about the database. But knowing the contents of the
database has a disadvantage, in that Bob cannot know an intermediate
result because he knows one of the inputs (the database); if he also knew
an intermediate result, he might be able to figure out the other input
(the query) of the computation. However, in the SSO/Approx problem,
Bob knows nothing about the database, so it is safe for him to know
intermediate results without exposing the secret query.
Whether Bob can know intermediate results is a critical issue for
reducing the communication complexity. If he knew intermediate results
to some extent, he could conduct the comparison operation to find the
minimal or maximal score; otherwise, he has to turn to Alice in order to
find the minimal or maximal score, which results in high communication
cost in the PIM problem.
The SSO/Approx problem is similar to the secure outsourcing of
scientific computations problems studied by Atallah et al. [3]. The
difference is that in secure outsourcing problems, the inputs are provided by
Alice every time a computation is conducted at Bob's side; therefore,
Alice can encrypt/disguise the inputs differently in different rounds of
the computation. However, in the SSO problem, one of the inputs (the
database) is encrypted/disguised only once, and this same input is used
in all rounds of computations; this makes the problem more difficult.
So far, we have a solution only for the SSO/Approx/Squ problem. The
solution works for both infinite and finite alphabets.
4.4.1 SSO/Approx/Squ Protocol. Suppose that Alice wants
to outsource her database $T = \{t_1, \ldots, t_N\}$ to Bob, and wants to know if
query string $x = x_1 \ldots x_n$ matches any pattern $t_i$ in the database $T$.
The straightforward solution would be to let Bob send the whole
database back to Alice, and let Alice conduct the query by herself.
[...] question would be whether Bob can conduct the matching independently
after Alice sends him the relevant information about the query. If the
answer is yes, Bob should be able to find the item $t_i$ that is the
closest match to the query $x$. In other words, if
$t_i = y_1 \ldots y_n$ and $\mathrm{score}_i = \sum_{k=1}^{n} (x_k - y_k)^2$,
then Bob should be able to find the minimum
value of $\mathrm{score}_i$. However, because of the privacy requirement, Bob is
not allowed to know the actual query $x$, nor is he allowed to know the
content of the database, so how does he compute the distance $\mathrm{score}_i$
between $x$ and each element $t_i$ in the database?
The idea behind our solution is based on the fact that
$\vec{x}\,\vec{z}^{T} = (\vec{x} Q^{-1})(Q \vec{z}^{T})$, where $Q$ is an invertible matrix.
Alice can store $Q \vec{z}^{T}$ instead of $\vec{z}^{T}$ at Bob's site, and keep $Q$ secret
from Bob. She will send $\vec{x} Q^{-1}$ to Bob each time she wants to send a
query $x$; therefore Bob can compute $\vec{x}\,\vec{z}^{T}$ without even knowing
$\vec{x}$ and $\vec{z}$. If we can use $\vec{x}\,\vec{z}^{T}$ to represent
$\sum_{k=1}^{n} (x_k - y_k)^2$, we can make it possible for Bob to conduct the
approximate pattern matching.
For each $t_i = y_{i,1} \ldots y_{i,n}$ in the database $T$, let
$\vec{t}_i = \left( \sum_{k=1}^{n} y_{i,k}^2 + R - R_i,\ y_{i,1}, \ldots, y_{i,n},\ 1,\ R_i \right)$,
and let $\vec{x} = (1, -2x_1, \ldots, -2x_n, R_A, 1)$, where $R$, $R_A$ and $R_i$
are random numbers. We will have
$\vec{x}\,\vec{t}_i^{T} = \sum_{k=1}^{n} y_{i,k}^2 - 2 \sum_{k=1}^{n} x_k y_{i,k} + R + R_A$,
and therefore
$\mathrm{score}_i = \sum_{k=1}^{n} (x_k - y_{i,k})^2 = \vec{x}\,\vec{t}_i^{T} + \left( \sum_{k=1}^{n} x_k^2 - R - R_A \right)$.
Since $\left( \sum_{k=1}^{n} x_k^2 - R - R_A \right)$ is a constant,
it does not affect the final result if we only want to find the $t_i$ that
produces the minimum $\mathrm{score}_i$. Therefore, Bob can use $\vec{x}\,\vec{t}_i^{T}$ to compute
the closest match.
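For readers who want the intermediate step, the inner product can be expanded component by component. The derivation below is a worked restatement of the identity above, using the vectors $\vec{t}_i$ and $\vec{x}$ exactly as defined in the text; it introduces no new assumptions.

```latex
\begin{align*}
\vec{x}\,\vec{t}_i^{T}
  &= 1 \cdot \Bigl( \sum_{k=1}^{n} y_{i,k}^2 + R - R_i \Bigr)
     + \sum_{k=1}^{n} (-2 x_k)\, y_{i,k}
     + R_A \cdot 1 + 1 \cdot R_i \\
  &= \sum_{k=1}^{n} y_{i,k}^2 - 2 \sum_{k=1}^{n} x_k y_{i,k} + R + R_A , \\[4pt]
\sum_{k=1}^{n} (x_k - y_{i,k})^2
  &= \sum_{k=1}^{n} x_k^2 - 2 \sum_{k=1}^{n} x_k y_{i,k} + \sum_{k=1}^{n} y_{i,k}^2 \\
  &= \vec{x}\,\vec{t}_i^{T} + \Bigl( \sum_{k=1}^{n} x_k^2 - R - R_A \Bigr).
\end{align*}
```

The $R_i$ terms cancel between the first and last components, which is why a different $R_i$ can be used for every database item without changing the ordering of the scores.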
Before outsourcing the database to Bob, Alice randomly chooses a
secret $(n+3) \times (n+3)$ invertible matrix $Q$, and computes $\vec{z}_i = Q \vec{t}_i^{T}$, then sends $T' = \{\vec{z}_1, \ldots, \vec{z}_N\}$ to Bob.
Protocol
1 For any query string $x = x_1 \ldots x_n$, Alice generates a random number
$R_A$, and constructs a vector $\vec{x} = (1, -2x_1, \ldots, -2x_n, R_A, 1)$, then sends $\vec{x} Q^{-1}$ to Bob.
2 Bob computes $\mathrm{score}'_i = (\vec{x} Q^{-1})\,\vec{z}_i = \vec{x}\,\vec{t}_i^{T}$, for $i = 1, \ldots, N$.
3 Bob computes $\min_{i=1}^{N} \mathrm{score}'_i$, and gets the corresponding $i$.
4 Bob returns $\vec{z}_i$ to Alice.
5 Alice computes $Q^{-1} \vec{z}_i$ and gets $t_i$, which is the closest match of
her query.
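The five steps above can be exercised end to end with a small pure-Python simulation. It is a sketch under stated assumptions rather than a reference implementation: exact rational arithmetic stands in for whatever number representation a deployment would use, the entry range of $Q$ and the ranges of the random numbers are arbitrary, and every helper name (`invert`, `random_invertible`, `row_times_matrix`, `matrix_times_col`) is hypothetical.

```python
import random
from fractions import Fraction


def invert(matrix):
    """Gauss-Jordan inverse over exact fractions."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def random_invertible(size, rng):
    """Draw random integer matrices until one is invertible."""
    while True:
        q = [[rng.randint(-9, 9) for _ in range(size)] for _ in range(size)]
        try:
            return q, invert(q)
        except ValueError:
            continue


def row_times_matrix(row, m):
    return [sum(a * m[k][j] for k, a in enumerate(row)) for j in range(len(m[0]))]


def matrix_times_col(m, col):
    return [sum(a * b for a, b in zip(row, col)) for row in m]


rng = random.Random(7)
database = [[3, 1, 4, 1], [2, 7, 1, 8], [1, 6, 1, 9]]
n = 4

# Setup (Alice): secret Q, global R, one R_i per item; Bob stores z_i = Q t_i^T.
q, q_inv = random_invertible(n + 3, rng)
r = rng.randrange(1000)
disguised = []
for y in database:
    r_i = rng.randrange(1000)
    t_vec = [sum(v * v for v in y) + r - r_i, *y, 1, r_i]
    disguised.append(matrix_times_col(q, t_vec))

# Step 1 (Alice): build the query vector and send x Q^{-1}.
x = [2, 6, 2, 8]
r_a = rng.randrange(1000)
x_vec = [1, *[-2 * v for v in x], r_a, 1]
query_msg = row_times_matrix(x_vec, q_inv)

# Steps 2-4 (Bob): score every disguised item and return the best one.
scores = [sum(a * b for a, b in zip(query_msg, z)) for z in disguised]
best = min(range(len(scores)), key=scores.__getitem__)
returned = disguised[best]

# Step 5 (Alice): undo Q and read the pattern out of the middle components.
t_back = matrix_times_col(q_inv, returned)
match = [int(v) for v in t_back[1:n + 1]]
```

Bob only ever handles `disguised` and `query_msg`; the scores he computes differ from the true squared distances by the same query-dependent constant, so the minimum is found at the right index.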
Notice that we have introduced random numbers $R$, $R_A$, and $R_i$ for
$i = 1, \ldots, N$. The purpose of $R$ is to prevent Bob from knowing the actual
distance between $x$ and the items in the database; the purpose of $R_A$ is to
prevent Bob from knowing the relationship between two different queries;
the purpose of $R_i$ is to prevent Bob from knowing the relationship among
items in the database. Without $R_i$, two similar items in the database $T$
would still be similar to each other in the disguised database $T'$; adding a
different random number to each different item will make this similarity
disappear.
4.5. SSCO/APPROX
This model poses more challenges than the SSO model because Bob
could now collude against Alice with a client, or he can even become
a client. Therefore, one of the threats would be for Bob to
compromise the privacy of the database by conducting a number of queries and
deriving the way the database is encrypted or disguised. A secure
protocol should resist this type of active attack. We have a solution for
the SSCO/Approx/Squ problem that works for both infinite and finite
alphabets.
4.5.1 SSCO/Approx/Squ protocol. One of the differences
between the SSCO/Approx problem and the SSO/Approx problem is
who sends the query. In the SSO/Approx/Squ protocol, Alice
transforms the query $x$ to a vector $\vec{x} Q^{-1}$, and sends the vector to Bob; in the
SSCO/Approx/Squ protocol, the client Carl will send the query.
Because Carl does not know $Q$, he cannot construct $\vec{x} Q^{-1}$ by himself. If
Carl could get the result of $\vec{x} Q^{-1}$ securely, namely without disclosing $\vec{x}$
to Alice and without knowing $Q$ of course, we would have a solution.
Because $Q^{-1} = (\vec{q}_1^{T}, \ldots, \vec{q}_m^{T})$, computing $\vec{x} Q^{-1}$ securely is basically a task
of computing $\vec{x}\,\vec{q}_k^{T}$ for $k = 1, \ldots, m$, which can be solved using the same
technique as that used in solving the PIM/Approx/Squ problem.
Therefore, by modifying step 2 of the SSO/Approx/Squ protocol
slightly, and also by using a form of "$R\,(\mathrm{score} + R_A)$", instead of
the form of "$\mathrm{score} + R_A$" as is used in the SSO/Approx/Squ protocol, we
obtain an SSCO/Approx/Squ protocol as follows:
Let $T = \{t_1, \ldots, t_N\}$ be the database Alice wants to outsource to Bob,
and assume the length of each string $t_i$ is $n$. Alice generates $N$ random
numbers $R_1, \ldots, R_N$. For each $t_i = y_{i,1}, \ldots, y_{i,n}$, let
$\vec{t}_i = \left( \sum_{k=1}^{n} y_{i,k}^2 - R_i,\ y_{i,1}, \ldots, y_{i,n},\ 1,\ 1,\ R_i \right)$;
let $\vec{z}_i = Q \vec{t}_i^{T}$, where $Q$ is a randomly generated
invertible $(n+4) \times (n+4)$ matrix.
In what follows, we assume that Alice outsourced the disguised database $T' = \{\vec{z}_1, \ldots, \vec{z}_N\}$ to Bob.
Protocol
1 Whenever a client Carl wants to conduct a search on query
$x = x_1 \ldots x_n$, he generates a random number $R_C$.
2 Alice generates random numbers $R_A$ and $R$.
3 Carl and Alice jointly compute $\vec{q} = R\,\vec{x} Q^{-1}$, where
$\vec{x} = (1, -2x_1, \ldots, -2x_n, R_C, R_A, 1)$. The computation does not reveal Alice's secret
$Q$, $R_A$ or $R$ to Carl, nor does it reveal Carl's private query $x$ or
$R_C$ to Alice.
4 Carl then sends the vector $\vec{q}$ to Bob.
5 Bob computes $\mathrm{score}_i = \vec{q}\,\vec{z}_i = R \left( \sum_{k=1}^{n} y_{i,k}^2 - 2 \sum_{k=1}^{n} x_k y_{i,k} + R_C + R_A \right)$.
6 Bob returns to Alice $\mathrm{score}' = \min_{i=1}^{N} \mathrm{score}_i$.
7 Alice computes $\mathrm{score}'' = \mathrm{score}' / R - R_A = \sum_{k=1}^{n} y_{i,k}^2 - 2 \sum_{k=1}^{n} x_k y_{i,k} + R_C$
and sends it to Carl.
8 Carl computes $\mathrm{score} = \mathrm{score}'' + \sum_{k=1}^{n} x_k^2 - R_C$, which is the answer
he seeks.
Because of $R_C$, Alice cannot figure out the actual score for this query,
and because of $R_A$ and $R$, Carl cannot figure out the actual score
between his query and other items in the database (except for the matched
one), even if Carl could collude with Bob. The communication cost of
the protocol is $O(n^2)$, most of which is contributed by the computation
of $R\,\vec{x} Q^{-1}$ in step 3.
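How the blinding values enter and leave the score can be followed in a short Python trace of steps 1 through 8. The sketch makes several simplifying assumptions, all of them labeled here and in the comments: the matrix disguise $Q$ and the secure joint computation of step 3 are abstracted away (Bob's product $\vec{q}\,\vec{z}_i$ equals $R\,\vec{x}\,\vec{t}_i^{T}$ whatever $Q$ is), the first component of $\vec{t}_i$ is taken to be $\sum_k y_{i,k}^2 - R_i$ so that step 5 holds, $R$ is assumed positive so that the minimum is preserved, and the sample data and ranges are hypothetical.

```python
import random

rng = random.Random(3)
database = [[3, 1, 4, 1], [2, 7, 1, 8], [1, 6, 1, 9]]
x = [2, 6, 2, 8]                      # Carl's private query

# Setup (Alice): t_i = (sum y^2 - R_i, y_1..y_n, 1, 1, R_i); Q is omitted here.
t_vectors = []
for y in database:
    r_i = rng.randrange(1, 1000)
    t_vectors.append([sum(v * v for v in y) - r_i, *y, 1, 1, r_i])

r_c = rng.randrange(1, 1000)          # step 1: Carl's R_C
r_a = rng.randrange(1, 1000)          # step 2: Alice's R_A
r = rng.randrange(1, 1000)            # step 2: Alice's R, assumed positive

# Step 3 (abstracted): the jointly computed query corresponds to R * x_vec.
x_vec = [1, *[-2 * v for v in x], r_c, r_a, 1]

# Steps 4-5: Bob's scores; with Q in place he would obtain the same numbers.
scores = [r * sum(a * b for a, b in zip(x_vec, t)) for t in t_vectors]

score_prime = min(scores)             # step 6: Bob -> Alice
score_dd = score_prime // r - r_a     # step 7: exact, since r divides score'
score = score_dd + sum(v * v for v in x) - r_c   # step 8: Carl's answer
```

Alice sees a value still offset by $R_C$, and Carl receives only the single minimum, so neither learns more than the protocol description allows under these assumptions.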
5. CONCLUSION AND FUTURE WORK
We have developed four models for secure remote database access, and
presented a class of problems and solutions for these models. For some
problems, such as the SSO/Approx/Squ and SSCO/Approx/Squ problems,
our solutions are practical, and they only need $O(n)$ and $O(n^2)$
communication cost, respectively; while for the PIM/Approx and PIMPD/Approx
problems, our results are still at the theoretical stage because of their
high communication cost. Improving the communication cost for those
solutions is one avenue for future work: We suspect that, whenever there
[...] higher dimensional indexing techniques [25, 6, 2, 18, 26, 14]. However,
combining those schemes with our protocols will not be a trivial task,
and the increase in the constant factors hiding behind the "big-oh"
notation may well negate the benefits of the asymptotic sub-linearity in
$N$; for example, in a tree search for processing the query, Bob has to be
prevented from knowing what nodes of his tree are visited when
processing the query (otherwise he gets information about the query), which
requires using a PIR-like protocol at each node down the tree. But even
that is not enough: Alice herself must be prevented from learning
anything about Bob's data other than the answer to her query, but in most
of the tree-based schemes in the literature the comparison at a node of
the search tree gives information about the data that is associated with
that node (these schemes were designed for an environment where the
searcher is the owner, and may require substantial modification before
they are used in our context).
Another avenue for future work is the pattern matching of
branching structures: the pattern matching problems that we have discussed
only involve patterns of simple linear structure; in many applications,
patterns have a branching structure, such as a tree or a DAG. The
M/Approx/Edit/Tree problem in our model is one of the examples.
Developing a secure protocol to deal with this type of query is a challenging
problem.
Finally, avoiding the use of a third party in the protocols that use
[1] A. Apostolico and Z. Galil, editors. Pattern Matching Algorithms.
Oxford University Press, 1997.
[2] S. Arya. Ph.D. thesis: Nearest neighbor searching and applications.
Technical Report CS-TR-3490, University of Maryland at College
Park, June 1995.
[3] M. Atallah and J. Rice. Secure outsourcing of scientific
computations. Technical Report COAST TR 98-15, Department of Computer
Science, Purdue University, 1998.
[4] B. Chor and N. Gilboa. Computationally private information
retrieval (extended abstract). In Proceedings of the twenty-ninth
annual ACM symposium on Theory of computing, El Paso, TX USA,
May 4-6 1997.
[5] M. Crochemore and W. Rytter. Text Algorithms. Oxford University
Press, 1994.
[6] R. Agrawal, C. Faloutsos and A. Swami. Efficient similarity search in
sequence databases. In Proceeding of the Fourth Int'l Conference on
Foundations of Data Organization and Algorithms, October 1993.
Also in Lecture Notes in Computer Science 730, Springer Verlag,
1993, 69-84.
[7] B. Chor, N. Gilboa and M. Naor. Private information retrieval by
keywords. Technical Report TR CS0917, Department of Computer
Science, Technion, 1997.
[8] O. Goldreich. Secure multi-party
computation (working draft). Available from
http://www.wisdom.weizmann.ac.il/home/oded/publichtml/foc.html,
1998.
[9] P. F. Syverson, D. M. Goldschlag and M. G. Reed. Anonymous
connections and onion routing. In Proceedings of 1997 IEEE
Symposium on Security and Privacy, Oakland, California, USA, May 5-7
1997.
[10] S. Goldwasser. Multi-party computations: Past and present. In
Proceedings of the sixteenth annual ACM symposium on Principles
of distributed computing, Santa Barbara, CA USA, August 21-24
1997.
[11] Y. Gertner, S. Goldwasser and T. Malkin. A random server model
for private information retrieval. In 2nd International Workshop on
Randomization and Approximation Techniques in Computer Science
(RANDOM '98), 1998.
[12] R. Gonzalez and R. Woods. Digital Image Processing.
Addison-Wesley, Reading, MA, 1992.
[13] D. Gusfield. Algorithms on Strings, Trees, and Sequences: Computer
Science and Computational Biology. Cambridge University Press,
1997.
[14] A. Guttman. R-trees: a dynamic index structure for spatial
searching. In ACM SIGMOD Workshop on Data Mining and Knowledge
Discovery, pages 163-174, Boston, MA, 1984.
[15] G. Di-Crescenzo, Y. Ishai and R. Ostrovsky. Universal
service-providers for database private information retrieval. In Proceedings
of the 17th Annual ACM Symposium on Principles of Distributed
Computing, September 21 1998.
[16] Y. Ishai and E. Kushilevitz. Improved upper bounds on
information-theoretic private information retrieval (extended
abstract). In Proceedings of the thirty-first annual ACM symposium
on Theory of computing, Atlanta, GA USA, May 1-4 1999.
[17] A. Jain. Fundamentals of Digital Image Processing. Prentice Hall,
Englewood Cliffs, NJ, 1989.
[18] J. Kleinberg. Two algorithms for nearest-neighbor search in high
dimensions. In Proceedings of the 29th ACM Symposium on Theory
of Computing, 1997.
[19] B. Chor, O. Goldreich, E. Kushilevitz and M. Sudan. Private
information retrieval. In Proceedings of IEEE Symposium on Foundations
of Computer Science, Milwaukee, WI USA, October 23-25 1995.
[20] E. Kushilevitz and R. Ostrovsky. Replication is not needed: Single
database, computationally-private information retrieval. In
Proceedings of the 38th annual IEEE computer society conference on
Foundation of Computer Science, Miami Beach, Florida USA, October
20-22 1997.
[21] [...] of the thirtieth annual ACM symposium on Theory of computing,
Dallas, TX USA, May 24-26 1998.
[22] C. Cachin, S. Micali and M. Stadler. Computationally private
information retrieval with polylogarithmic communication. Advances in
Cryptology: EUROCRYPT '99, Lecture Notes in Computer Science,
1592:402-414, 1999.
[23] O. Goldreich, S. Micali and A. Wigderson. How to play any mental
game. In Proceedings of the 19th annual ACM symposium on Theory
of computing, pages 218-229, 1987.
[24] M. K. Reiter and A. D. Rubin. Crowds: anonymity for web
transaction. ACM Transactions on Information and System Security,
1(1):66-92, 1998.
[25] Hanan Samet. Multidimensional data structures. In Mikhail J.
Atallah, editor, Algorithms and Theory of Computation Handbook,
chapter 18. CRC Press, 1999.
[26] N. Beckmann, H-P. Kriegel, R. Schneider, B. Seeger. The r*-tree:
An efficient and robust access method for points and rectangles. In
ACM SIGMOD Workshop on Data Mining and Knowledge
Discovery, pages 322-331, Atlantic City, NJ, 1990.
[27] D. Song, D. Wagner and A. Perrig. Practical techniques for searches
on encrypted data. In Proceedings of 2000 IEEE Symposium on
Security and Privacy, Oakland, California, USA, May 14-17 2000.
[28] A. Yao. Protocols for secure computations. In Proceedings of the
23rd Annual IEEE Symposium on Foundations of Computer