Executing SQL over Encrypted Data in the Database-Service-Provider Model
Hakan Hacıg ¨um ¨us¸
x;1Bala Iyer
2Chen Li
1Sharad Mehrotra
11
Department of Information and Computer Science, University of California, Irvine, CA 92697, USA [email protected],
fchenli,sharad
g@ics.uci.edu
2
IBM Silicon Valley Lab., San Jose, CA 95141, USA, [email protected]
ABSTRACT
Rapidadvan esinnetworkingandInternette hnologieshave
fueledtheemergen eofthe\softwareasaservi e"modelfor
enterprise omputing. Su essfulexamplesof ommer ially
viablesoftwareservi esin luderent-a-spreadsheet,ele troni
mail servi es, general storage servi es, disaster prote tion
servi es. \DatabaseasaServi e"modelprovidesuserspower
to reate,store,modify,andretrievedatafromanywherein
the world, as long as theyhave a ess to the Internet. It
introdu esseveral hallenges,animportantissuebeingdata
priva y. Itisinthis ontextthatwespe i allyaddressthe
issueofdatapriva y.
There are two main priva y issues. First, the owner of
the data needs to be assured that the data stored onthe
servi e-provider site is prote ted against data thefts from
outsiders. Se ond, data needs to be prote ted even from
theservi eproviders,iftheprovidersthemselves annotbe
trusted. In this paper, we fo us onthe se ond hallenge.
Spe i ally, weexplore te hniquesto exe ute SQLqueries
overen rypteddata. Ourstrategyis topro essasmu hof
thequeryaspossible attheservi eproviders'site, without
havingtode ryptthe data. De ryptionandthe remainder
ofthequerypro essingareperformedatthe lientsite. The
paper explores an algebrai framework to split the query
tominimizethe omputation atthe lient site. Resultsof
experimentsvalidatingourapproa harealsopresented.
1. INTRODUCTION
The Internet has made it possible for all omputers to
be onne tedtooneanother. Thein uen eoftransa tion-
pro essing systemsand the Internet ushered in the era of
e-business. The Internet has also had a profound impa t
onthesoftware industry. It hasfa ilitated anopportunity
x
SupportedinpartbyanIBMPh.D.Fellowship. Thiswork
wasperformedwhile theauthorwasatIBM.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ACM SIGMOD ’2002 June 4-6, Madison, Wisconsin, USA Copyright 2002 ACM 1-58113-497-5/02/06 ...
$5.00.
to provide software usage over the Internet, and has led
to anew ategoryof businesses alled \appli ation servi e
providers"orASPs.ASPsprovideworldwide ustomersthe
privilegetousesoftwareovertheInternet. ASPsarestaed
byexpertsintheartofputtingtogethersoftwaresolutions,
using a variety of software produ ts, for familiar business
servi es su has payroll, enterprise resour e planning, and
ustomer-relationship marketing. ASPsoer their servi es
overtheInternettosmallandlargeworldwideorganizations.
Sin exed ostsareamortizedoveralargenumberofusers,
there isthe potential toredu e the servi e osteven after
possiblyin reasedtele ommuni ationsoverhead.
Itispossibletoprovidestorageandlea essasservi es.
Thenaturalquestionisthefeasibilityofprovidingthenext
value-addlayerindatamanagement.Fromthebusinessper-
spe tive,databaseasaservi einheritsalltheadvantagesof
theASPmodel, indeedevenmore,giventhatalargenum-
ber of organizations have their own DBMSs. The model
allows organizationsto leverage hardwareand software so-
lutions provided by the servi e providers, without having
to develop themontheir own. Perhaps moreimportantly,
it provides a way for organizations to share the expertise
of database professionals, thereby utting the people ost
ofmanaginga omplexinformationinfrastru ture,whi his
important both for industrial and a ademi organizations
[15℄.
Fromthete hnologi alangle, themodelposesmanysig-
ni ant hallengesforemostofwhi histheissueofdatapri-
va y and se urity. Inthe database-servi e-provider model,
user data resides on the premises of the database-servi e
provider. Most orporationsviewtheirdataasaveryvalu-
ableasset. Theservi eproviderwouldneedtoprovidesuÆ-
ientse uritymeasurestoguarddatapriva y. Atleasttwo
data-priva y hallenges arise. The rst hallenge is: how
do servi e providers prote t themselves from theft of us-
tomerdatafromha kersthatbreakintotheirsiteands an
disks? En ryptionofstoreddataisthestraightforwardso-
lution, but not without hallenges. Trade-os needto be
maderegardingen ryption te hniquesandthedatagranu-
larityfor en ryption. Thisrst hallengewasexaminedby
Ha gumus etal. [6℄. It wasfound that hardware en ryp-
tionissuperiortosoftware en ryption. En ryptingdatain
bulkredu edtheper-byteen ryption ostsigni antly,ex-
posingtostartupoverheads. En ryptingbyrowwas found
preferabletoen ryptingbyeldforqueriesfromtheTPC-H
ben hmark[14℄.
ismore omplexsin eitin ludesprote tionfromthedatabase
provider. Therequirementisthat en rypteddatamaynot
be de rypted at the provider site. A straightforward ap-
proa histotransmittherequisiteen ryptedtablesfromthe
server(attheprovidersite)tothe lient,de ryptthetables,
andexe utethequeryatthe lient. Butthisapproa hmiti-
gatesalmosteveryadvantageoftheservi e-providermodel,
sin e now primary data pro essing has to o ur on lient
ma hines. It willbe ome lear later,for alargenumberof
queries su h as sele tions, joins, and unions, mu h of the
datapro essing anbedoneattheserver,andtheanswers
anbe omputedwithlittleeortbythe lient.
Ourproposed system,whosebasi ar hite tureand on-
trol oware showninFigure1, is omprised ofthreefun-
damental entities. A user poses the query to the lient.
A server is hosted by the servi e provider who stores the
en rypteddatabase. Theen rypteddatabaseisaugmented
withadditionalinformation(whi hwe alltheindex)allows
ertain amount of querypro essing to o ur at the server
withoutjeopardizing datapriva y. A lientstoresthe data
at the server. Client 1
also maintains metadata for trans-
latinguserqueriestotheappropriaterepresentationonthe
server,andperformspost-pro essingonserverqueryresults.
Basedontheauxiliaryinformationstored,wedevelopte h-
niquestosplitanoriginalqueryoverunen ryptedrelations
into (1)a orrespondingqueryoveren ryptedrelations to
runontheserver,and(2)a lientqueryforpost-pro essing
results of the server query. We a hieve this goal by de-
veloping an algebrai framework for query rewriting over
en ryptedrepresentation. Finally,weexplorethefeasibility
andee tivenessofourapproa hbytestingtheperforman e
ofourstrategyovernumerousqueries.Ourresultsshowthat
priva yfromservi eproviders anbea hievedwithreason-
ableoverheadestablishingthefeasibilityofthemodel.
There is previous work in dierent resear h areas some
ofwhi harerelatedtoourwork. Sear honen rypteddata
[2℄,whereonlykeywordsear hissupported,anddoingarith-
meti overen rypteddata[10℄havebeenstudiedinthelit-
erature. Howeverfun tionalitiesprovidedbythosearevery
limited and insuÆ ient in exe uting omplexSQL queries
overen rypteddata.
The rest ofthe paperis organized as follows. Se tion2
presents how data is en rypted and stored onthe server.
Se tion3dis usseshowa onditioninaqueryistranslated
toa onditionontheen rypteddataattheserver. InSe -
tion4wedes ribehowindividual relationaloperators su h
as sele tion, join, set dieren e, and group by are imple-
mented. Se tion5showshowtorewriteaquerybysplitting
itintoaserverqueryanda lientquery,su hthatthe om-
putationatthe lientisredu ed.Se tion6givesourexper-
imentalresultsonqueriesfromtheTPC-Hben hmark.We
on ludethepaperinSe tion7.
2. RELATION ENCRYPTION AND STOR- AGE MODEL
Beforewedis usste hniquesforquerypro essingoveren-
rypteddata,letusrstdis usshowtheen rypteddatais
storedattheserver.
Forea hrelationR (A1;A2;:::;An),westoreontheserver
1
Figure 1: The servi e-provider ar hite ture.
anen ryptedrelation:
R S
(etupl e;A S
1
;A S
2
;:::;A S
n )
where the attributeetupl e (We will explain how etupl e is
denedinSe tion2.4.) storesanen ryptedstringthat or-
respondstoatupleinrelationR . 2
Ea hattributeA S
i or-
respondstotheindexfortheattributeA
i
thatwillbeused
for querypro essingattheserver. For example, onsidera
relationempbelowthatstoresinformationaboutemployees.
eid ename salary addr did
23 Tom 70K Maple 40
860 Mary 60K Main 80
320 John 50K River 50
875 Jerry 55K Hopewell 110
Theemptableismappedtoa orrespondingtableatthe
server:
emp S
(etupl e;eid S
;ename S
;sal ary S
;addr S
;did S
)
Itisonlyne essaryto reateanindexforattributesinvolve
insear handjoinpredi ates. Inthe above example, ifwe
knew thattherewouldbenoquerythat involvesattribute
addr ineitherasele tion ora join, thenthe indexonthis
attribute neednot be reated. Without loss of generality,
we assume that anindex is reatedoverea hattribute of
therelation.
2.1 Partition Functions
WeexplainwhatisstoredinattributeA S
i ofR
S
forea h
attributeAiofR . Forthispurpose,wewillneedtodevelop
some notations. We rst map the domainof values (D
i )
ofattributeR :Ai intopartitionsfp1;:::;p
k
g,su hthat(1)
thesepartitionstakentogether overthewholedomain;and
(2)any two partitionsdonotoverlap. Formally, wedene
afun tionpartitionasfollows:
partition(R :A
i
)=fp1;p2;:::;p
k g
Asanexample, onsidertheattributeeidoftheemptable
above. Supposethevaluesofdomainofthisattributeliein
therange[0;1000℄. Assumethatthewholerangeisdivided
2
Notethatwe ouldalternativelyhave hosentoen ryptat
theattributelevelinsteadoftherowlevel.Ea halternative
hasitsownprosand ons. Wepointtheinterestedreadersto
[6℄foradetaileddes ription. Therestofthispaperassumes
into5partitions : [0;200℄, (200;400℄, (400;600℄, (600;800℄,
and(800;1000℄. Thatis:
partition(emp:eid)=
f[0;200℄;(200;400℄;(400;600℄;(600;800℄;(800;1000℄g
Dierentattributesmaybepartitionedusingdierentpar-
titionfun tions. Itshouldbe learthatthepartitionofat-
tributeAi orresponds to a splitting of itsdomain into a
setofbu kets. Anyhistogram- onstru tionte hnique,su h
as MaxDi, equi-width, or equi-depth [9℄, ould be used
to reate partitioning of attributes. In the examplesused
to explain our strategy, for simpli ity, we will assume the
equi-widthpartitioning. Extensionofourstrategytoother
partitioningmethodsisrelativelystraightforward,thoughit
willrequire hangestosomeofthenotationsdeveloped.For
example,unlikeequi-width asewhereavaluemapstoonly
asinglehistogrambin,inequi-depthitmay maptomulti-
plebu kets. Ournotationassumesthatea hvaluemapsto
asinglebu ket. Inthe experimentalse tion, besidesusing
theequi-widthwewillalsoevaluateourstrategyunderthe
equi-depthpartitioning.
Intheaboveexample,anequi-widthhistogramwasillus-
trated. Notethat whenthe domainof anattribute orre-
sponds to aeld overwhi h orderingis well dened(e.g.,
the eid attribute), we will assume that a partition pi is a
ontinuousrange. Weusep
i
:l owandp
i
:hightodenotethe
lowerandupperboundaryofthepartition,respe tively.
2.2 Identification Functions
Furthermore, we dene anidenti ation fun tion alled
identtoassignanidentier ident
R:A
i (p
j
)toea hpartition
p
j
of attributeA
i
. Figure2 shows the identiersassigned
tothe 5 partitions ofthe attributeemp:eid. For instan e,
identemp:eid([0;200℄)=2,andidentemp:eid((800;1000℄)=4.
0 200 400 600 1000
800
2 7 5 1 4
Figure 2: Partition and identi ation fun tions of
emp:eid.
Theidentfun tionvalueforapartitionisunique,thatis,
identR:A
i
(pj) 6= identR:A
i
(pl), if j 6= l . For this purpose,
a ollision-freehashfun tionthat utilizesproperties ofthe
partitionmaybeusedasanidentfun tion. Forexample,in
the asewhere apartition orrespondstoanumeri range,
thehashfun tionmayusethestart and/orendvaluesofa
range.
2.3 Mapping Functions
Giventheabovepartitionandidenti ationfun tions,we
deneamappingfun tionMapR:A
i
thatmapsavaluevin
thedomainofattributeA
i
totheidentierofthepartition
towhi hvbelongs: Map
R:A
i
(v)=ident
R:A
i (p
j
),where p
j
isthepartitionthat ontainsv.
In the example above, the following table shows some
values of themapping fun tionfor attribute emp:eid. For
instan e, Map
emp:eid
(23) = 2, Map
emp:eid
(860) = 4, and
Mapemp:eid(875)=4.
3
Notethatitisnotne essaryto reateallofthepartitions
at the beginning. They an be reated as the values are
eidvaluev 23 860 320 875
Map
emp:eid
(v) 2 4 7 4
Wefurther lassifytwotypesofmappingfun tions:
1. Order preserving: A mapping fun tion MapR:A
i is
alledorder preservingifforanytwovaluesv
i andv
j
inthedomainof A
i ,if v
i
<v
j
,thenMap
R:Ai (v
i )
MapR:A
i (vj).
2. Random: Amappingfun tionis alledrandomifitis
notorderpreserving.
A random mapping fun tion provides superior priva y
ompared to its orresponding order-preserving mapping.
However, aswewill seelater, whether amappingfun tion
is orderpreservingor notae tshowwe translateaquery
into queries onthe lient and server. Query translation is
simplied usinga order-preserving mapping fun tion. We
will develop translation strategies for both types of map-
pingfun tions.
Wefurtherdenethreemoremappingfun tionsthatwill
help usintranslating queriesovertheen ryptedrepresen-
tation. Whilethe rstfun tiondened holdsoverany at-
tribute,thelattertwoholdfortheattributeswhosedomain
valuesexhibittotalorder. Appli ationofthemappingfun -
tiontoavaluev,greaterthanthemaximumvalueinthedo-
main,vmax,returnsMapR:A
i
(vmax). Similarly,appli ation
ofthemappingfun tiontoavaluev,lessthantheminimum
valueinthedomain,v
mim
,returnsMap
R:A
i (v
min ).
LetSbeasubsetofvaluesinthedomainofattributeAi,
and v be a value inthe domain. We dene the following
mappingfun tionsonthepartitionsasso iatedwithA
i :
Map
R:A
i
(S)=fident
R:A
i (p
j )jp
j
\S6=;g
Map
>
R:A
i
(v)=fident
R:A
i (p
j )jp
j
:l owvg
Map
<
R:A
i
(v)=fidentR:A
i
(pj)jpj:highvg
Essentially, Map
R:A
i
(S) isthe setofidentiers ofparti-
tions whoseranges mayoverlapwiththevaluesinS. The
resultofMap
>
R:A
i
(v)isthesetofidentiers orrespondingto
partitionswhoserangesmay ontainavaluenotlessthanv.
Likewise,Map
<
R:A
i
(v)isthesetofidentiers orresponding
topartitions whoserangesmay ontainavalue notgreater
thanv.
2.4 Storing Encrypted Data
We now have enough notations to spe ify how to store
the en rypted relation R S
on the server. For ea h tuple
t=ha1;a2;:::;aniinR ,therelationR S
storesatuple:
hen rypt(fa
1
;a
2
;:::;a
n g);Map
R:A1 (a
1 );
MapR:A
2
(a2);:::;MapR:A
n (an)i
where en rypt is the fun tion used to en rypt a tuple of
the relation. For instan e, the following is the en rypted
relationemp S
storedontheserver:
etuple eid
S
ename S
salary S
addr S
did S
1100110011110010... 2 19 81 18 2
1000000000011101... 4 31 59 41 4
1111101000010001... 7 7 7 22 2
1010101010111110... 4 71 49 22 4
Therst olumnetupl e ontainsthestring orresponding
to theen ryptedtuples inemp. For instan e,the rsttu-
to\1000000000011101..."equaltoen rypt(860;Mary;60K ;
Main;80). Wetreattheen ryptionfun tionasabla kbox
inourdis ussion. Anyblo k ipherte hniquesu hasAES
[1℄, RSA [11℄, Blowsh [12℄, DES[3℄ et ., an be used to
en ryptthetuples.
These ond olumn orresponds totheindexontheem-
ployeeids. Forexample, valueforattributeeid intherst
tupleis23,andits orrespondingpartitionis[0;200℄. Sin e
this partition is identied to 2, we store the value \2" as
the identier of the eid for this tuple. Similarly,we store
the identier \4" for the se ond employee id 860. In the
tableabove,weusedierentmappingfun tionsfordierent
attributes. Themapping fun tions forthe ename,sal ary,
addr, and did attributes are not shown, but they are as-
sumedtogeneratetheidentierslistedinthetable.
Ingeneral,weusethenotation\E"(\En rypt")tomap
arelationRtoitsen ryptedrepresentation. Thatis,given
relationR (A1;A2;:::;An),relationE(R )isR S
(etupl e;A S
1
;
A S
2
;:::; A S
n
). Inthe above example, E(emp)is thetable
emp S
.
2.5 Decryption Functions
Given the operator E that maps a relation to its en-
ryptedrepresentation,wedeneitsinverseoperatorDthat
mapstheen ryptedrepresentationtoits orrespondingun-
en ryptedrepresentation. Thatis,D(R S
)=R . Intheex-
ampleabove,D(emp S
)=emp.TheDoperatormayalsobe
appliedonquery expressions. A queryexpression onsists
ofmultipletables related by arbitraryrelational operators
(e.g.,joins,sele tions,et .)
As it will be lear later, the general s hema of an en-
ryptedrelationortheresultofrelationaloperatorsamongst
en ryptedrelations,R S
i is:
hR S
1
:etupl e;R S
2
:etupl e;:::;
R S
1 :A
S
1
;R S
1 :A
S
2
;:::;R S
2 :A
S
1
;R S
2 :A
S
2
;:::i
Whenthede ryptionoperatorDisappliedtoR S
i
,itstrips
otheindexvalues(R S
1 :A
S
1
;R S
1 :A
S
2
;:::;R S
2 :A
S
1
;R S
2 :A
S
2
;:::)
andde rypts(R S
1
:etupl e;R S
2
:etupl e;:::)totheirunen rypted
attributevalues.
Asanexample,assumethatanothertabledenedasmgr
(mid; did) was also stored in the database. The orre-
spondingen rypted representation E(mgr) will be atable
mgr S
(etupl e; mid S
; did S
). Suppose we were to ompute
a join between tables emp S
and mgr S
on their did S
at-
tributes. Theresultingrelation temp S
will ontaintheat-
tributeshemp S
:etupl e;eid S
;ename S
;sal ary S
;addr S
;emp S
:
did S
; mgr S
:etupl e; mid S
; mgr S
:did S
i. If we are to de-
ryptthetemp S
relationusingthe Doperator to ompute
D(temp S
),the orrespondingtablewill ontaintheattributes
(eid;ename;sal ary;addr;emp:did;mid;mgr:did)
Thatis,D(temp S
)willde ryptalloftheen rypted olumns
intemp S
anddroptheauxiliary olumns orrespondingto
theindi es.
3. MAPPING CONDITIONS
Map ondIn this se tion we studyhow to translate spe i query
onditions in operations (su h as sele tions and joins) to
orresponding onditionsovertheserver-siderepresentation.
Thistranslationfun tionis alledMap ond. On eweknow
howrelationaloperators aretranslatedovertheserver-side
implementation,andhowquerytreesaretranslated.
Forea hrelation,theserversidestorestheen ryptedtu-
ples, along with the attribute indi esdetermined by their
mapping fun tions. Meanwhile, the lient storesthe meta
dataaboutthespe i indi es,su hastheinformationabout
the partitioning of attributes, the mapping fun tions, et .
The lientutilizesthisinformationtotranslateagivenquery
Q to its server-side representation Q S
, whi h is then exe-
uted by theserver. We onsider query onditions hara -
terizedbythefollowinggrammarrules:
Condition AttributeopValue;
Condition AttributeopAttribute;
Condition (Condition_Condition)j(Condition^Con-
dition)j(:Condition);.
Allowedoperationsforopin ludef=;<;>;;g.
Inthedis ussionbelowwewillusethefollowingtablesto
illustratethetranslation.
emp(eid, ename, salary, addr, did, pid)
mgr(mid, did, mname)
proj(pid, pname, did, budget)
Attribute = Value: Su ha onditionarises insele tion
operations. Themappingisdenedasfollows:
Map ond(Ai=v))A S
i
=MapA
i (v)
As dened in Se tion2.3, fun tion MapA
i
mapsv to the
identier of A
i
's partitionthat ontains withvalue v. For
instan e, onsidertheemptableabove,wehave:
Map
ond
(eid=860))eid S
=4
sin eeid=860ismappedto4bythemappingfun tion of
thisattribute.
Attribute < Value: Su h a ondition arises insele tion
operations. Theattribute musthave awelldened order-
ingoverwhi hthe\<"operatorisdened.Dependingupon
whetherornotthemappingfun tionMapA
i
oftheattribute
isorder-preservingorrandom,dierenttranslationsarepos-
sible.
4
Orderpreserving:Inthis ase,thetranslationisstraight-
forward:
Map ond(Ai<v))A S
i
MapA
i (v)
Random: Thetranslationisalittle omplex. We he k
if the attribute value representation A S
i
lies inany of the
partitions that may ontaina value v 0
where v 0
<v. For-
mally,thetranslationis:
Map ond(Ai<v))A S
i 2Map
<
A
i (v)
Forinstan e,thefollowing onditionistranslated:
Map ond(eid<280))eid S
2f2;7g
sin e all employee ids less than 280 have two partitions
[0;200℄and(200;400℄,whoseidentiersaref2;7g.
4
Note that we an always use the mapping dened in
the random ase to translate onditions involving order-
preserving attributes. We dierentiate between the two
asessin ethetranslation(aswellasthequerypro essing)
iseasierfortheformer ase.
previousone. Asbeforewedierentiatewhetherornotthe
mappingfun tionisorderpreserving. Thetranslationisas
follows:
Orderpreserving:Map
ond (A
i
>v))A S
i
Map
A
i (v);
Random:Map
ond
(Ai>v))A S
i 2Map
>
A
i (v).
Forinstan e,thefollowing onditionistranslated:
Map
ond
(eid>650))eid S
2f1;4g
sin eallemployeeidsgreaterthan650aremappedtoiden-
tiers: f1;4g.
Attribute1 = Attribute2: Su ha ondition might arise
inajoin. Thetwo attributes anbefromtwodierentta-
bles,orfromtwoinstan esofthesametable.The ondition
analsoariseinasele tion, andthetwoattributes anbe
fromthesametable. Thefollowing isthetranslation:
Map ond(Ai=Aj))
W
' A
S
i
=ident
A
i (p
k )^A
S
j
=ident
A
j (p
l )
where'ispk2partition(Ai);pl2partition(Aj);pk\pl6=
;. That is, we onsider all possible pairs of partitions of
A
i andA
j
that overlap. For ea h pair(p
k
;p
l
), we havea
ondition onthe identiers of thesetwo partitions: A S
i
=
ident
A
i (p
k )^A
S
j
= ident
A
j (p
l
). Finally we take the dis-
jun tionofthese onditions. Theintuitionisthatea hpair
ofpartitionsmayprovidesomevaluesofAiandAjthat an
satisfythe onditionA
i
=A
j .
Partitions Ident
emp:did
Partitions Ident
mgr:did
[0,100℄ 2 [0,200℄ 9
(100,200℄ 4 (200,400℄ 8
(200,300℄ 3
(300,400℄ 1
Forinstan e,thetableaboveshowsthepartitionandiden-
ti ationfun tionsoftwoattributes emp:didand mgr:did.
Then onditionemp:did=mgr:didistranslatedtothefol-
lowing onditionC1:
C
1
: (emp
S
:did S
=2^mgr S
:did S
=9)
_ (emp S
:did S
=4^mgr S
:did S
=9)
_ (emp S
:did S
=3^mgr S
:did S
=8)
_ (emp S
:did S
=1^mgr S
:did S
=8).
Attribute1 <Attribute2: Againsu ha onditionmight
ariseineitherajoinor inasele tion. Let us assumethat
the onditionisA
i
<A
j
. Justas intranslating onditions
with inequality operator seen previously, the mapping of
the ondition depends upon whether or not the mapping
fun tionsof theattributes A
i and A
j
areorderpreserving
orrandom. Wespe ifythetranslationforea hinturn.
MapA
j
isorderpreserving: Insu ha asewelistoutall
thepartitionsofAiandidentifyallthepartitionsofAjthat
satisfy theordering ondition. Spe i ally,the mappingis
asfollows:
Map
ond (A
i
<A
j ))
W
p2partition(A
i )
A S
i
=identA
i (p)^A
S
j
MapA
j (p:l ow)
Map
A
i
isorderpreserving: IfA
i
isorderpreserving,we
Map ond(Ai<Aj))
W
p2partition(A
j )
A S
j
=identA
j (p)^A
S
i
MapA
i (p:high)
BothMapA
i
andMapA
j
are orderpreserving: Inthis
asewehavea hoi eofusingeitheroftheabovetwomap-
pings. Our hoi eisbasedonthespe i partitioningofA
i
andAj. We andothetranslationasfollows:
Map
ond (A
i
<A
j ))
W
' MapA
i (p
k
:l ow)MapA
j (p
l :high)
where'ispk2partition(Ai);pl2partition(Aj).
BothMapA
i
andMapA
j
arerandom: Wehavethefol-
lowing translation:
Map
ond
(Ai<Aj))
W
' A
S
i
=identA
i
(pk)^A S
j
=identA
j (pl)
where'isp
k
2partition(Ai);p
l
2partition(Aj);p
l :high
pk:l ow. Thatis,we onsiderallpairsofpartitionsofAiand
A
j
that ouldsatisfythe ondition. Forea hpair,wehavea
ondition orrespondingto thepairoftheiridentiers. We
takethedisjun tionofthese onditions.
Forexample, ondition C
2
:emp:did<mgr:didistrans-
latedto:
C2: (emp
S
:did S
=2^mgr S
:did S
=9)
_ (emp S
:did S
=2^mgr S
:did S
=8)
_ (emp S
:did S
=4^mgr S
:did S
=9)
_ (emp S
:did S
=4^mgr S
:did S
=8)
_ (emp S
:did S
=3^mgr S
:did S
=8)
_ (emp S
:did S
=1^mgr S
:did S
=8).
Conditionemp S
:did S
= 4^mgr S
:did S
= 9is in luded,
sin epartition(100;200℄forattributeemp:didandpartition
(200;400℄ for attributemgr:did anprovide pairsofvalues
thatsatisfyemp:did<mgr:did.
For onditionAttribute1>Attribute2,theMap
ond map-
pingissameasthemappingofAttribute2<Attribute1,as
des ribedabovewiththerolesoftheattributesreversed.
Condition1 _Condition2,Condition1 ^Condition2:
Thetranslationofthetwo omposite onditionsisgivenas
follows:
Map ond(Condition1_Condition2))
Map
ond
(Condition1)_Map
ond
(Condition2)
Map ond(Condition1^Condition2))
Map
ond
(Condition1)^Map
ond
(Condition2)
TranslationofMap ond(:Condition)treatmentismorein-
volved sin e negated queries are not monotoni and their
orre t translationrequires morenotation. Thisdis ussion
anbefoundin[5℄.
Operatorfollows thesamemappingas<andoperator
follows thesamemappingas>. Conditionsthatinvolve
4. IMPLEMENTING RELATIONAL OPER- ATORS OVER ENCRYPTED RELATIONS
Inthisse tionwedes ribehowindividualrelationaloper-
ators(su hassele tions, joins,setdieren e,andgrouping
operators) anbeimplementedintheproposeddatabasear-
hite ture. Ourstrategyistopartitionthe omputationof
theoperators a rossthe lientand theserver. Spe i ally,
we will attempt to ompute a superset of answers gener-
ated by the operator using the attributeindi es stored at
theserver. These answerswillthenbelteredatthe lient
after de ryption to generate the true results. We will at-
temptto minimizethe workdone at the lient as mu has
possible. Furthermore, we will try to ensure as mu h as
possiblethatoperatorsexe utedonthe lientsidearesu h
thatthey anbeappliedtothetuplesarrivingoverthean-
swerstreamassoonastheyarrive(withoutaneedtostore
them). Thepurposeistoguaranteethatthe lient-sideop-
erators anbeeÆ ientlyimplemented. Theimplementation
of operators developed inthis se tion will be used in the
followingse tion,wherewedevelopanalgebrai framework
for rewriting SQL queries for the purpose of splitting the
query omputationa rossthe lientandtheserver.
For explaining the implementation of operators, we will
onsider the following two simplied relations of those in
thepreviousse tion:
emp(eid, did), mgr(mid, did)
Inthepreviousse tionswehavegiventheMapfun tionsof
emp:eid,emp:did,and mgr:did. For simpli ity,weassume
that the Map fun tionof mgr:mid is the same as that of
emp:eid,asshowninFigure3. Inaddition,weuseRandT
todenote tworelations, and use theoperator notations in
[4℄.
0 200 400 600 800 1000
2 7 5 1 4 2 4 3 1
100 200 300 400 0
0 200 400 600 800 1000
2 7 5 1 4
emp:eid emp:did
mgr:mid
200 400
0
9 8
mgr:did
Figure 3: Partition and identi ation fun tions for
fourattributes.
TheSele tionOperator (): Considerasele tionoper-
ation
C
(R )onarelationR ,whereC isa onditionspe i-
edononeormoreoftheattributesA1;A2;:::;AnofR . A
straightforward implementationofsu hanoperatorinour
environment istotransmittherelationR S
fromtheserver
tothe lient. Thenthe lientde ryptsthe resultusingthe
D operator, and implementsthe sele tion. This strategy,
however,pushestheentireworkofimplementingthesele -
tiontothe lient. Inaddition,theentireen ryptedrelation
needsto be transmittedfrom the serverto the lient. An
alternativeme hanismistopartially omputethesele tion
operatorattheserverusingtheindi esasso iatedwiththe
attributesinC,andpushtheresultstothe lient. The lient
de ryptstheresultsandltersouttuplesthatdonotsatisfy
C. Spe i ally,theoperator anberewrittenasfollows:
C (R )=
C
D(
S
Map (C) R
S
)
Intheabovenotation,weadorntheoperator thatex-
e utes at the server with a supers ript \S" to highlight
the fa t that the sele t operator exe utes at the server.
All non-adorned operators are assumed to exe ute at the
lient. The de ryption operator D will only keep the at-
tributeetupl eofR S
,and dropall theotherA S
i
attributes.
We explain the above implementation using an example
eid<395^did=140
(emp).BasedonthedenitionofMap
ond (C)
dis ussedinthepreviousse tion,theabovesele tionopera-
tionwillbetranslatedintoC
D(
S
C 0 emp
S
)
,wherethe
onditionC 0
ontheserveris:
C 0
=Map ond(C)= eid S
2[2;7℄^did S
=4
TheJoinOperator(1):ConsiderajoinoperationR 1
C S.
Thejoin onditionC ouldbeeitherequality onditions(in
whi h ase the join orresponds to an equijoin), or ould
be more general onditions (resultingintheta-joins). The
abovejoinoperation anbeimplementedasfollows:
R 1
C
T =C
D R S
1 S
Map
ond (C)
T S
Asbefore, theS adornmentonthejoinoperatorempha-
sizes thefa tthat thejoinis tobe exe utedatthe server.
For instan e, join operation emp
1
emp:did=mgr:did
mgr is
translatedto:
C
D emp S
1 S
C 0
mgr S
wherethe onditionC 0
ontheserveris onditionC
1 dened
inSe tion3.
The Grouping and Aggregation Operator ( ): A
grouping and aggregation operation is denoted by
L (R ),
where L = LG[LA. LG refers to a list of attributes on
whi h the grouping is performed, and LA orresponds to
a set of aggregation operations. As an example, the op-
eration
C;COUNT(B)!F
(R ) means that we reate groups
using attribute C of relation R , and for ea h group om-
putethe ount(B)fun tion. Thatis,L
G
=fCg,andL
A
=
fCOUNT(B)!Fg. Theresultingrelationwill ontaintwo
attributesCandF.Atupleintheresultwillhaveanentry
forea hdistin tvalueofC,andthenumberoftuplesinthe
groupreportedasattributeF. IfLA=;,onlygroupingis
performed.
Implementation of the grouping operator
L
(R ) an be
a hievedasfollows:
L (R )=
L
D S
L 0(R
S
)
;whereL 0
=fA S
i jA
i 2L
G g
Thatis,theserverwill grouptheen ryptedtuplesbased
onthe attributes ofL
G
. The serverdoesnot performany
aggregation orrespondingtoLA,sin eitdoesnothaveany
values for those attributes inLA. The results of S
L 0 are
returnedtothe lient,whi hperformsthe groupingopera-
tion L. Thisoperation anbeimplementedveryeÆ iently,
sin e everytuplebelonging toasingle groupof L will be
inasingle S
L 0
group omputedby theserver. Asaresult,
the lientonlyneedsto onsidertuplesinasingle S
L 0
group
when omputingthegroups orrespondingto L. Of ourse,
theaggregation fun tionsspe iedinL
A
will be omputed
atthe lient,sin etheir omputationrequiresthattuplesbe
rstde rypted.
did;COUNT(eid)!F (emp)
Thatis, wewant to ndthe numberof employeesinea h
department. LetL denote\did;COUNT(eid)!F." The
operationistranslatedto:
L
D S
did S
(emp S
)
Thatis,werstdoagroupingonthedid S
attributeonthe
server. Afterthegroupedtuplesarereturnedtothe lient,
wede ryptthedata,andperformthegroupingoperationon
thedidattribute. Thisstep anbedoneeÆ iently,sin eall
thetupleswiththesamedidhavealreadybeengroupedby
theserver. Finally,we performtheaggregation ount(eid)
to ountthenumberofemployeeidsforea hdid.
The Sorting Operator (): A sorting operation L(R )
anbeimplementedsimilarlytothegroupingoperator.That
is,werstsortonpartitionidsattheserver. Thestrategy
toimplementL(R )isasfollows:
L(R )=L
D S
L 0(R
S
)
whereL 0
=listofA S
i
orrespondingtothe A
i
inthelistL
ofattributes.
Thatis,wedoagroupingoperation S
L 0
ontheen rypted
attributes L 0
of those in L. If the mapping fun tions of
the attributes inL are all orderpreserving, this grouping
S
L
0 operation an be repla ed by a orresponding sorting
operation S
L 0
.Aftertheresultsarereturnedtothe lient,we
allthede ryptionfun tionD,andperformtheLoperation
bysortingthetuplesonattributesL.
Notethattheamountofworkdoneatthe lientto om-
pute
L
inpostpro essingdependsuponwhetherornotthe
attributeslistedinLhaveorder-preservingmappings.Ifthe
attributeshaveorder-preservingmappings,thentheresults
returnedby the server are presorted upto within a parti-
tion. Thus, sorting the results is a simplelo al operation
overa singlepartition. Alternatively,evenifthe mapping
is not order preserving, it is useful to ompute S
at the
servertoredu etheamountof lientwork. Sin ethetuples
have been grouped by the server, L an be implemented
eÆ ientlyusingamerge-sortalgorithm.
Forexample, thesorting operation
eid
(emp) anbeim-
plementedasfollows:
eid
D S
eid (emp
S
)
whereL=feidg. Thatis, werstperformagroupingop-
eration
eid
ontheemp S
relationontheserver. The lient
de ryptsthereturnedtuples,andappliesthesortingopera-
tion
emp .
TheDupli ate-EliminationOperator(Æ):Thedupli ate-
eliminationoperatorÆisimplementedsimilarlytothegroup-
ingoperator:
Æ(R )=Æ
D L(R S
)
whereL=listofallattributesA S
i
whereAiisanattribute
inR .
Thatis,werstgrouptheen ryptedtuplesontheserver
S
ination operation Æ. For example, the operationÆ(emp)is
translatedto:
Æ
D
eid S
;did S
(emp S
)
The Set Dieren e Operator ( ): Implementationof
thedieren eoperationR T attheserverisdiÆ ultsin e,
withoutrstde ryptingtherelations RandT,itisimpos-
sibletotellwhether ornotagiventupleofRalsoappears
inS. However,the indi esstoredat theserver anstill be
usedtomeaningfullyredu etheamountofworkdoneatthe
lient. In thefollowing we assumethat relations R and T
aresetdieren e ompatibleandaredenedoverattributes
A1;A2;:::;AnandB1;B2;:::;Bnrespe tively. Thefollow-
ingrule anbeusedtoimplementthesetdieren eoperator:
R T =R:A
1
;:::;R:An (R
0
) T:B
1
;:::;T:Bn (R
0
)
R 0
=D S
L (R
S S
Map
ond
^
i=1;:::;n (R:A
i
=T:B
i )
T S
)
whereL=fA S
1
;A S
2
;:::;A S
n
;B S
1
;B S
2
;:::;B S
n g.
On e again, the symbol S as a supers ript of the left-
outerjoin emphasizes(denoted )thattheoperatorisim-
plementedonthe serverside. Weillustratethe above rule
throughanexample. Supposewewant to omputeemp
mgr,thatis,wewanttondalltheemployeeswhoarenot
managers. Thequeryistranslatedtothefollowingquery:
emp:eid;emp:did(R 0
) mgr:mid;mgr:did(R 0
)
R 0
=D S
eid S
;emp S
:did S
;mid S
;mgr S
:did S
(emp
S S
C 0 mgr
S
)
The ondition C 0
is: Map
ond
(emp:eid =mgr:mid)^C
1 ,
where C
1
is dened inSe tion 3. (See Attribute1 = At-
tribute2 ase)
Afewobservationsabouttheaboveimplementationofthe
set-dieren eoperator arenoteworthy. First,the grouping
of the results basedonindex attributesis not ne essary{
that is, the translation would be orre t evenwithout the
grouping operator. Thereason for in luding the grouping
operatoristhatit ansigni antlyredu ethe omputation
onthe lient. Forexample, dueto thegrouping operator,
all the tuples that have a NULL value for T S
attributes
will begroupedtogether. Whentheresultingtuples ofthe
setdieren e operatorarriveat the lient, su htuples an
bede ryptedandthe orrespondingRtupleimmediatelyre-
turnedasananswer. Thereasonisthattherearenomat h-
ingtuplesofT that ould ausethepotentialeliminationof
thesetuplesofR . Hen e,theproje tionandthesubsequent
setdieren eimplementedonthe lientsidemayonlybere-
stri tedtothosetuplesforwhi hthe orrespondingT value
isnotNULL.
Furthermore,in omputingtheproje tiontotheattributes
ofRandSandthesubsequentsetdieren ebetweenthetwo
proje tionsweonlyneedto onsiderasinglegroupformed
by S
L
operator atatime. Thatis,aT tuplefromadier-
entgroupwillnoteliminateanRtuplefromanothergroup.
Thus,performingthegroupingattheserverside,whilenot
ne essary, ouldsigni antlyredu ethe omputationatthe
lient.
Se ond,evenwiththeaboveoptimization,theimplemen-
tationoftheset-dieren eoperator usingtheouter-joinon
theservershouldbeusedwith are. Anaivestrategyis to
S S
strategymightbe heaperthanthepreviousstrategysin e
thesizeoftheouterjoinmightbequadrati resultinginhigh
transmissionandde ryption ost omparedtothestrategy
oftransmittingthe tworelationsindividually and omput-
ingthe setdieren eat the lient. Whi hstrategyisused
dependsuponthe ontentoftherelations. Sele tingthespe-
i strategydependsuponintegrating ourframeworkinto
a ost-basedqueryoptimizer,whi hisbeyondthe s opeof
thispaper.
TheUnion Operator ([): Thereareessentiallytwodif-
ferentunionoperatorsbasedonthebagandthesetseman-
ti s. Theformerdoesnoteliminatedupli ates,whilethelat-
terdoes. Theimplementationof theunionoperator based
onbagsemanti sisstraightforward:
R[T=D(R S
[ S
T S
)
Ifwewishto omputetheunionunderthesetsemanti s,
it anbe omputedasfollows:
R[T =Æ
D S
L (R
S
[ S
T S
)
whereL=listofallattributesA S
i
whereAiisanattribute
inR .
While the implementation of the union operator onthe
server side (that is, [ S
) is straightforward, there is one
te hni al hallenge thatarises. Spe i ally, sin etuples in
R S
[ S
T S
ouldoriginateeitherfromR S
orT S
,tobeableto
applythe orre tde ryption fun tionatthe lient,as well
as to orre tly interpretthe values inthe indexattributes
of the result at the server, we will store anadditional at-
tribute inthe result of the union that will determine the
origin of the tuple(thatis, whether it originates from R S
orT S
). Addingsu hanattribute is ru ialfor the orre t
implementation, but we will ignoreit in the dis ussion to
keepthe algebrasimple. Thefull version ofthe paper [5℄
illustratestheneedformaintainingtheadditionalattribute
and the resulting modi ations to the mapping fun tions
anddevelopedalgebra.
TheProje tionOperator():Sin eea htupleinarela-
tionRisen ryptedtogetherintoasinglestringintheetupl e
attributeofrelationR S
atthe server, aproje tionisnot
implementedattheserver. Asaresult,to omputeL(R ),
whereLisasetofattributes,thestrategyistotransmitthe
ompleterelationR S
to the lient, de rypttherelation at
the lient,andthen omputetheproje tion. Thatis,
L(R )=L D(R S
)
Forinstan e,wehave
eid
(emp)=
eid D(emp
S
)
.
5. ALGEBRAIC FRAMEWORK FOR QUERY SPLITTING
GivenaqueryQ,ourpurposeinthisse tionistodevelop
astrategy to splitthe omputation of Qa ross the server
and the lient. The serverwill use the implementationof
therelationaloperatorsdis ussedinthepreviousse tionto
omputeasmu hofthequeryaspossible,relegatingthere-
mainderofthe omputationtothe lient. Ourobje tiveisto
omeupwiththe\best"queryplanforQthatminimizesthe
queryattheserver,thenetworktransmission ost,andthe
I/OandCPU ostatthe lient. Avarietyofpossibilitiesex-
ist. Forexample, onsiderthefollowingqueryovertheemp
table thatretrievesemployeeswhose salaryis greater that
theaveragesalaryofemployeesinthedepartmentidentied
bydid=1.
SELECT emp.name FROM emp
WHERE emp.salary > (SELECT AVG(salary)
FROM emp WHERE did = 1);
did=1
emp emp
avg(sal)!avgsal
ename
./
emp:sal>avgsal
(a) Original query
tree.
did=1
D
emp S
avg(sal)!avgsal
ename
emp S D
Q s
./
emp:sal>avgsal
(b) Repla ing en-
ryptedrelations.
D
emp S
avg(sal)!avgsal
ename
./
emp:sal>avgsal
Q s
did=1
Map ond(did=1)
emp S D
( ) Doing sele -
tionatserver.
./
S
emp:sal>avgsal
emp S
did=1
Map
ond (did=1)
emp S D
avg(sal)!avgsal E
D
emp:sal>avgsal
ename
Q
Q s Q
s Q
(d) Multiple intera tions
betweenClientandServer.
Figure4: Queryplansforemployeeswhomakemore
thanaveragesalaryofemloyeeswhoarein did=1
mgr ./
emp:did=mgr:did
mname
./
emp:pid=proj:pid
pname=
0
disk drive 0
sal>100K
proj emp
(a)Originalquerytree.
D
D ./
emp:did=mgr:did
mname
./
emp:pid=proj:pid
sal>100K
proj S
emp S D
mgr S
Q S
pname=
0
disk drive 0
(b)Afterrepla ingwith
en ryptedrelations.
Figure 5: Evaluation ofajoinquery.
The orrespondingquerytreeandsomeoftheevaluation
strategies are illustrated in Figures 4(a) to (d). The rst
strategy (Figure 4(b)) is to simply transmit the emp ta-
D ./
emp:did=mgr:did
mgr S ./
emp:pid=proj:pid
pname=
0
disk drive 0
sal>100K
D D
s
Map ond(pname=
0
disk drive 0
)
s
Map ond(sal>100K)
proj S
emp S mname
Q S
(a)Afterrewritingsele tions.
D ./
emp:did=mgr:did
mgr S
D D
s
Map ond(pname=
0
disk drive 0
)
s
Map ond(sal>100K)
proj S
emp S mname
./
emp:pid=proj:pid
pname=
0
disk drive 0
^sal>100K
Q S
(b)Afterpullingupsele tions.
s
Map
ond
(pname=
0
disk drive 0
)
s
Map
ond
(sal>100K)
proj S
emp S ./
emp:did=mgr:did
mgr S
mname
./ S
Map
ond
(emp:pid=proj:pid)
pname=
0
disk drive 0
^sal>100K
Q S
D
D
emp:pid=proj:pid
( )Afterrewritingjoin.
s
Map
ond
(pname=
0
disk drive 0
)
s
Map
ond
(sal>100K)
proj S
emp S
mgr S D
Q S
./ S
Map
ond
(emp:did=mgr:did)
./ S
Map
ond
(emp:pid=proj:pid)
mname
pname=
0
disk drive 0
^sal>100K^proj:pid=emp:pid^emp:did=mgr:did
(d)Thenalquerytree.
Figure 6: Query rewriting heuristi for joinqueries
strategy(Figure4( ))isto omputepartoftheinnerquery
attheserver,whi hsele ts(asmanyaspossible)tuples or-
respondingto Map ond(did =1). Theserversends to the
lient the en rypted version of the emp table, i.e., emp S
,
alongwiththeen ryptedrepresentationofthesetoftuples
thatsatisfytheinnerquery. The lientde ryptsthetuples
to evaluatethe remainderof the query. Yetanother pos-
sibility (Figure 4(d))is to evaluatethe inner queryat the
server. Thatis, sele tthetuples orrespondingto theem-
ployeesthat work indepartment did=1. Theresults are
shipped to the lient, whi h de rypts thetuples and om-
putesaveragesalary. Theaveragesalaryisen ryptedbythe
lientandshippedba ktotheserver,whi hthen omputes
thejoinat theserver. Finally,theresultsare de ryptedat
the lient.
5.1 Heuristic Rules to Separate Queries
It shouldimmediately be obviousthat ari hset ofpos-
sibilitiesexistinevaluatingaqueryinour framework, and
that the de ision of the exa t query plan should be ost
based. Thistopi ,however,isoutsidethes opeofthispa-
per. Our attempt is primarily to establish the feasibility
oftheproposedmodel,and ost-basedoptimizationisrele-
gatedtofuturework. Instead,inthisse tion,wewillrestri t
thatallowforagivenquerytreetobesplitintotwoparts{
theserverpart(referredtoasQ S
)thatexe utesattheserver
rst,andthe lientpart(referredtoasQ C
)thatexe utesat
the lientbasedontheresultsofthequeryevaluatedatthe
server. Ourobje tive willbeto minimizethe omputation
inQ C
. Thatis,wewouldattempttorewritethequerytree,
su hthatmost oftheeortof evaluatingthe queryo urs
attheserver,andthe lientdoesleastamountofwork.
Weillustrateourideasusingexamples.Asarstexample,
onsiderthefollowingquerythat omputesthenamesofthe
managersofthoseemployeesworkingonproje t\diskdrive"
whosesalaryismorethan100K.
SELECT mname
FROM emp, mgr, proj
WHERE proj.pname = `diskdrive'
AND proj.pid = emp.pid
AND emp.sal > 100K
AND emp.did = mgr.did;
Therststepisto onvertthe abovequeryintoa orre-
spondingquery tree, and to manipulate the query tree to
generateagoodplan(usingthestandardqueryrewritelaws
ofrelationalalgebra[13℄). Figure5(a)showsthequerytree
inwhi hthetwosele tionshavebeenpusheddowntorela-
withen ryptedrelationR S
. Theresultingtreeis shownin
Figure5(b).
As it stands, the urrent query tree requires the entire
relations proj S
, emp S
, and mgr S
to be sent to the lient
that will de rypt the relations to evaluate the query. We
nextrepla e thesele tion operationsbytheir implementa-
tionlistedinthepreviousse tionresultinginthequerytree
showninFigure6(a). Noti ethatinthe orrespondingtree,
theserverisparti ipatingintheevaluationofthetwosele -
tion onditions. Sin eourobje tive isto performasmu h
ofthe omputation at the serveras possible, we nextpull
upthetwo lient-sidesele tion onditions
pname=
0
disk drive 0
and
sal=100K
abovethejoinoperator
1
emp:pid=proj:pid using
thestandardrewrite rulesinvolving sele tionsinrelational
algebra[13℄. The newquerytree is shown inFigure6(b).
We annowrewritethequerytreeagainusingthejoinim-
plementation dis ussed in the previous se tion, su h that
1
emp:pid=proj:pid
isexe utedattheserver. Figure6( )shows
thequerytreeaftertherewriting. Finally,we pullthetwo
sele tions
pname=
0
disk drive 0
^sal=100K and
emp:pid=proj:pid
above the join operator
1
emp:did=mgr:did
. Then we repla e
thejoinoperatorbasedontheimplementationdis ussedin
thepreviousse tion,andgetthenalquerytree,asshown
Figure6(d).
Noti ethatinthetreeofFigure6(d),mu hoftheworkof
querypro essingisdoneattheserver. Theresultsobtained
fromtheserverarede ryptedandlteredatthe lient. Our
su essinsplittingthequeryQintotheserver-sideQ S
and
lient-sideQ C
dependedon(1)beingabletopullthesele -
tionoperationsabove otherrelationaloperationshigherin
thequerytree;and(2)repeatedlyrewritingthehigher-level
operationsusingtheoperatorimplementationslistedinthe
previousse tion.
There are situations whenthe sele tion operator annot
bepulledupthequerytreeasitisillustratedinthefollowing
example, whi h uses aset-dieren e operator. Considera
querythatretrievesthe setof employeeswho donotwork
for the manager named \Bob." The orresponding SQL
queryisshownbelow:
SELECT ename FROM emp
WHERE eid NOT IN (
SELECT eid FROM emp, mgr
WHERE eid = mid AND ename = 'Bob');
Using the strategy dis ussed above, we an easily on-
vertthe queryintothequerytreeshowninFigure7(a). If
wearetoexe utethequeryplanillustratedinFigure7(a),
the server will submit to the lient the relation emp S
, as
well asthe en ryptedanswersgenerated bythe 1 S
opera-
tor. Theproje tionsfollowedbytheset-dieren eoperator
will be implemented at the lient. Noti esin e the sele -
tion and proje tion operators annot be pulled above the
set-dieren eoperator,itisdiÆ ulttoapplytheimplemen-
tationoftheset-dieren eoperatordis ussedintheprevious
se tiontoevaluatethesetdieren eattheserver. Thetri k
istorewritetheset-dieren eoperatorusingthe left-outer
joinoperator (similartotheimplementationofsetdier-
en edis ussed inthe previous se tion). Using the rewrite
lawforsetdieren e,the orrespondingtreeismodiedto
thequerytreeshowninFigure 7(b). We annowpullthe
sele tionsandproje tionsabovetheouterjoin,resultingin
D
eid
mgr S
S
Map ond(ename=
0
Bob 0
) ./
S
Map
ond (eid=mid) D
emp S
mid
Q S
emp S
eid=mid^ename=
0
Bob 0
(a)Query tree withjoin
omputationatServer.
D
mid
eid
eid
mgr S
S
Map ond(ename=
0
Bob 0
) ./
S
Map ond(eid=mid) D
emp S
mid eid=mid
Q S
emp S
eid=mid^ename=
0
Bob 0
(b) Rewriting dieren e
operatorusingouterjoin.
mid
eid
eid;mid
D
S
Map ond(ename=
0
Bob 0
) mgr
S D
emp S
eid=mid
Q S
eid=mid^ename=
0
Bob 0
./ S
Map ond(eid=mid)
emp S
( )Afterpullingsele -
tionandproje tion.
mid
eid
emp S
D
S
Map ond(ename=
0
Bob 0
) mgr
S ./
S
Map ond(eid=mid) S
Map ond(eid=mid)
eid=mid^ename=
0
Bob 0
eid;mid
emp S
Q S
(d)Finalquerytree.
Figure7: Queryrewritingforaset-dieren equery.
manipulatedusingtheoperatorimplementationdis ussedin
thepreviousse tion,resultinginthenaltreeshowninFig-
ure 7(d). Thenal treeperformsmu hof thequery om-
putation at the server, and the results are de rypted and
ltered,andthenalanswerisevaluatedatthe lient.
6. EXPERIMENTAL EVALUATION
Wehave ondu tedexperimentstoshowthevalidityand
theee tivenessofthear hite tureproposedinthispaper.
Inthisse tion, wepresentourexperimentalresults.
We ran the tests by utilizing TPC-H ben hmark [14℄.
TPC-H ben hmarkdatabase is reatedats ale fa tor0.01
and0.1,whi harealsoreferredtoasthe10MBand100MB
databaserespe tively. Theexperimentswere ondu tedon
twoIBM Intel-basedpersonal omputers withPentiumIII
700MHzpro essorswith256MBRAM.Oneofthe omput-
ersperformed asthe server,and anotheroneperformed as
the lient a ording to our lient/serverar hite ture. Rel-
evant software omponents used were IBM DB2 v7.1 and
Mi rosoftWindows2000astheoperatingsystem.
Relations: WhiletheTPC-H ben hmarkin ludes mul-
tipletables,ofparti ularinterestinourexperimentalstudy
are thelineitem, ustomer, and order tables. We parti-
tionedthefollowing attributesfor thesetables:
Lineitem : l_shipdate, l_dis ount, l_quantity
Orders : o_orderdate,o_ ustkey,o_shippri ority
Customer : _ ustkey
Partitionshavebeen reatedbasedonthepartitioning ri-
teriades ribedbelow. Toen rypttherowsoftherelations,
weusedtheBlowshen ryptionalgorithm[12℄implemented
depthhistograms [9℄to partitionthe datafor twodierent
lasses of queries. Equi-width and equi-depth histograms
havebeenwidelyusedandinvestigatedinthe ontextofse-
le tivityestimationindatabases[7,8℄. Detaileddes ription
of onstru tingequi-depthhistogramsisgivenin[9℄.
Queries:We onsideredtwodierentqueriesfromTPC-
Hsuitetopresent theevaluationofthedierentaspe tsof
the ar hite ture. The rst query, as shown in Figure 8,
is a sele tion query from a single table, and it is not in-
volvedjoinoperations. These ondquery,as showninFig-
ure9,isamodiedversion ofTPC-Hquerynumber3,de-
noted Q3. This query involves a join operation between
two tables, ustomer and orders. We rst su essfully
rewrotethegivenqueriesusingtherewritingrulesdes ribed
inthepaper,andthenexe utedthetranslatedqueriesinthe
lient/serverar hite turewithdierentpartitionings hemes.
sele tsum(lextendedpri e*ldis ount)asrevenue
fromtp d.lineitem
wherelshipdate>=date('1994-01-01')
andlshipdate<date('1994-01-01')+1year
andldis ountbetween0.06-0.01and0.06+0.01
andlquantity<24
Figure 8: Query used for rst set of experiments,
basedonQ6fromTPC-H ben hmark.
sele toorderdate,oshippriorityfromtp d. ustomer,tp d.orders
where ustkey=o ustkeyandoorderdate<date('1995-03-15')
groupbyoorderdate,oshippriorityorderbyoorderdate
Figure9: Queryusedforse ondsetofexperiments,
basedonQ3fromTPC-H ben hmark.
6.1 Experiment 1
Intherstsetofexperiments,westudiedthe omponents
of the query-exe ution time in our ar hite ture. We on-
du tedthesetestswithin reasingnumberofbu kets. Fig-
ure10(a)(b)showthe resultsofthetests. Itis shownthat
network ommuni ation ostand lient-sitequery-exe ution
timesigni antlyde reasewiththein reaseinthenumber
ofbu kets. Thereason isduetothe de reasingnumberof
rowsreturnedby theserver. Whenthenumberofbu kets
thatpartitionthedatain reases,theserverhasabetter a-
pability to lter out morefalse rows, whi hdonot satisfy
thesele tionpredi ates. Whilenetwork ostand lient-site
query-exe utiontimede reasesharply,itisnotthe asefor
theserver-sitequery-exe utiontime.Intheexperiments,the
sele tivity of the queryis approximately 18%. Be ause of
thepossibilityofprefet hbat hI/O's,doingatables anre-
mainedasthebest hoi eforthedatabaseoptimizer.Hen e,
independentlyfromthe numberofbu kets,predi ateeval-
uationisperformedviaasequentialtables an, ausingthe
steadybehaviorinserver-sitequery-exe utiontime.
Intheseexperimentswealso omparedthequery-exe ution
times in our ar hite turewith the ase of having a single
server,whi hperformsallthefun tionsdes ribedinthear-
hite ture. Theformerrepresentstotaldatapriva y,while
thelatter representsrow-leveldataen ryption/de ryption,
wheretheserveristrustedtode ryptthedata. Figure10(b)
showsthis omparison. Againwepresenttheresultsfordif-
ferentnumbersofbu kets. Therstbarinthegureshows
bles,andperformstherealqueryonthesele tedrows. The
se ond and third bars show the query-exe utiontimes for
theserversideand lientsiderespe tivelywhenthequeryis
exe utedinourar hite ture.
5
Theseexperimentsshowthat
ourar hite turedoesnotintrodu esigni antoverheaddue
totheproposed ommuni ationproto olbetween lientand
server.
6.2 Experiment 2
Inthese ondsetofexperiments,westudiedqueriesthat
in ludejoinoperations. Experimentsarebasedonthemod-
ied version of Q3 (Figure 9) in the TPC-H ben hmark.
Figure11(a)to( )showthe lient-side,server-side,andthe
totalquery-exe utiontimesforin reasingnumberofbu kets
onjoinattributes,namely, ustkeyando ustkeyinthe
query. Thegure illustratesthat queryresponse timesde-
reaseverysharply withthein reasingnumberofbu kets.
As was explained in the previous experiment, this behav-
iorisprimarilysin ewithin reasingnumberofbu ketsthe
serverisbetter abletoeliminatetupleswhi hwouldother-
wisehavetobede ryptedandlteredatthe lient.Theper-
forman eissigni antlyimprovedforboth lientandserver
sidequeries. Althoughthe lient-sidequery-exe utiontime
alsoshowssteepde rease,itis greaterthanthe server-side
query-exe ution time. The reason is due to the dominant
ostof de ryptionperformed atthe lient site. Toexpress
this fa t, Figure 11(b) shows the query response times of
lient-sidequery-exe utiontimewithde ryptionand lient-
sidequery-exe utiontimewithoutde ryption,whi hisplot-
ted by removingde ryption ostfrom thequery-exe ution
time.
Aswasstudiedinourrstsetexperiments,Figure11( )
showsthe totalquery-exe utiontimesfor single-serverand
lient-serverar hite tures.Theresultsofthesetestsarealso
onsistentwiththepreviousones.
7. CONCLUSIONS
Appli ationServi eProvider (ASP)modelfor enterprise
omputinghas emergedwith theriseof Internet te hnolo-
gies. IntheASPmodel,aservi eprovider anprovidesoft-
wareasaservi etoaverylarge lient-baseovertheInternet.
Unlike manyotherservi es, however, databases are spe-
ial. Data is a pre ious resour e of an enterprise. As a
result, priva y andse urity ofdata atthe servi e-provider
site is paramount. In this paper, we addressed a spe i
data-priva y hallenge{whatifthe ownerof thedatabase
doesnottrusttheservi e providerwiththedata? Ourso-
lution isto storethe dataat theservi e provider afteren-
ryptingit,whi h anonlybede ryptedbytheowner. We
havedevelopedte hniquesusingwhi hthebulkofthework
of exe uting the SQL queries an be done by the servi e
providerwithouttheneedto de ryptthestoreddata. The
te hniquedeploysa\ oarseindex",whi hallowspartialex-
e ution ofan SQLquery onthe provider side. Theresult
ofthisqueryissenttothe lient. The orre tresultofthe
queryisfoundbyde ryptingthedata,andexe utinga om-
pensationqueryatthe lientsite. Weproposedte hniqueto
operatetheSQLquery,andsplititintoaserverqueryanda
5
The lient-sitequery-exe utiontimealsoin ludesthenet-
work ommuni ation ostrequiredtotransfersele tedrows
2 8 Number of Buckets
Q u e ry E x e c u ti o n T im e
Client site Network cost Server site
(a) Cost fa tors for the query-
exe utiontime.
2 8
Number of Buckets
Q u e ry E x e c u ti o n T im e
Single server Server site (C/S) Client site (C/S)
(b) Comparison for Client-Server
strategy v.s. the single-server
strategy.
Figure 10: Ee t ofnumberofbu ketsonnonjoin queries.
(a) Client, server, and total
query-exe utiontimes.
(b) Ee t of de ryption per-
formedonthe lientsite.
( )Comparison for lient-server
strategyv.s. single-serverstrat-
egy
Figure 11: Ee t ofnumber ofbu ketsonjoinquery.
lientquery. Theservi eproviderretainstheresponsibility
tomanagethepersisten eofthedata. The lientgetstotal
priva y,andthe ostof ooperatinginqueryexe utionwith
the servi e provider. The lient does not needto manage
datapersisten e,thus ontinuestobenetfromthesystem-
managementservi eofthedatabaseservi eprovider.
8. REFERENCES
[1℄ AES.Advan edEn ryptionStandard.National
Instituteof S ien eandTe hnology, FIPS197,2001.
[2℄ D.SongandD.WagnerandA.Perrig.Sear hon
en rypteddata.InPro .ofIEEE SRSP,2000.
[3℄ DES.DataEn ryptionStandard.FIPSPUB46,
FederalInformationPro essingStandardsPubli ation,
1977.
[4℄ H.Gar ia-Molina,J.Ullman,andJ.Widom.Database
Systems: TheCompleteBook.Prenti eHall,2002.
[5℄ H.Ha gumus,B.Iyer,C.Li,andS.Mehrotra.
Exe utingSQLoverEn ryptedDatain
Database-Servi e-ProviderModel.Te hni alReport
TR-DB-02-02,DatabaseResear hGroupatUniversity
ofCalifornia,Irvine,2002.
[6℄ H.Ha gumus,B.Iyer,andS.Mehrotra.Providing
DatabaseasaServi e.InPro .ofICDE,2002.
[7℄ Y.E.IoannidisandV.Poosala. Histogram-based
approximationofset-valuedqueryanswers.InPro .of
[8℄ H.V.Jagadish,H.Jin,B.C.Ooi,andK.-L.Tan.
Globaloptimizationofhistograms.InPro .ofACM
SIGMOD,2001.
[9℄ G.Piatatetsky-ShapiroandC.Connell.A urate
estimationofthenumberoftuplessatisfyinga
ondition.InPro .ofACMSIGMOD,1984.
[10℄ R.L.Rivest,L.M.Adleman,andM.Dertouzos.On
DataBanksandPriva yHomomorphisms.In
Foundations ofSe ure Computation,pages169{178,
1978.
[11℄ R.L.Rivest,A.Shamir,andL.M.Adleman.A
methodforobtainingdigitalsignaturesandpubli key
ryptosystems.Communi ationsofthe ACM,
21(2):120{126, 1978.
[12℄ B. S hneier.Des riptionofanewvariable-lengthkey,
blo k ipher(blowsh),fastsoftwareen ryption.In
Cambridge Se urityWorkshopPro eedings,1994.
[13℄ A.Silbers hatz,H.F.Korth,andS.Sudarshan.
DatabaseSystemCon epts,3rdEdition.M Graw-Hill
BookCompany,1997.
[14℄ TPC-H.Ben hmarkSpe i ation.
http://www.tp .org.
[15℄ M. WinslettandJ.D.Ullman.JereyD.Ullman
speaksoutonthefutureofhigheredu ation,startups,
databasetheory,andmore.SIGMOD Re ord,30(3),
2001.