• No results found

CiteSeerX — Executing SQL over Encrypted Data in the Database-Service-Provider Model

N/A
N/A
Protected

Academic year: 2022

Share "CiteSeerX — Executing SQL over Encrypted Data in the Database-Service-Provider Model"

Copied!
12
0
0

Loading.... (view fulltext now)

Full text

(1)

Executing SQL over Encrypted Data in the Database-Service-Provider Model

Hakan Hacıg ¨um ¨us¸

x;1

Bala Iyer

2

Chen Li

1

Sharad Mehrotra

1

1

Department of Information and Computer Science, University of California, Irvine, CA 92697, USA [email protected],

f

chenli,sharad

g

@ics.uci.edu

2

IBM Silicon Valley Lab., San Jose, CA 95141, USA, [email protected]

ABSTRACT

Rapidadvan esinnetworkingandInternette hnologieshave

fueledtheemergen eofthe\softwareasaservi e"modelfor

enterprise omputing. Su essfulexamplesof ommer ially

viablesoftwareservi esin luderent-a-spreadsheet,ele troni

mail servi es, general storage servi es, disaster prote tion

servi es. \DatabaseasaServi e"modelprovidesuserspower

to reate,store,modify,andretrievedatafromanywherein

the world, as long as theyhave a ess to the Internet. It

introdu esseveral hallenges,animportantissuebeingdata

priva y. Itisinthis ontextthatwespe i allyaddressthe

issueofdatapriva y.

There are two main priva y issues. First, the owner of

the data needs to be assured that the data stored onthe

servi e-provider site is prote ted against data thefts from

outsiders. Se ond, data needs to be prote ted even from

theservi eproviders,iftheprovidersthemselves annotbe

trusted. In this paper, we fo us onthe se ond hallenge.

Spe i ally, weexplore te hniquesto exe ute SQLqueries

overen rypteddata. Ourstrategyis topro essasmu hof

thequeryaspossible attheservi eproviders'site, without

havingtode ryptthe data. De ryptionandthe remainder

ofthequerypro essingareperformedatthe lientsite. The

paper explores an algebrai framework to split the query

tominimizethe omputation atthe lient site. Resultsof

experimentsvalidatingourapproa harealsopresented.

1. INTRODUCTION

The Internet has made it possible for all omputers to

be onne tedtooneanother. Thein uen eoftransa tion-

pro essing systemsand the Internet ushered in the era of

e-business. The Internet has also had a profound impa t

onthesoftware industry. It hasfa ilitated anopportunity

x

SupportedinpartbyanIBMPh.D.Fellowship. Thiswork

wasperformedwhile theauthorwasatIBM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

ACM SIGMOD ’2002 June 4-6, Madison, Wisconsin, USA Copyright 2002 ACM 1-58113-497-5/02/06 ...

$

5.00.

to provide software usage over the Internet, and has led

to anew ategoryof businesses alled \appli ation servi e

providers"orASPs.ASPsprovideworldwide ustomersthe

privilegetousesoftwareovertheInternet. ASPsaresta ed

byexpertsintheartofputtingtogethersoftwaresolutions,

using a variety of software produ ts, for familiar business

servi es su has payroll, enterprise resour e planning, and

ustomer-relationship marketing. ASPso er their servi es

overtheInternettosmallandlargeworldwideorganizations.

Sin e xed ostsareamortizedoveralargenumberofusers,

there isthe potential toredu e the servi e osteven after

possiblyin reasedtele ommuni ationsoverhead.

Itispossibletoprovidestorageand lea essasservi es.

Thenaturalquestionisthefeasibilityofprovidingthenext

value-addlayerindatamanagement.Fromthebusinessper-

spe tive,databaseasaservi einheritsalltheadvantagesof

theASPmodel, indeedevenmore,giventhatalargenum-

ber of organizations have their own DBMSs. The model

allows organizationsto leverage hardwareand software so-

lutions provided by the servi e providers, without having

to develop themontheir own. Perhaps moreimportantly,

it provides a way for organizations to share the expertise

of database professionals, thereby utting the people ost

ofmanaginga omplexinformationinfrastru ture,whi his

important both for industrial and a ademi organizations

[15℄.

Fromthete hnologi alangle, themodelposesmanysig-

ni ant hallengesforemostofwhi histheissueofdatapri-

va y and se urity. Inthe database-servi e-provider model,

user data resides on the premises of the database-servi e

provider. Most orporationsviewtheirdataasaveryvalu-

ableasset. Theservi eproviderwouldneedtoprovidesuÆ-

ientse uritymeasurestoguarddatapriva y. Atleasttwo

data-priva y hallenges arise. The rst hallenge is: how

do servi e providers prote t themselves from theft of us-

tomerdatafromha kersthatbreakintotheirsiteands an

disks? En ryptionofstoreddataisthestraightforwardso-

lution, but not without hallenges. Trade-o s needto be

maderegardingen ryption te hniquesandthedatagranu-

larityfor en ryption. This rst hallengewasexaminedby

Ha gumus etal. [6℄. It wasfound that hardware en ryp-

tionissuperiortosoftware en ryption. En ryptingdatain

bulkredu edtheper-byteen ryption ostsigni antly,ex-

posingtostartupoverheads. En ryptingbyrowwas found

preferabletoen ryptingby eldforqueriesfromtheTPC-H

ben hmark[14℄.

(2)

ismore omplexsin eitin ludesprote tionfromthedatabase

provider. Therequirementisthat en rypteddatamaynot

be de rypted at the provider site. A straightforward ap-

proa histotransmittherequisiteen ryptedtablesfromthe

server(attheprovidersite)tothe lient,de ryptthetables,

andexe utethequeryatthe lient. Butthisapproa hmiti-

gatesalmosteveryadvantageoftheservi e-providermodel,

sin e now primary data pro essing has to o ur on lient

ma hines. It willbe ome lear later,for alargenumberof

queries su h as sele tions, joins, and unions, mu h of the

datapro essing anbedoneattheserver,andtheanswers

anbe omputedwithlittlee ortbythe lient.

Ourproposed system,whosebasi ar hite tureand on-

trol oware showninFigure1, is omprised ofthreefun-

damental entities. A user poses the query to the lient.

A server is hosted by the servi e provider who stores the

en rypteddatabase. Theen rypteddatabaseisaugmented

withadditionalinformation(whi hwe alltheindex)allows

ertain amount of querypro essing to o ur at the server

withoutjeopardizing datapriva y. A lientstoresthe data

at the server. Client 1

also maintains metadata for trans-

latinguserqueriestotheappropriaterepresentationonthe

server,andperformspost-pro essingonserverqueryresults.

Basedontheauxiliaryinformationstored,wedevelopte h-

niquestosplitanoriginalqueryoverunen ryptedrelations

into (1)a orrespondingqueryoveren ryptedrelations to

runontheserver,and(2)a lientqueryforpost-pro essing

results of the server query. We a hieve this goal by de-

veloping an algebrai framework for query rewriting over

en ryptedrepresentation. Finally,weexplorethefeasibility

ande e tivenessofourapproa hbytestingtheperforman e

ofourstrategyovernumerousqueries.Ourresultsshowthat

priva yfromservi eproviders anbea hievedwithreason-

ableoverheadestablishingthefeasibilityofthemodel.

There is previous work in di erent resear h areas some

ofwhi harerelatedtoourwork. Sear honen rypteddata

[2℄,whereonlykeywordsear hissupported,anddoingarith-

meti overen rypteddata[10℄havebeenstudiedinthelit-

erature. Howeverfun tionalitiesprovidedbythosearevery

limited and insuÆ ient in exe uting omplexSQL queries

overen rypteddata.

The rest ofthe paperis organized as follows. Se tion2

presents how data is en rypted and stored onthe server.

Se tion3dis usseshowa onditioninaqueryistranslated

toa onditionontheen rypteddataattheserver. InSe -

tion4wedes ribehowindividual relationaloperators su h

as sele tion, join, set di eren e, and group by are imple-

mented. Se tion5showshowtorewriteaquerybysplitting

itintoaserverqueryanda lientquery,su hthatthe om-

putationatthe lientisredu ed.Se tion6givesourexper-

imentalresultsonqueriesfromtheTPC-Hben hmark.We

on ludethepaperinSe tion7.

2. RELATION ENCRYPTION AND STOR- AGE MODEL

Beforewedis usste hniquesforquerypro essingoveren-

rypteddata,letus rstdis usshowtheen rypteddatais

storedattheserver.

Forea hrelationR (A1;A2;:::;An),westoreontheserver

1

Figure 1: The servi e-provider ar hite ture.

anen ryptedrelation:

R S

(etupl e;A S

1

;A S

2

;:::;A S

n )

where the attributeetupl e (We will explain how etupl e is

de nedinSe tion2.4.) storesanen ryptedstringthat or-

respondstoatupleinrelationR . 2

Ea hattributeA S

i or-

respondstotheindexfortheattributeA

i

thatwillbeused

for querypro essingattheserver. For example, onsidera

relationempbelowthatstoresinformationaboutemployees.

eid ename salary addr did

23 Tom 70K Maple 40

860 Mary 60K Main 80

320 John 50K River 50

875 Jerry 55K Hopewell 110

Theemptableismappedtoa orrespondingtableatthe

server:

emp S

(etupl e;eid S

;ename S

;sal ary S

;addr S

;did S

)

Itisonlyne essaryto reateanindexforattributesinvolve

insear handjoinpredi ates. Inthe above example, ifwe

knew thattherewouldbenoquerythat involvesattribute

addr ineitherasele tion ora join, thenthe indexonthis

attribute neednot be reated. Without loss of generality,

we assume that anindex is reatedoverea hattribute of

therelation.

2.1 Partition Functions

WeexplainwhatisstoredinattributeA S

i ofR

S

forea h

attributeAiofR . Forthispurpose,wewillneedtodevelop

some notations. We rst map the domainof values (D

i )

ofattributeR :Ai intopartitionsfp1;:::;p

k

g,su hthat(1)

thesepartitionstakentogether overthewholedomain;and

(2)any two partitionsdonotoverlap. Formally, wede ne

afun tionpartitionasfollows:

partition(R :A

i

)=fp1;p2;:::;p

k g

Asanexample, onsidertheattributeeidoftheemptable

above. Supposethevaluesofdomainofthisattributeliein

therange[0;1000℄. Assumethatthewholerangeisdivided

2

Notethatwe ouldalternativelyhave hosentoen ryptat

theattributelevelinsteadoftherowlevel.Ea halternative

hasitsownprosand ons. Wepointtheinterestedreadersto

[6℄foradetaileddes ription. Therestofthispaperassumes

(3)

into5partitions : [0;200℄, (200;400℄, (400;600℄, (600;800℄,

and(800;1000℄. Thatis:

partition(emp:eid)=

f[0;200℄;(200;400℄;(400;600℄;(600;800℄;(800;1000℄g

Di erentattributesmaybepartitionedusingdi erentpar-

titionfun tions. Itshouldbe learthatthepartitionofat-

tributeAi orresponds to a splitting of itsdomain into a

setofbu kets. Anyhistogram- onstru tionte hnique,su h

as MaxDi , equi-width, or equi-depth [9℄, ould be used

to reate partitioning of attributes. In the examplesused

to explain our strategy, for simpli ity, we will assume the

equi-widthpartitioning. Extensionofourstrategytoother

partitioningmethodsisrelativelystraightforward,thoughit

willrequire hangestosomeofthenotationsdeveloped.For

example,unlikeequi-width asewhereavaluemapstoonly

asinglehistogrambin,inequi-depthitmay maptomulti-

plebu kets. Ournotationassumesthatea hvaluemapsto

asinglebu ket. Inthe experimentalse tion, besidesusing

theequi-widthwewillalsoevaluateourstrategyunderthe

equi-depthpartitioning.

Intheaboveexample,anequi-widthhistogramwasillus-

trated. Notethat whenthe domainof anattribute orre-

sponds to a eld overwhi h orderingis well de ned(e.g.,

the eid attribute), we will assume that a partition pi is a

ontinuousrange. Weusep

i

:l owandp

i

:hightodenotethe

lowerandupperboundaryofthepartition,respe tively.

2.2 Identification Functions

Furthermore, we de ne anidenti ation fun tion alled

identtoassignanidenti er ident

R:A

i (p

j

)toea hpartition

p

j

of attributeA

i

. Figure2 shows the identi ersassigned

tothe 5 partitions ofthe attributeemp:eid. For instan e,

identemp:eid([0;200℄)=2,andidentemp:eid((800;1000℄)=4.

0 200 400 600 1000

800

2 7 5 1 4

Figure 2: Partition and identi ation fun tions of

emp:eid.

Theidentfun tionvalueforapartitionisunique,thatis,

identR:A

i

(pj) 6= identR:A

i

(pl), if j 6= l . For this purpose,

a ollision-freehashfun tionthat utilizesproperties ofthe

partitionmaybeusedasanidentfun tion. Forexample,in

the asewhere apartition orrespondstoanumeri range,

thehashfun tionmayusethestart and/orendvaluesofa

range.

2.3 Mapping Functions

Giventheabovepartitionandidenti ationfun tions,we

de neamappingfun tionMapR:A

i

thatmapsavaluevin

thedomainofattributeA

i

totheidenti erofthepartition

towhi hvbelongs: Map

R:A

i

(v)=ident

R:A

i (p

j

),where p

j

isthepartitionthat ontainsv.

In the example above, the following table shows some

values of themapping fun tionfor attribute emp:eid. For

instan e, Map

emp:eid

(23) = 2, Map

emp:eid

(860) = 4, and

Mapemp:eid(875)=4.

3

Notethatitisnotne essaryto reateallofthepartitions

at the beginning. They an be reated as the values are

eidvaluev 23 860 320 875

Map

emp:eid

(v) 2 4 7 4

Wefurther lassifytwotypesofmappingfun tions:

1. Order preserving: A mapping fun tion MapR:A

i is

alledorder preservingifforanytwovaluesv

i andv

j

inthedomainof A

i ,if v

i

<v

j

,thenMap

R:Ai (v

i )

MapR:A

i (vj).

2. Random: Amappingfun tionis alledrandomifitis

notorderpreserving.

A random mapping fun tion provides superior priva y

ompared to its orresponding order-preserving mapping.

However, aswewill seelater, whether amappingfun tion

is orderpreservingor nota e tshowwe translateaquery

into queries onthe lient and server. Query translation is

simpli ed usinga order-preserving mapping fun tion. We

will develop translation strategies for both types of map-

pingfun tions.

Wefurtherde nethreemoremappingfun tionsthatwill

help usintranslating queriesovertheen ryptedrepresen-

tation. Whilethe rstfun tionde ned holdsoverany at-

tribute,thelattertwoholdfortheattributeswhosedomain

valuesexhibittotalorder. Appli ationofthemappingfun -

tiontoavaluev,greaterthanthemaximumvalueinthedo-

main,vmax,returnsMapR:A

i

(vmax). Similarly,appli ation

ofthemappingfun tiontoavaluev,lessthantheminimum

valueinthedomain,v

mim

,returnsMap

R:A

i (v

min ).

LetSbeasubsetofvaluesinthedomainofattributeAi,

and v be a value inthe domain. We de ne the following

mappingfun tionsonthepartitionsasso iatedwithA

i :

Map

R:A

i

(S)=fident

R:A

i (p

j )jp

j

\S6=;g

Map

>

R:A

i

(v)=fident

R:A

i (p

j )jp

j

:l owvg

Map

<

R:A

i

(v)=fidentR:A

i

(pj)jpj:highvg

Essentially, Map

R:A

i

(S) isthe setofidenti ers ofparti-

tions whoseranges mayoverlapwiththevaluesinS. The

resultofMap

>

R:A

i

(v)isthesetofidenti ers orrespondingto

partitionswhoserangesmay ontainavaluenotlessthanv.

Likewise,Map

<

R:A

i

(v)isthesetofidenti ers orresponding

topartitions whoserangesmay ontainavalue notgreater

thanv.

2.4 Storing Encrypted Data

We now have enough notations to spe ify how to store

the en rypted relation R S

on the server. For ea h tuple

t=ha1;a2;:::;aniinR ,therelationR S

storesatuple:

hen rypt(fa

1

;a

2

;:::;a

n g);Map

R:A1 (a

1 );

MapR:A

2

(a2);:::;MapR:A

n (an)i

where en rypt is the fun tion used to en rypt a tuple of

the relation. For instan e, the following is the en rypted

relationemp S

storedontheserver:

etuple eid

S

ename S

salary S

addr S

did S

1100110011110010... 2 19 81 18 2

1000000000011101... 4 31 59 41 4

1111101000010001... 7 7 7 22 2

1010101010111110... 4 71 49 22 4

The rst olumnetupl e ontainsthestring orresponding

to theen ryptedtuples inemp. For instan e,the rsttu-

(4)

to\1000000000011101..."equaltoen rypt(860;Mary;60K ;

Main;80). Wetreattheen ryptionfun tionasabla kbox

inourdis ussion. Anyblo k ipherte hniquesu hasAES

[1℄, RSA [11℄, Blow sh [12℄, DES[3℄ et ., an be used to

en ryptthetuples.

These ond olumn orresponds totheindexontheem-

ployeeids. Forexample, valueforattributeeid inthe rst

tupleis23,andits orrespondingpartitionis[0;200℄. Sin e

this partition is identi ed to 2, we store the value \2" as

the identi er of the eid for this tuple. Similarly,we store

the identi er \4" for the se ond employee id 860. In the

tableabove,weusedi erentmappingfun tionsfordi erent

attributes. Themapping fun tions forthe ename,sal ary,

addr, and did attributes are not shown, but they are as-

sumedtogeneratetheidenti erslistedinthetable.

Ingeneral,weusethenotation\E"(\En rypt")tomap

arelationRtoitsen ryptedrepresentation. Thatis,given

relationR (A1;A2;:::;An),relationE(R )isR S

(etupl e;A S

1

;

A S

2

;:::; A S

n

). Inthe above example, E(emp)is thetable

emp S

.

2.5 Decryption Functions

Given the operator E that maps a relation to its en-

ryptedrepresentation,wede neitsinverseoperatorDthat

mapstheen ryptedrepresentationtoits orrespondingun-

en ryptedrepresentation. Thatis,D(R S

)=R . Intheex-

ampleabove,D(emp S

)=emp.TheDoperatormayalsobe

appliedonquery expressions. A queryexpression onsists

ofmultipletables related by arbitraryrelational operators

(e.g.,joins,sele tions,et .)

As it will be lear later, the general s hema of an en-

ryptedrelationortheresultofrelationaloperatorsamongst

en ryptedrelations,R S

i is:

hR S

1

:etupl e;R S

2

:etupl e;:::;

R S

1 :A

S

1

;R S

1 :A

S

2

;:::;R S

2 :A

S

1

;R S

2 :A

S

2

;:::i

Whenthede ryptionoperatorDisappliedtoR S

i

,itstrips

o theindexvalues(R S

1 :A

S

1

;R S

1 :A

S

2

;:::;R S

2 :A

S

1

;R S

2 :A

S

2

;:::)

andde rypts(R S

1

:etupl e;R S

2

:etupl e;:::)totheirunen rypted

attributevalues.

Asanexample,assumethatanothertablede nedasmgr

(mid; did) was also stored in the database. The orre-

spondingen rypted representation E(mgr) will be atable

mgr S

(etupl e; mid S

; did S

). Suppose we were to ompute

a join between tables emp S

and mgr S

on their did S

at-

tributes. Theresultingrelation temp S

will ontaintheat-

tributeshemp S

:etupl e;eid S

;ename S

;sal ary S

;addr S

;emp S

:

did S

; mgr S

:etupl e; mid S

; mgr S

:did S

i. If we are to de-

ryptthetemp S

relationusingthe Doperator to ompute

D(temp S

),the orrespondingtablewill ontaintheattributes

(eid;ename;sal ary;addr;emp:did;mid;mgr:did)

Thatis,D(temp S

)willde ryptalloftheen rypted olumns

intemp S

anddroptheauxiliary olumns orrespondingto

theindi es.

3. MAPPING CONDITIONS

Map ond

In this se tion we studyhow to translate spe i query

onditions in operations (su h as sele tions and joins) to

orresponding onditionsovertheserver-siderepresentation.

Thistranslationfun tionis alledMap ond. On eweknow

howrelationaloperators aretranslatedovertheserver-side

implementation,andhowquerytreesaretranslated.

Forea hrelation,theserversidestorestheen ryptedtu-

ples, along with the attribute indi esdetermined by their

mapping fun tions. Meanwhile, the lient storesthe meta

dataaboutthespe i indi es,su hastheinformationabout

the partitioning of attributes, the mapping fun tions, et .

The lientutilizesthisinformationtotranslateagivenquery

Q to its server-side representation Q S

, whi h is then exe-

uted by theserver. We onsider query onditions hara -

terizedbythefollowinggrammarrules:

Condition AttributeopValue;

Condition AttributeopAttribute;

Condition (Condition_Condition)j(Condition^Con-

dition)j(:Condition);.

Allowedoperationsforopin ludef=;<;>;;g.

Inthedis ussionbelowwewillusethefollowingtablesto

illustratethetranslation.

emp(eid, ename, salary, addr, did, pid)

mgr(mid, did, mname)

proj(pid, pname, did, budget)

Attribute = Value: Su ha onditionarises insele tion

operations. Themappingisde nedasfollows:

Map ond(Ai=v))A S

i

=MapA

i (v)

As de ned in Se tion2.3, fun tion MapA

i

mapsv to the

identi er of A

i

's partitionthat ontains withvalue v. For

instan e, onsidertheemptableabove,wehave:

Map

ond

(eid=860))eid S

=4

sin eeid=860ismappedto4bythemappingfun tion of

thisattribute.

Attribute < Value: Su h a ondition arises insele tion

operations. Theattribute musthave awellde ned order-

ingoverwhi hthe\<"operatorisde ned.Dependingupon

whetherornotthemappingfun tionMapA

i

oftheattribute

isorder-preservingorrandom,di erenttranslationsarepos-

sible.

4

Orderpreserving:Inthis ase,thetranslationisstraight-

forward:

Map ond(Ai<v))A S

i

MapA

i (v)

Random: Thetranslationisalittle omplex. We he k

if the attribute value representation A S

i

lies inany of the

partitions that may ontaina value v 0

where v 0

<v. For-

mally,thetranslationis:

Map ond(Ai<v))A S

i 2Map

<

A

i (v)

Forinstan e,thefollowing onditionistranslated:

Map ond(eid<280))eid S

2f2;7g

sin e all employee ids less than 280 have two partitions

[0;200℄and(200;400℄,whoseidenti ersaref2;7g.

4

Note that we an always use the mapping de ned in

the random ase to translate onditions involving order-

preserving attributes. We di erentiate between the two

asessin ethetranslation(aswellasthequerypro essing)

iseasierfortheformer ase.

(5)

previousone. Asbeforewedi erentiatewhetherornotthe

mappingfun tionisorderpreserving. Thetranslationisas

follows:

Orderpreserving:Map

ond (A

i

>v))A S

i

Map

A

i (v);

Random:Map

ond

(Ai>v))A S

i 2Map

>

A

i (v).

Forinstan e,thefollowing onditionistranslated:

Map

ond

(eid>650))eid S

2f1;4g

sin eallemployeeidsgreaterthan650aremappedtoiden-

ti ers: f1;4g.

Attribute1 = Attribute2: Su ha ondition might arise

inajoin. Thetwo attributes anbefromtwodi erentta-

bles,orfromtwoinstan esofthesametable.The ondition

analsoariseinasele tion, andthetwoattributes anbe

fromthesametable. Thefollowing isthetranslation:

Map ond(Ai=Aj))

W

' A

S

i

=ident

A

i (p

k )^A

S

j

=ident

A

j (p

l )



where'ispk2partition(Ai);pl2partition(Aj);pk\pl6=

;. That is, we onsider all possible pairs of partitions of

A

i andA

j

that overlap. For ea h pair(p

k

;p

l

), we havea

ondition onthe identi ers of thesetwo partitions: A S

i

=

ident

A

i (p

k )^A

S

j

= ident

A

j (p

l

). Finally we take the dis-

jun tionofthese onditions. Theintuitionisthatea hpair

ofpartitionsmayprovidesomevaluesofAiandAjthat an

satisfythe onditionA

i

=A

j .

Partitions Ident

emp:did

Partitions Ident

mgr:did

[0,100℄ 2 [0,200℄ 9

(100,200℄ 4 (200,400℄ 8

(200,300℄ 3

(300,400℄ 1

Forinstan e,thetableaboveshowsthepartitionandiden-

ti ationfun tionsoftwoattributes emp:didand mgr:did.

Then onditionemp:did=mgr:didistranslatedtothefol-

lowing onditionC1:

C

1

: (emp

S

:did S

=2^mgr S

:did S

=9)

_ (emp S

:did S

=4^mgr S

:did S

=9)

_ (emp S

:did S

=3^mgr S

:did S

=8)

_ (emp S

:did S

=1^mgr S

:did S

=8).

Attribute1 <Attribute2: Againsu ha onditionmight

ariseineitherajoinor inasele tion. Let us assumethat

the onditionisA

i

<A

j

. Justas intranslating onditions

with inequality operator seen previously, the mapping of

the ondition depends upon whether or not the mapping

fun tionsof theattributes A

i and A

j

areorderpreserving

orrandom. Wespe ifythetranslationforea hinturn.

MapA

j

isorderpreserving: Insu ha asewelistoutall

thepartitionsofAiandidentifyallthepartitionsofAjthat

satisfy theordering ondition. Spe i ally,the mappingis

asfollows:

Map

ond (A

i

<A

j ))

W

p2partition(A

i )

A S

i

=identA

i (p)^A

S

j

MapA

j (p:l ow)



Map

A

i

isorderpreserving: IfA

i

isorderpreserving,we

Map ond(Ai<Aj))

W

p2partition(A

j )

A S

j

=identA

j (p)^A

S

i

MapA

i (p:high)



BothMapA

i

andMapA

j

are orderpreserving: Inthis

asewehavea hoi eofusingeitheroftheabovetwomap-

pings. Our hoi eisbasedonthespe i partitioningofA

i

andAj. We andothetranslationasfollows:

Map

ond (A

i

<A

j ))

W

' MapA

i (p

k

:l ow)MapA

j (p

l :high)



where'ispk2partition(Ai);pl2partition(Aj).

BothMapA

i

andMapA

j

arerandom: Wehavethefol-

lowing translation:

Map

ond

(Ai<Aj))

W

' A

S

i

=identA

i

(pk)^A S

j

=identA

j (pl)



where'isp

k

2partition(Ai);p

l

2partition(Aj);p

l :high

pk:l ow. Thatis,we onsiderallpairsofpartitionsofAiand

A

j

that ouldsatisfythe ondition. Forea hpair,wehavea

ondition orrespondingto thepairoftheiridenti ers. We

takethedisjun tionofthese onditions.

Forexample, ondition C

2

:emp:did<mgr:didistrans-

latedto:

C2: (emp

S

:did S

=2^mgr S

:did S

=9)

_ (emp S

:did S

=2^mgr S

:did S

=8)

_ (emp S

:did S

=4^mgr S

:did S

=9)

_ (emp S

:did S

=4^mgr S

:did S

=8)

_ (emp S

:did S

=3^mgr S

:did S

=8)

_ (emp S

:did S

=1^mgr S

:did S

=8).

Conditionemp S

:did S

= 4^mgr S

:did S

= 9is in luded,

sin epartition(100;200℄forattributeemp:didandpartition

(200;400℄ for attributemgr:did anprovide pairsofvalues

thatsatisfyemp:did<mgr:did.

For onditionAttribute1>Attribute2,theMap

ond map-

pingissameasthemappingofAttribute2<Attribute1,as

des ribedabovewiththerolesoftheattributesreversed.

Condition1 _Condition2,Condition1 ^Condition2:

Thetranslationofthetwo omposite onditionsisgivenas

follows:

Map ond(Condition1_Condition2))

Map

ond

(Condition1)_Map

ond

(Condition2)

Map ond(Condition1^Condition2))

Map

ond

(Condition1)^Map

ond

(Condition2)

TranslationofMap ond(:Condition)treatmentismorein-

volved sin e negated queries are not monotoni and their

orre t translationrequires morenotation. Thisdis ussion

anbefoundin[5℄.

Operatorfollows thesamemappingas<andoperator

follows thesamemappingas>. Conditionsthatinvolve

(6)

4. IMPLEMENTING RELATIONAL OPER- ATORS OVER ENCRYPTED RELATIONS

Inthisse tionwedes ribehowindividualrelationaloper-

ators(su hassele tions, joins,setdi eren e,andgrouping

operators) anbeimplementedintheproposeddatabasear-

hite ture. Ourstrategyistopartitionthe omputationof

theoperators a rossthe lientand theserver. Spe i ally,

we will attempt to ompute a superset of answers gener-

ated by the operator using the attributeindi es stored at

theserver. These answerswillthenbe lteredatthe lient

after de ryption to generate the true results. We will at-

temptto minimizethe workdone at the lient as mu has

possible. Furthermore, we will try to ensure as mu h as

possiblethatoperatorsexe utedonthe lientsidearesu h

thatthey anbeappliedtothetuplesarrivingoverthean-

swerstreamassoonastheyarrive(withoutaneedtostore

them). Thepurposeistoguaranteethatthe lient-sideop-

erators anbeeÆ ientlyimplemented. Theimplementation

of operators developed inthis se tion will be used in the

followingse tion,wherewedevelopanalgebrai framework

for rewriting SQL queries for the purpose of splitting the

query omputationa rossthe lientandtheserver.

For explaining the implementation of operators, we will

onsider the following two simpli ed relations of those in

thepreviousse tion:

emp(eid, did), mgr(mid, did)

Inthepreviousse tionswehavegiventheMapfun tionsof

emp:eid,emp:did,and mgr:did. For simpli ity,weassume

that the Map fun tionof mgr:mid is the same as that of

emp:eid,asshowninFigure3. Inaddition,weuseRandT

todenote tworelations, and use theoperator notations in

[4℄.

0 200 400 600 800 1000

2 7 5 1 4 2 4 3 1

100 200 300 400 0

0 200 400 600 800 1000

2 7 5 1 4

emp:eid emp:did

mgr:mid

200 400

0

9 8

mgr:did

Figure 3: Partition and identi ation fun tions for

fourattributes.

TheSele tionOperator (): Considerasele tionoper-

ation

C

(R )onarelationR ,whereC isa onditionspe i-

edononeormoreoftheattributesA1;A2;:::;AnofR . A

straightforward implementationofsu hanoperatorinour

environment istotransmittherelationR S

fromtheserver

tothe lient. Thenthe lientde ryptsthe resultusingthe

D operator, and implementsthe sele tion. This strategy,

however,pushestheentireworkofimplementingthesele -

tiontothe lient. Inaddition,theentireen ryptedrelation

needsto be transmittedfrom the serverto the lient. An

alternativeme hanismistopartially omputethesele tion

operatorattheserverusingtheindi esasso iatedwiththe

attributesinC,andpushtheresultstothe lient. The lient

de ryptstheresultsand ltersouttuplesthatdonotsatisfy

C. Spe i ally,theoperator anberewrittenasfollows:



C (R )=

C



D(

S

Map (C) R

S

)





Intheabovenotation,weadorntheoperator thatex-

e utes at the server with a supers ript \S" to highlight

the fa t that the sele t operator exe utes at the server.

All non-adorned operators are assumed to exe ute at the

lient. The de ryption operator D will only keep the at-

tributeetupl eofR S

,and dropall theotherA S

i

attributes.

We explain the above implementation using an example



eid<395^did=140

(emp).Basedonthede nitionofMap

ond (C)

dis ussedinthepreviousse tion,theabovesele tionopera-

tionwillbetranslatedintoC



D(

S

C 0 emp

S

)





,wherethe

onditionC 0

ontheserveris:

C 0

=Map ond(C)= eid S

2[2;7℄^did S

=4



TheJoinOperator(1):ConsiderajoinoperationR 1

C S.

Thejoin onditionC ouldbeeitherequality onditions(in

whi h ase the join orresponds to an equijoin), or ould

be more general onditions (resultingintheta-joins). The

abovejoinoperation anbeimplementedasfollows:

R 1

C

T =C



D R S

1 S

Map

ond (C)

T S





Asbefore, theS adornmentonthejoinoperatorempha-

sizes thefa tthat thejoinis tobe exe utedatthe server.

For instan e, join operation emp

1

emp:did=mgr:did

mgr is

translatedto:

C



D emp S

1 S

C 0

mgr S





wherethe onditionC 0

ontheserveris onditionC

1 de ned

inSe tion3.

The Grouping and Aggregation Operator ( ): A

grouping and aggregation operation is denoted by

L (R ),

where L = LG[LA. LG refers to a list of attributes on

whi h the grouping is performed, and LA orresponds to

a set of aggregation operations. As an example, the op-

eration

C;COUNT(B)!F

(R ) means that we reate groups

using attribute C of relation R , and for ea h group om-

putethe ount(B)fun tion. Thatis,L

G

=fCg,andL

A

=

fCOUNT(B)!Fg. Theresultingrelationwill ontaintwo

attributesCandF.Atupleintheresultwillhaveanentry

forea hdistin tvalueofC,andthenumberoftuplesinthe

groupreportedasattributeF. IfLA=;,onlygroupingis

performed.

Implementation of the grouping operator

L

(R ) an be

a hievedasfollows:

L (R )=

L



D S

L 0(R

S

)





;whereL 0

=fA S

i jA

i 2L

G g

Thatis,theserverwill grouptheen ryptedtuplesbased

onthe attributes ofL

G

. The serverdoesnot performany

aggregation orrespondingtoLA,sin eitdoesnothaveany

values for those attributes inLA. The results of S

L 0 are

returnedtothe lient,whi hperformsthe groupingopera-

tion L. Thisoperation anbeimplementedveryeÆ iently,

sin e everytuplebelonging toasingle groupof L will be

inasingle S

L 0

group omputedby theserver. Asaresult,

the lientonlyneedsto onsidertuplesinasingle S

L 0

group

when omputingthegroups orrespondingto L. Of ourse,

theaggregation fun tionsspe i edinL

A

will be omputed

atthe lient,sin etheir omputationrequiresthattuplesbe

rstde rypted.

(7)

did;COUNT(eid)!F (emp)

Thatis, wewant to ndthe numberof employeesinea h

department. LetL denote\did;COUNT(eid)!F." The

operationistranslatedto:

L



D S

did S

(emp S

)





Thatis,we rstdoagroupingonthedid S

attributeonthe

server. Afterthegroupedtuplesarereturnedtothe lient,

wede ryptthedata,andperformthegroupingoperationon

thedidattribute. Thisstep anbedoneeÆ iently,sin eall

thetupleswiththesamedidhavealreadybeengroupedby

theserver. Finally,we performtheaggregation ount(eid)

to ountthenumberofemployeeidsforea hdid.

The Sorting Operator (): A sorting operation L(R )

anbeimplementedsimilarlytothegroupingoperator.That

is,we rstsortonpartitionidsattheserver. Thestrategy

toimplementL(R )isasfollows:

L(R )=L



D S

L 0(R

S

)





whereL 0

=listofA S

i

orrespondingtothe A

i

inthelistL

ofattributes.

Thatis,wedoagroupingoperation S

L 0

ontheen rypted

attributes L 0

of those in L. If the mapping fun tions of

the attributes inL are all orderpreserving, this grouping

S

L

0 operation an be repla ed by a orresponding sorting

operation S

L 0

.Aftertheresultsarereturnedtothe lient,we

allthede ryptionfun tionD,andperformtheLoperation

bysortingthetuplesonattributesL.

Notethattheamountofworkdoneatthe lientto om-

pute

L

inpostpro essingdependsuponwhetherornotthe

attributeslistedinLhaveorder-preservingmappings.Ifthe

attributeshaveorder-preservingmappings,thentheresults

returnedby the server are presorted upto within a parti-

tion. Thus, sorting the results is a simplelo al operation

overa singlepartition. Alternatively,evenifthe mapping

is not order preserving, it is useful to ompute S

at the

servertoredu etheamountof lientwork. Sin ethetuples

have been grouped by the server, L an be implemented

eÆ ientlyusingamerge-sortalgorithm.

Forexample, thesorting operation

eid

(emp) anbeim-

plementedasfollows:

eid



D S

eid (emp

S

)





whereL=feidg. Thatis, we rstperformagroupingop-

eration

eid

ontheemp S

relationontheserver. The lient

de ryptsthereturnedtuples,andappliesthesortingopera-

tion

emp .

TheDupli ate-EliminationOperator(Æ):Thedupli ate-

eliminationoperatorÆisimplementedsimilarlytothegroup-

ingoperator:

Æ(R )=Æ



D L(R S

)





whereL=listofallattributesA S

i

whereAiisanattribute

inR .

Thatis,we rstgrouptheen ryptedtuplesontheserver

S

ination operation Æ. For example, the operationÆ(emp)is

translatedto:

Æ



D

eid S

;did S

(emp S

)





The Set Di eren e Operator ( ): Implementationof

thedi eren eoperationR T attheserverisdiÆ ultsin e,

without rstde ryptingtherelations RandT,itisimpos-

sibletotellwhether ornotagiventupleofRalsoappears

inS. However,the indi esstoredat theserver anstill be

usedtomeaningfullyredu etheamountofworkdoneatthe

lient. In thefollowing we assumethat relations R and T

aresetdi eren e ompatibleandarede nedoverattributes

A1;A2;:::;AnandB1;B2;:::;Bnrespe tively. Thefollow-

ingrule anbeusedtoimplementthesetdi eren eoperator:

R T =R:A

1

;:::;R:An (R

0

) T:B

1

;:::;T:Bn (R

0

)

R 0

=D S

L (R

S S

Map

ond

^

i=1;:::;n (R:A

i

=T:B

i )



T S

)



whereL=fA S

1

;A S

2

;:::;A S

n

;B S

1

;B S

2

;:::;B S

n g.

On e again, the symbol S as a supers ript of the left-

outerjoin emphasizes(denoted )thattheoperatorisim-

plementedonthe serverside. Weillustratethe above rule

throughanexample. Supposewewant to omputeemp

mgr,thatis,wewantto ndalltheemployeeswhoarenot

managers. Thequeryistranslatedtothefollowingquery:

emp:eid;emp:did(R 0

) mgr:mid;mgr:did(R 0

)

R 0

=D S

eid S

;emp S

:did S

;mid S

;mgr S

:did S

(emp

S S

C 0 mgr

S

)



The ondition C 0

is: Map

ond

(emp:eid =mgr:mid)^C

1 ,

where C

1

is de ned inSe tion 3. (See Attribute1 = At-

tribute2 ase)

Afewobservationsabouttheaboveimplementationofthe

set-di eren eoperator arenoteworthy. First,the grouping

of the results basedonindex attributesis not ne essary{

that is, the translation would be orre t evenwithout the

grouping operator. Thereason for in luding the grouping

operatoristhatit ansigni antlyredu ethe omputation

onthe lient. Forexample, dueto thegrouping operator,

all the tuples that have a NULL value for T S

attributes

will begroupedtogether. Whentheresultingtuples ofthe

setdi eren e operatorarriveat the lient, su htuples an

bede ryptedandthe orrespondingRtupleimmediatelyre-

turnedasananswer. Thereasonisthattherearenomat h-

ingtuplesofT that ould ausethepotentialeliminationof

thesetuplesofR . Hen e,theproje tionandthesubsequent

setdi eren eimplementedonthe lientsidemayonlybere-

stri tedtothosetuplesforwhi hthe orrespondingT value

isnotNULL.

Furthermore,in omputingtheproje tiontotheattributes

ofRandSandthesubsequentsetdi eren ebetweenthetwo

proje tionsweonlyneedto onsiderasinglegroupformed

by S

L

operator atatime. Thatis,aT tuplefromadi er-

entgroupwillnoteliminateanRtuplefromanothergroup.

Thus,performingthegroupingattheserverside,whilenot

ne essary, ouldsigni antlyredu ethe omputationatthe

lient.

Se ond,evenwiththeaboveoptimization,theimplemen-

tationoftheset-di eren eoperator usingtheouter-joinon

theservershouldbeusedwith are. Anaivestrategyis to

S S

(8)

strategymightbe heaperthanthepreviousstrategysin e

thesizeoftheouterjoinmightbequadrati resultinginhigh

transmissionandde ryption ost omparedtothestrategy

oftransmittingthe tworelationsindividually and omput-

ingthe setdi eren eat the lient. Whi hstrategyisused

dependsuponthe ontentoftherelations. Sele tingthespe-

i strategydependsuponintegrating ourframeworkinto

a ost-basedqueryoptimizer,whi hisbeyondthe s opeof

thispaper.

TheUnion Operator ([): Thereareessentiallytwodif-

ferentunionoperatorsbasedonthebagandthesetseman-

ti s. Theformerdoesnoteliminatedupli ates,whilethelat-

terdoes. Theimplementationof theunionoperator based

onbagsemanti sisstraightforward:

R[T=D(R S

[ S

T S

)

Ifwewishto omputetheunionunderthesetsemanti s,

it anbe omputedasfollows:

R[T =Æ



D  S

L (R

S

[ S

T S

)





whereL=listofallattributesA S

i

whereAiisanattribute

inR .

While the implementation of the union operator onthe

server side (that is, [ S

) is straightforward, there is one

te hni al hallenge thatarises. Spe i ally, sin etuples in

R S

[ S

T S

ouldoriginateeitherfromR S

orT S

,tobeableto

applythe orre tde ryption fun tionatthe lient,as well

as to orre tly interpretthe values inthe indexattributes

of the result at the server, we will store anadditional at-

tribute inthe result of the union that will determine the

origin of the tuple(thatis, whether it originates from R S

orT S

). Addingsu hanattribute is ru ialfor the orre t

implementation, but we will ignoreit in the dis ussion to

keepthe algebrasimple. Thefull version ofthe paper [5℄

illustratestheneedformaintainingtheadditionalattribute

and the resulting modi ations to the mapping fun tions

anddevelopedalgebra.

TheProje tionOperator():Sin eea htupleinarela-

tionRisen ryptedtogetherintoasinglestringintheetupl e

attributeofrelationR S

atthe server, aproje tionisnot

implementedattheserver. Asaresult,to omputeL(R ),

whereLisasetofattributes,thestrategyistotransmitthe

ompleterelationR S

to the lient, de rypttherelation at

the lient,andthen omputetheproje tion. Thatis,

L(R )=L D(R S

)



Forinstan e,wehave

eid

(emp)=

eid D(emp

S

)



.

5. ALGEBRAIC FRAMEWORK FOR QUERY SPLITTING

GivenaqueryQ,ourpurposeinthisse tionistodevelop

astrategy to splitthe omputation of Qa ross the server

and the lient. The serverwill use the implementationof

therelationaloperatorsdis ussedinthepreviousse tionto

omputeasmu hofthequeryaspossible,relegatingthere-

mainderofthe omputationtothe lient. Ourobje tiveisto

omeupwiththe\best"queryplanforQthatminimizesthe

queryattheserver,thenetworktransmission ost,andthe

I/OandCPU ostatthe lient. Avarietyofpossibilitiesex-

ist. Forexample, onsiderthefollowingqueryovertheemp

table thatretrievesemployeeswhose salaryis greater that

theaveragesalaryofemployeesinthedepartmentidenti ed

bydid=1.

SELECT emp.name FROM emp

WHERE emp.salary > (SELECT AVG(salary)

FROM emp WHERE did = 1);



did=1

emp emp

avg(sal)!avgsal

ename

./

emp:sal>avgsal

(a) Original query

tree.



did=1

D

emp S

avg(sal)!avgsal



ename

emp S D

Q s

./

emp:sal>avgsal

(b) Repla ing en-

ryptedrelations.

D

emp S

avg(sal)!avgsal

ename

./

emp:sal>avgsal

Q s



did=1



Map ond(did=1)

emp S D

( ) Doing sele -

tionatserver.

./

S

emp:sal>avgsal

emp S



did=1

Map

ond (did=1)

emp S D

avg(sal)!avgsal E

D



emp:sal>avgsal

ename

Q

Q s Q

s Q

(d) Multiple intera tions

betweenClientandServer.

Figure4: Queryplansforemployeeswhomakemore

thanaveragesalaryofemloyeeswhoarein did=1

mgr ./

emp:did=mgr:did

mname

./

emp:pid=proj:pid

pname=

0

disk drive 0

sal>100K

proj emp

(a)Originalquerytree.

D

D ./

emp:did=mgr:did

mname

./

emp:pid=proj:pid



sal>100K

proj S

emp S D

mgr S

Q S



pname=

0

disk drive 0

(b)Afterrepla ingwith

en ryptedrelations.

Figure 5: Evaluation ofajoinquery.

The orrespondingquerytreeandsomeoftheevaluation

strategies are illustrated in Figures 4(a) to (d). The rst

strategy (Figure 4(b)) is to simply transmit the emp ta-

(9)

D ./

emp:did=mgr:did

mgr S ./

emp:pid=proj:pid



pname=

0

disk drive 0



sal>100K

D D

 s

Map ond(pname=

0

disk drive 0

)

 s

Map ond(sal>100K)

proj S

emp S mname

Q S

(a)Afterrewritingsele tions.

D ./

emp:did=mgr:did

mgr S

D D

 s

Map ond(pname=

0

disk drive 0

)

 s

Map ond(sal>100K)

proj S

emp S mname

./

emp:pid=proj:pid



pname=

0

disk drive 0

^sal>100K

Q S

(b)Afterpullingupsele tions.

 s

Map

ond

(pname=

0

disk drive 0

)

 s

Map

ond

(sal>100K)

proj S

emp S ./

emp:did=mgr:did

mgr S



mname

./ S

Map

ond

(emp:pid=proj:pid)



pname=

0

disk drive 0

^sal>100K

Q S

D

D



emp:pid=proj:pid

( )Afterrewritingjoin.

 s

Map

ond

(pname=

0

disk drive 0

)

 s

Map

ond

(sal>100K)

proj S

emp S

mgr S D

Q S

./ S

Map

ond

(emp:did=mgr:did)

./ S

Map

ond

(emp:pid=proj:pid)



mname



pname=

0

disk drive 0

^sal>100K^proj:pid=emp:pid^emp:did=mgr:did

(d)The nalquerytree.

Figure 6: Query rewriting heuristi for joinqueries

strategy(Figure4( ))isto omputepartoftheinnerquery

attheserver,whi hsele ts(asmanyaspossible)tuples or-

respondingto Map ond(did =1). Theserversends to the

lient the en rypted version of the emp table, i.e., emp S

,

alongwiththeen ryptedrepresentationofthesetoftuples

thatsatisfytheinnerquery. The lientde ryptsthetuples

to evaluatethe remainderof the query. Yetanother pos-

sibility (Figure 4(d))is to evaluatethe inner queryat the

server. Thatis, sele tthetuples orrespondingto theem-

ployeesthat work indepartment did=1. Theresults are

shipped to the lient, whi h de rypts thetuples and om-

putesaveragesalary. Theaveragesalaryisen ryptedbythe

lientandshippedba ktotheserver,whi hthen omputes

thejoinat theserver. Finally,theresultsare de ryptedat

the lient.

5.1 Heuristic Rules to Separate Queries

It shouldimmediately be obviousthat ari hset ofpos-

sibilitiesexistinevaluatingaqueryinour framework, and

that the de ision of the exa t query plan should be ost

based. Thistopi ,however,isoutsidethes opeofthispa-

per. Our attempt is primarily to establish the feasibility

oftheproposedmodel,and ost-basedoptimizationisrele-

gatedtofuturework. Instead,inthisse tion,wewillrestri t

thatallowforagivenquerytreetobesplitintotwoparts{

theserverpart(referredtoasQ S

)thatexe utesattheserver

rst,andthe lientpart(referredtoasQ C

)thatexe utesat

the lientbasedontheresultsofthequeryevaluatedatthe

server. Ourobje tive willbeto minimizethe omputation

inQ C

. Thatis,wewouldattempttorewritethequerytree,

su hthatmost ofthee ortof evaluatingthe queryo urs

attheserver,andthe lientdoesleastamountofwork.

Weillustrateourideasusingexamples.Asa rstexample,

onsiderthefollowingquerythat omputesthenamesofthe

managersofthoseemployeesworkingonproje t\diskdrive"

whosesalaryismorethan100K.

SELECT mname

FROM emp, mgr, proj

WHERE proj.pname = `diskdrive'

AND proj.pid = emp.pid

AND emp.sal > 100K

AND emp.did = mgr.did;

The rststepisto onvertthe abovequeryintoa orre-

spondingquery tree, and to manipulate the query tree to

generateagoodplan(usingthestandardqueryrewritelaws

ofrelationalalgebra[13℄). Figure5(a)showsthequerytree

inwhi hthetwosele tionshavebeenpusheddowntorela-

(10)

withen ryptedrelationR S

. Theresultingtreeis shownin

Figure5(b).

As it stands, the urrent query tree requires the entire

relations proj S

, emp S

, and mgr S

to be sent to the lient

that will de rypt the relations to evaluate the query. We

nextrepla e thesele tion operationsbytheir implementa-

tionlistedinthepreviousse tionresultinginthequerytree

showninFigure6(a). Noti ethatinthe orrespondingtree,

theserverisparti ipatingintheevaluationofthetwosele -

tion onditions. Sin eourobje tive isto performasmu h

ofthe omputation at the serveras possible, we nextpull

upthetwo lient-sidesele tion onditions

pname=

0

disk drive 0

and

sal=100K

abovethejoinoperator

1

emp:pid=proj:pid using

thestandardrewrite rulesinvolving sele tionsinrelational

algebra[13℄. The newquerytree is shown inFigure6(b).

We annowrewritethequerytreeagainusingthejoinim-

plementation dis ussed in the previous se tion, su h that

1

emp:pid=proj:pid

isexe utedattheserver. Figure6( )shows

thequerytreeaftertherewriting. Finally,we pullthetwo

sele tions

pname=

0

disk drive 0

^sal=100K and

emp:pid=proj:pid

above the join operator

1

emp:did=mgr:did

. Then we repla e

thejoinoperatorbasedontheimplementationdis ussedin

thepreviousse tion,andgetthe nalquerytree,asshown

Figure6(d).

Noti ethatinthetreeofFigure6(d),mu hoftheworkof

querypro essingisdoneattheserver. Theresultsobtained

fromtheserverarede ryptedand lteredatthe lient. Our

su essinsplittingthequeryQintotheserver-sideQ S

and

lient-sideQ C

dependedon(1)beingabletopullthesele -

tionoperationsabove otherrelationaloperationshigherin

thequerytree;and(2)repeatedlyrewritingthehigher-level

operationsusingtheoperatorimplementationslistedinthe

previousse tion.

There are situations whenthe sele tion operator annot

bepulledupthequerytreeasitisillustratedinthefollowing

example, whi h uses aset-di eren e operator. Considera

querythatretrievesthe setof employeeswho donotwork

for the manager named \Bob." The orresponding SQL

queryisshownbelow:

SELECT ename FROM emp

WHERE eid NOT IN (

SELECT eid FROM emp, mgr

WHERE eid = mid AND ename = 'Bob');

Using the strategy dis ussed above, we an easily on-

vertthe queryintothequerytreeshowninFigure7(a). If

wearetoexe utethequeryplanillustratedinFigure7(a),

the server will submit to the lient the relation emp S

, as

well asthe en ryptedanswersgenerated bythe 1 S

opera-

tor. Theproje tionsfollowedbytheset-di eren eoperator

will be implemented at the lient. Noti esin e the sele -

tion and proje tion operators annot be pulled above the

set-di eren eoperator,itisdiÆ ulttoapplytheimplemen-

tationoftheset-di eren eoperatordis ussedintheprevious

se tiontoevaluatethesetdi eren eattheserver. Thetri k

istorewritetheset-di eren eoperatorusingthe left-outer

joinoperator (similartotheimplementationofsetdi er-

en edis ussed inthe previous se tion). Using the rewrite

lawforsetdi eren e,the orrespondingtreeismodi edto

thequerytreeshowninFigure 7(b). We annowpullthe

sele tionsandproje tionsabovetheouterjoin,resultingin

D



eid

mgr S

 S

Map ond(ename=

0

Bob 0

) ./

S

Map

ond (eid=mid) D

emp S

mid

Q S

emp S

eid=mid^ename=

0

Bob 0

(a)Query tree withjoin

omputationatServer.

D

mid

eid



eid

mgr S

 S

Map ond(ename=

0

Bob 0

) ./

S

Map ond(eid=mid) D

emp S

mid eid=mid

Q S

emp S



eid=mid^ename=

0

Bob 0

(b) Rewriting di eren e

operatorusingouterjoin.

mid

eid



eid;mid

D

 S

Map ond(ename=

0

Bob 0

) mgr

S D

emp S

eid=mid

Q S



eid=mid^ename=

0

Bob 0

./ S

Map ond(eid=mid)

emp S

( )Afterpullingsele -

tionandproje tion.



mid



eid

emp S

D

 S

Map ond(ename=

0

Bob 0

) mgr

S ./

S

Map ond(eid=mid) S

Map ond(eid=mid)

eid=mid^ename=

0

Bob 0

eid;mid

emp S

Q S

(d)Finalquerytree.

Figure7: Queryrewritingforaset-di eren equery.

manipulatedusingtheoperatorimplementationdis ussedin

thepreviousse tion,resultinginthe naltreeshowninFig-

ure 7(d). The nal treeperformsmu hof thequery om-

putation at the server, and the results are de rypted and

ltered,andthe nalanswerisevaluatedatthe lient.

6. EXPERIMENTAL EVALUATION

Wehave ondu tedexperimentstoshowthevalidityand

thee e tivenessofthear hite tureproposedinthispaper.

Inthisse tion, wepresentourexperimentalresults.

We ran the tests by utilizing TPC-H ben hmark [14℄.

TPC-H ben hmarkdatabase is reatedats ale fa tor0.01

and0.1,whi harealsoreferredtoasthe10MBand100MB

databaserespe tively. Theexperimentswere ondu tedon

twoIBM Intel-basedpersonal omputers withPentiumIII

700MHzpro essorswith256MBRAM.Oneofthe omput-

ersperformed asthe server,and anotheroneperformed as

the lient a ording to our lient/serverar hite ture. Rel-

evant software omponents used were IBM DB2 v7.1 and

Mi rosoftWindows2000astheoperatingsystem.

Relations: WhiletheTPC-H ben hmarkin ludes mul-

tipletables,ofparti ularinterestinourexperimentalstudy

are thelineitem, ustomer, and order tables. We parti-

tionedthefollowing attributesfor thesetables:

Lineitem : l_shipdate, l_dis ount, l_quantity

Orders : o_orderdate,o_ ustkey,o_shippri ority

Customer : _ ustkey

Partitionshavebeen reatedbasedonthepartitioning ri-

teriades ribedbelow. Toen rypttherowsoftherelations,

weusedtheBlow shen ryptionalgorithm[12℄implemented

(11)

depthhistograms [9℄to partitionthe datafor twodi erent

lasses of queries. Equi-width and equi-depth histograms

havebeenwidelyusedandinvestigatedinthe ontextofse-

le tivityestimationindatabases[7,8℄. Detaileddes ription

of onstru tingequi-depthhistogramsisgivenin[9℄.

Queries:We onsideredtwodi erentqueriesfromTPC-

Hsuitetopresent theevaluationofthedi erentaspe tsof

the ar hite ture. The rst query, as shown in Figure 8,

is a sele tion query from a single table, and it is not in-

volvedjoinoperations. These ondquery,as showninFig-

ure9,isamodi edversion ofTPC-Hquerynumber3,de-

noted Q3. This query involves a join operation between

two tables, ustomer and orders. We rst su essfully

rewrotethegivenqueriesusingtherewritingrulesdes ribed

inthepaper,andthenexe utedthetranslatedqueriesinthe

lient/serverar hite turewithdi erentpartitionings hemes.

sele tsum(lextendedpri e*ldis ount)asrevenue

fromtp d.lineitem

wherelshipdate>=date('1994-01-01')

andlshipdate<date('1994-01-01')+1year

andldis ountbetween0.06-0.01and0.06+0.01

andlquantity<24

Figure 8: Query used for rst set of experiments,

basedonQ6fromTPC-H ben hmark.

sele toorderdate,oshippriorityfromtp d. ustomer,tp d.orders

where ustkey=o ustkeyandoorderdate<date('1995-03-15')

groupbyoorderdate,oshippriorityorderbyoorderdate

Figure9: Queryusedforse ondsetofexperiments,

basedonQ3fromTPC-H ben hmark.

6.1 Experiment 1

Inthe rstsetofexperiments,westudiedthe omponents

of the query-exe ution time in our ar hite ture. We on-

du tedthesetestswithin reasingnumberofbu kets. Fig-

ure10(a)(b)showthe resultsofthetests. Itis shownthat

network ommuni ation ostand lient-sitequery-exe ution

timesigni antlyde reasewiththein reaseinthenumber

ofbu kets. Thereason isduetothe de reasingnumberof

rowsreturnedby theserver. Whenthenumberofbu kets

thatpartitionthedatain reases,theserverhasabetter a-

pability to lter out morefalse rows, whi hdonot satisfy

thesele tionpredi ates. Whilenetwork ostand lient-site

query-exe utiontimede reasesharply,itisnotthe asefor

theserver-sitequery-exe utiontime.Intheexperiments,the

sele tivity of the queryis approximately 18%. Be ause of

thepossibilityofprefet hbat hI/O's,doingatables anre-

mainedasthebest hoi eforthedatabaseoptimizer.Hen e,

independentlyfromthe numberofbu kets,predi ateeval-

uationisperformedviaasequentialtables an, ausingthe

steadybehaviorinserver-sitequery-exe utiontime.

Intheseexperimentswealso omparedthequery-exe ution

times in our ar hite turewith the ase of having a single

server,whi hperformsallthefun tionsdes ribedinthear-

hite ture. Theformerrepresentstotaldatapriva y,while

thelatter representsrow-leveldataen ryption/de ryption,

wheretheserveristrustedtode ryptthedata. Figure10(b)

showsthis omparison. Againwepresenttheresultsfordif-

ferentnumbersofbu kets. The rstbarinthe gureshows

bles,andperformstherealqueryonthesele tedrows. The

se ond and third bars show the query-exe utiontimes for

theserversideand lientsiderespe tivelywhenthequeryis

exe utedinourar hite ture.

5

Theseexperimentsshowthat

ourar hite turedoesnotintrodu esigni antoverheaddue

totheproposed ommuni ationproto olbetween lientand

server.

6.2 Experiment 2

Inthese ondsetofexperiments,westudiedqueriesthat

in ludejoinoperations. Experimentsarebasedonthemod-

i ed version of Q3 (Figure 9) in the TPC-H ben hmark.

Figure11(a)to( )showthe lient-side,server-side,andthe

totalquery-exe utiontimesforin reasingnumberofbu kets

onjoinattributes,namely, ustkeyando ustkeyinthe

query. The gure illustratesthat queryresponse timesde-

reaseverysharply withthein reasingnumberofbu kets.

As was explained in the previous experiment, this behav-

iorisprimarilysin ewithin reasingnumberofbu ketsthe

serverisbetter abletoeliminatetupleswhi hwouldother-

wisehavetobede ryptedand lteredatthe lient.Theper-

forman eissigni antlyimprovedforboth lientandserver

sidequeries. Althoughthe lient-sidequery-exe utiontime

alsoshowssteepde rease,itis greaterthanthe server-side

query-exe ution time. The reason is due to the dominant

ostof de ryptionperformed atthe lient site. Toexpress

this fa t, Figure 11(b) shows the query response times of

lient-sidequery-exe utiontimewithde ryptionand lient-

sidequery-exe utiontimewithoutde ryption,whi hisplot-

ted by removingde ryption ostfrom thequery-exe ution

time.

Aswasstudiedinour rstsetexperiments,Figure11( )

showsthe totalquery-exe utiontimesfor single-serverand

lient-serverar hite tures.Theresultsofthesetestsarealso

onsistentwiththepreviousones.

7. CONCLUSIONS

Appli ationServi eProvider (ASP)modelfor enterprise

omputinghas emergedwith theriseof Internet te hnolo-

gies. IntheASPmodel,aservi eprovider anprovidesoft-

wareasaservi etoaverylarge lient-baseovertheInternet.

Unlike manyotherservi es, however, databases are spe-

ial. Data is a pre ious resour e of an enterprise. As a

result, priva y andse urity ofdata atthe servi e-provider

site is paramount. In this paper, we addressed a spe i

data-priva y hallenge{whatifthe ownerof thedatabase

doesnottrusttheservi e providerwiththedata? Ourso-

lution isto storethe dataat theservi e provider afteren-

ryptingit,whi h anonlybede ryptedbytheowner. We

havedevelopedte hniquesusingwhi hthebulkofthework

of exe uting the SQL queries an be done by the servi e

providerwithouttheneedto de ryptthestoreddata. The

te hniquedeploysa\ oarseindex",whi hallowspartialex-

e ution ofan SQLquery onthe provider side. Theresult

ofthisqueryissenttothe lient. The orre tresultofthe

queryisfoundbyde ryptingthedata,andexe utinga om-

pensationqueryatthe lientsite. Weproposedte hniqueto

operatetheSQLquery,andsplititintoaserverqueryanda

5

The lient-sitequery-exe utiontimealsoin ludesthenet-

work ommuni ation ostrequiredtotransfersele tedrows

(12)

2 8 Number of Buckets

Q u e ry E x e c u ti o n T im e

Client site Network cost Server site

(a) Cost fa tors for the query-

exe utiontime.

2 8

Number of Buckets

Q u e ry E x e c u ti o n T im e

Single server Server site (C/S) Client site (C/S)

(b) Comparison for Client-Server

strategy v.s. the single-server

strategy.

Figure 10: E e t ofnumberofbu ketsonnonjoin queries.

(a) Client, server, and total

query-exe utiontimes.

(b) E e t of de ryption per-

formedonthe lientsite.

( )Comparison for lient-server

strategyv.s. single-serverstrat-

egy

Figure 11: E e t ofnumber ofbu ketsonjoinquery.

lientquery. Theservi eproviderretainstheresponsibility

tomanagethepersisten eofthedata. The lientgetstotal

priva y,andthe ostof ooperatinginqueryexe utionwith

the servi e provider. The lient does not needto manage

datapersisten e,thus ontinuestobene tfromthesystem-

managementservi eofthedatabaseservi eprovider.

8. REFERENCES

[1℄ AES.Advan edEn ryptionStandard.National

Instituteof S ien eandTe hnology, FIPS197,2001.

[2℄ D.SongandD.WagnerandA.Perrig.Sear hon

en rypteddata.InPro .ofIEEE SRSP,2000.

[3℄ DES.DataEn ryptionStandard.FIPSPUB46,

FederalInformationPro essingStandardsPubli ation,

1977.

[4℄ H.Gar ia-Molina,J.Ullman,andJ.Widom.Database

Systems: TheCompleteBook.Prenti eHall,2002.

[5℄ H.Ha gumus,B.Iyer,C.Li,andS.Mehrotra.

Exe utingSQLoverEn ryptedDatain

Database-Servi e-ProviderModel.Te hni alReport

TR-DB-02-02,DatabaseResear hGroupatUniversity

ofCalifornia,Irvine,2002.

[6℄ H.Ha gumus,B.Iyer,andS.Mehrotra.Providing

DatabaseasaServi e.InPro .ofICDE,2002.

[7℄ Y.E.IoannidisandV.Poosala. Histogram-based

approximationofset-valuedqueryanswers.InPro .of

[8℄ H.V.Jagadish,H.Jin,B.C.Ooi,andK.-L.Tan.

Globaloptimizationofhistograms.InPro .ofACM

SIGMOD,2001.

[9℄ G.Piatatetsky-ShapiroandC.Connell.A urate

estimationofthenumberoftuplessatisfyinga

ondition.InPro .ofACMSIGMOD,1984.

[10℄ R.L.Rivest,L.M.Adleman,andM.Dertouzos.On

DataBanksandPriva yHomomorphisms.In

Foundations ofSe ure Computation,pages169{178,

1978.

[11℄ R.L.Rivest,A.Shamir,andL.M.Adleman.A

methodforobtainingdigitalsignaturesandpubli key

ryptosystems.Communi ationsofthe ACM,

21(2):120{126, 1978.

[12℄ B. S hneier.Des riptionofanewvariable-lengthkey,

blo k ipher(blow sh),fastsoftwareen ryption.In

Cambridge Se urityWorkshopPro eedings,1994.

[13℄ A.Silbers hatz,H.F.Korth,andS.Sudarshan.

DatabaseSystemCon epts,3rdEdition.M Graw-Hill

BookCompany,1997.

[14℄ TPC-H.Ben hmarkSpe i ation.

http://www.tp .org.

[15℄ M. WinslettandJ.D.Ullman.Je reyD.Ullman

speaksoutonthefutureofhigheredu ation,startups,

databasetheory,andmore.SIGMOD Re ord,30(3),

2001.

References

Related documents

Our company was founded in 1977 and initially developed IT systems, computer networks over optic fibres and radio links. In 2002 Ondarete™ entered the market of video

Such a collegiate cul- ture, like honors cultures everywhere, is best achieved by open and trusting relationships of the students with each other and the instructor, discussions

The presentation may contain “forward looking statements” within the meaning of the United States private securities litigation reform act of 1995 and “forward looking

3 plant specimens collected and/or photographed, one species treated at one locality constitutes one record 4 approximate number of species treated (collected and/or photographed).

In our search for new biomarkers we also assessed the role of SPOP gene in PCa confirming its loss of expression and mutations in PCa but also being the first group to

• “The alteration or repair identified herein complies with the applicable airworthiness requirements and is approved for use only on the above described aircraft, subject to

Abstract This essay reviews diverse strands of empirical and theoretical work in different urban studies areas (urban planning, urban ethnography, urban

Homogenates were prepared from the dissected organs and stained with a polyclonal antibody raised against whole spirochetes (1), or polyclonal sera raised against OspA (2)