
Privacy-Preserving Regular Expression Evaluation on Encrypted Data

Lei Wei

A dissertation submitted to the faculty of the University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Computer Science.

Chapel Hill 2013

Approved by:

Christian Cachin

Philip Mackenzie

Fabian Monrose

Michael Reiter, Chair


ABSTRACT

LEI WEI: Privacy-preserving Regular-Expression Evaluation on Encrypted Data (Under the direction of Michael K. Reiter)

Motivated by the need to outsource file storage to untrusted clouds while still permitting controlled use of that data by authorized third parties, in this dissertation we present a family of protocols by which a client can evaluate a regular expression on an encrypted file stored at a server (the cloud), once authorized to do so by the file owner. We present a protocol that provably protects the privacy of the regular expression and the file contents from a malicious server and the privacy of the file contents (except for the evaluation result) from an honest-but-curious client. We then extend this protocol in two primary directions. In one direction, we develop a strengthened protocol that enables the client to detect any misbehavior of the server; in particular, the client can verify that the result of its regular-expression evaluation is based on the authentic file stored there by the data owner, and in this sense the file and evaluation result are authenticated to the client.


ACKNOWLEDGEMENTS

The completion of this dissertation would not have been possible without the guidance of my advisor, Prof. Mike Reiter, with whom I feel extremely fortunate to have worked and to whom I owe all my gratitude. Since I accidentally stepped into this field as a newcomer six years ago, he has been instrumental in my personal development and has helped shape who I am today. Throughout the years, I have been immensely impressed by, and have benefited from, his masterful understanding of the field, endless source of wisdom, careful direction and advice, and sometimes brutal demands to lift me up to his high standards. There is still so much I can learn from him every day, but I feel it is time for me to carry forward the qualities I learned from him, hopefully to have a positive impact on others.

I would also like to thank Dr. Christian Cachin, Dr. Fabian Monrose, Dr. Philip Mackenzie and Dr. Gene Tsudik for serving on my dissertation committee. I am very grateful to all of them for taking time from their busy schedules to hold meetings with me and for providing invaluable feedback, especially considering the challenge of scheduling across 9 time zones.

Special thanks to Fabian, who is always available for encouragement, a fun chat or simply a game of foosball or ping-pong to distract me from the stress of graduate school.

I would also like to thank my friend, Hao Xu, for his generosity in providing many insightful discussions and assistance in fighting bugs in my code from time to time. I would also like to thank all my friends in the security lab, from whom I benefited through inspiring discussions, helpful critiques of my work and feedback that helped me improve my presentation skills.

My gratitude also goes to my girlfriend, Shih Ku, who has accompanied and supported me throughout my time in graduate school, not always physically close, but always close in heart.


LIST OF TABLES

5.1 Average time spent per email in seconds (numbers in braces are when


LIST OF FIGURES

1.1 Overall framework: data owner stores encrypted data at the server, with which an authorized client performs private searches

3.1 Protocol $\Pi_1(\mathcal{E})$, described in Section 3.2

3.2 Experiments for proving DFA privacy of $\Pi_1(\mathcal{E})$ against server adversaries

3.3 Experiment for IND-CPA security

3.4 Experiments for proving file privacy of $\Pi_1(\mathcal{E})$ against server adversaries

3.5 Experiment for proving file privacy of $\Pi_1(\mathcal{E})$ against client adversaries

4.2 Experiments for proving DFA and file privacy of $\Pi_3(\mathcal{E})$ against server adversaries

4.3 Experiment for proving result authenticity against server adversaries

4.4 Experiment for defining BCDH problem

4.5 Experiment for proving file privacy against client adversaries

5.1 Protocol $\Pi'_1(\mathcal{E})$, described in Section 5.1.1

5.2 Protocol $\Pi_4(\mathcal{E})$, described in Section 5.1.2

5.3 Optimized protocol $\Pi_5(\mathcal{E})$, described in Section 5.2

5.4 Experiments for proving DFA privacy of $\Pi_5(\mathcal{E})$ against server adversaries

5.5 Experiments for proving file privacy of $\Pi_5(\mathcal{E})$ against server adversaries

5.6 Experiments for proving DFA privacy of $\Pi_5(\mathcal{E})$ against proxy adversaries

5.7 Experiments for proving file privacy of $\Pi_5(\mathcal{E})$ against proxy adversaries

5.8 Time spent per file character in milliseconds, with pairing preprocessing disabled

5.9 Time spent per file character in milliseconds, with pairing preprocessing enabled


CHAPTER 1

Introduction

Outsourcing file storage to storage service providers (SSPs) and “clouds” can provide significant savings to file owners in terms of management costs and capital investments (e.g., [58]). However, because cloud storage can heighten the risk of file disclosure, prudent file owners encrypt their cloud-resident files to protect their confidentiality. This encryption introduces difficulties in managing access to these files by third parties, however. For example:

• Third-party service providers who are contracted to analyze files stored in the cloud generally cannot do so if the files are encrypted. For example, periodically “scanning” files to detect new malware, as is common today for PC platforms, cannot presently be performed on encrypted files by a third party.

• With some exceptions (see Chapter 2), third-party customers generally cannot search the files if they are encrypted. Searches on genome datasets, pharmaceutical databases, document corpora, or network logs are critical for research in various fields, but the privacy constraints of these datasets may mandate their encryption, particularly when stored in the cloud.

These difficulties are compounded when the third party views its queries on the files to be sensitive, as well. New malware signatures may be sensitive since releasing them enables attackers to design malware to evade them (e.g., [75]). Customers of datasets in numerous domains (e.g., pharmaceutical research) may view their research interests, and hence their queries, as private.

Figure 1.1: Overall framework: data owner stores encrypted data at the server, with which an authorized client performs private searches. (Parties shown in the diagram: Data Owner; Cloud Provider (Server); Service Provider (Client).)

motivated by the scenarios above, which in many cases involve pattern matching a file against one or more regular expressions. Regular-expression searches are a widely adopted search primitive in many languages and programming frameworks¹ (e.g., see [38]). Multi-pattern string matching is especially common in analysis of content for malware (e.g., [64, 50]) and also is commonplace in searches on genome data, for example. In fact, there are now a number of available genome databases (e.g., [2, 5]) and accompanying tools for multi-pattern matching against them (e.g., [12]).

1.1 Third-Party Private DFA Evaluation on Encrypted Files in the Cloud

With the goal of improving privacy in the applications above, in Chapter 3 we develop novel protocols to evaluate a deterministic finite automaton (DFA) of the client’s choice on the plaintext of the encrypted file and to return the final state to the client to indicate which, if any, of the patterns encoded in the DFA were matched. We stress that while there is much work on secure two-party computation, including the specific case of private DFA evaluation on a private file, very few works have anticipated the possibility that the file is available only in encrypted form. This setting will become more common as data-storage outsourcing grows.

¹ To be more precise, the term “regular expression” is used in some frameworks in a way that follows but deviates

The security properties we prove for our protocols include privacy of the DFA and file contents against arbitrary server adversaries, and privacy of the file (except what is revealed by the evaluation result) against honest-but-curious client adversaries. Though our proofs are limited to only honest-but-curious client adversaries, we also provide heuristic justification for the security of our protocols against arbitrary client adversaries. Our protocols appear to be extensible with standard techniques to provably protect file privacy against arbitrary client adversaries, but we stop short of doing so in light of the substantially greater cost it would impose and our motivating scenarios involving third parties that the file owner must authorize and so presumably trusts to some extent. We do, however, discuss efficient heuristics to detect a misbehaving client or server that highlight new opportunities in the cloud storage setting.

A central observation that facilitates our protocols is that a DFA transition function can be encoded as a bivariate polynomial over the ring of an additively homomorphic encryption scheme with which the file characters are encrypted. In our protocols, the client, who has this polynomial as input, and the server, who has the encrypted file as input, obliviously perform DFA state transitions by jointly evaluating this polynomial. Neither party learns the current state at any point of the protocol execution; instead, they share the current state at each step, requiring that the polynomial be adapted in each round to accommodate this sharing.

We believe our protocols will be efficient enough for many practical scenarios. They support evaluation of any DFA over an alphabet $\Sigma$ on any file consisting of $\ell$ symbols drawn from $\Sigma$, and require the file to be stored using $\ell m$ ciphertexts where $m = |\Sigma|$. Since $m$ is a multiplicative factor in the storage cost, our protocols are best suited to small alphabets $\Sigma$, e.g., bits ($m = 2$), bytes ($m = 256$), alphanumeric characters ($m = 36$), or DNA nucleotides ($m = 4$ for “A”, “C”, “G”, and “T”). Specifically, in Chapter 3, we first present a protocol that leverages additively homomorphic encryption (e.g., [59]) and transmits $(nm+3)\ell+3$ ciphertexts to evaluate a DFA of $n$ states. We then leverage additively homomorphic encryption that also supports one homomorphic multiplication of ciphertexts (e.g., [17]) to construct an improved protocol that transmits only $(n+m+1)\ell+3$


noninteractive protocol with a communication cost of $O(nm)$ fully homomorphic ciphertexts and, in particular, that is independent of the file length $\ell$.
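To give a rough sense of scale for the ciphertext counts quoted above, the following worked example plugs in hypothetical parameters: a DFA of $n = 10$ states over the DNA alphabet ($m = 4$) and a file of $\ell = 1000$ symbols. These values are our own illustration, not measurements from this dissertation, and we assume that the second count is expressed in ciphertexts, like the first.

```latex
\begin{align*}
\text{storage at the server:} \quad & \ell m = 1000 \cdot 4 = 4000 \\
\text{first protocol:} \quad & (nm+3)\ell + 3 = (10 \cdot 4 + 3) \cdot 1000 + 3 = 43003 \\
\text{improved protocol:} \quad & (n+m+1)\ell + 3 = (10 + 4 + 1) \cdot 1000 + 3 = 15003
\end{align*}
```

Under these assumed parameters the improved count is smaller by roughly a factor of three, and the gap widens as $n$ and $m$ grow, since the first count scales with the product $nm$ and the second with the sum $n + m$.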

1.2 Ensuring File Authenticity in Private DFA Evaluation on Encrypted Files in the Cloud

Even though our protocols provide provable privacy guarantees for both the DFA query and file content against arbitrarily malicious server adversaries, a malicious server could still try to tamper with the evaluation result by deviating from the protocol specification, or even input fraudulent encrypted files into the protocol to fool the client. Indeed, though the traditional notion of a protocol secure against an arbitrarily malicious adversary prevents misbehavior during protocol execution, it provides no guarantees on what input a malicious party may use in the protocol. Protocols for a third-party client to perform private searches on encrypted data in the cloud, while revealing nothing to the cloud server and nothing but the search result to the client, do exist for some types of searches (e.g., [67, 28, 73]). To our knowledge, however, none also enforces that the cloud server employs the data that the data owner stored at the cloud server.

Motivated by this, in Chapter 4 we present a strengthened protocol that allows the client to detect any misbehavior of the server, and in particular, to tell whether the server input the authentic encrypted file stored there by the data owner. In that sense, the authenticity of the file input by the server and the integrity of the computation result are both enforced. At the same time, the protocol provably protects the file contents (except for the result of the computation) from an honest-but-curious client (and heuristically from even a malicious client) and provably protects both the file contents and DFA from an arbitrarily malicious server. To our knowledge, our protocol is the first example of performing secure DFA computation on both encrypted and authenticated data.


instantiate this intuition, however, it would require much higher computation and communication costs than our protocol. Instead, we introduce a new technique to enforce correct server behavior and the authenticity of the input on which it is allowed to operate, without relying on zero-knowledge proofs at all. At a high level, the protocol takes advantage of the verifiability of the computation result to check the correctness of the server behavior. The protocol is designed so that legitimate outputs are encoded in a small space known only to the client, and any malicious behavior by the server will result in the final output lying outside this space, which is then easily detected by the client. We prove this property (in the random oracle model) and the privacy of both the file and the DFA against an arbitrarily malicious server. We also prove the privacy of the file (except for the result of the DFA evaluation) against an honest-but-curious client.

1.3 Toward Practical Encrypted Email That Supports Private, Regular-Expression Searches

As a practical application of private regular expression searches on encrypted data, in Chapter 5 we report a case study of a prototype implementation using our protocol to perform private regular expression searches on encrypted emails. In particular, the protocol developed there suffices to support the search options (including Boolean combinations) offered by the Thunderbird email client for the text and numeric email fields, for example. Our system is thus able to support range queries on the date field and various types of substring queries on the source, destination, and subject fields of emails.


study of user email query patterns [40] showed that many queries that users create are only partial words, and so substring searching capability is important to provide an adequate user experience. Furthermore, very few searchable encryption schemes offer the capabilities of performing substring, conjunctive, disjunctive and range queries, and we are aware of none that offers all of them at the same time.

Our protocol for regular-expression searching gains computational efficiency by using interaction, in fact requiring data transfer between the searching client and the server holding the ciphertext of a volume larger than the searchable ciphertext itself. This obviously raises the question of whether a more suitable solution would be to download each email to the client and decrypt it there, to be searched locally. With the widespread use of volume-priced networking (i.e., over cellular data plans), however, neither design is particularly appealing. So, we instead explore a different design in which the user (e.g., via her mobile device) submits her encrypted regular expression (or suitable representation thereof) to a proxy, which then interacts with the server hosting the encrypted data using our protocol. After this interaction, the proxy reports information back to the user that permits her to determine whether there was a match, so she can retrieve the file from the server in that case. We stress that the interaction between the user and the proxy is independent of the lengths and number of ciphertexts stored at the server, and that the proxy is untrusted for the privacy of the search or the file contents (provided that it does not collaborate maliciously with the server). So, for example, the proxy could be run in a cloud distinct from that where the server is run.


Following these optimizations, we detail an implementation of our protocol and its performance when searching emails from a real-world email dataset. We show, for example, that our implementation incurs average latencies of 0.89 seconds per email for performing a 9-character substring search on the sender email address field, and 0.17 seconds per email for performing a range query spanning about 6 months on the email date field. These numbers were obtained from a proxy and server each having 8 physical cores with simultaneous multithreading enabled, yielding 16 logical cores. We also evaluate options for exploiting parallelism with our protocol, ranging from very coarse (i.e., one server thread and one proxy thread per server-proxy protocol instance, but running 16 protocol instances in parallel) to very fine (i.e., 16 server threads and 16 proxy threads in one protocol instance).

1.4 Contributions

In summary, the contributions of this dissertation are:

• We developed protocols (in Chapter 3) that enable a client having a private regular expression to evaluate it on an encrypted file stored at a server, once authorized to do so by the file owner. Our protocols contribute over prior work by protecting the privacy of the file content against both server and client. More precisely, the protocols protect privacy of the query and file content against arbitrarily malicious server adversaries and honest-but-curious client adversaries.

• In Chapter 4, we present an extension of the protocol developed in Chapter 3 so that, in addition to offering the security guarantees already provided by the original protocol, the client is able to detect any misbehavior of a server adversary. Furthermore, the client can even tell whether the server input the authentic encrypted file stored there by the file owner during the protocol execution. Consequently, the input and the evaluation result are both authenticated to the client. To our knowledge, this is the first published protocol that considers secure computation on both encrypted and authenticated data in the context of DFA evaluation.


CHAPTER 2

Related Work

In this chapter, we discuss research that is related to this dissertation. We discuss general techniques for secure computation in Section 2.1 and then protocols specifically tailored to private DFA evaluation in Section 2.2. Protocols specifically targeting other types of search functionality are discussed in Section 2.3. Previous work on ensuring that authentic inputs are employed in secure computation protocols is discussed in Section 2.4, and some previous implementation efforts for searching on encrypted data are briefly surveyed in Section 2.5.

2.1 General Techniques for Secure Computation

The problem we study in this dissertation — i.e., privately evaluating a regular expression on the plaintext of an encrypted file stored at a server — could be implemented with general techniques for “computing on encrypted data” [63] or two-party secure computation [74, 37]. These general techniques tend to yield less efficient protocols than one designed for a specific purpose, and our case will be no exception. In particular, the former achieves computations non-interactively using fully homomorphic encryption, for which existing implementations [33, 71, 66, 69] are dramatically more costly than the techniques we use.


requirement of the protocol is that the communication between the user and the proxy should be minimized. Using our protocol, the communication cost in the direction from the user to the proxy depends only on the size of the search query, and is independent of the number and size of the file ciphertexts. We are unaware of how to achieve this property using garbled circuits, however. Since the garbled circuit and its inputs are “unreusable” across different runs of the protocol, the user would need to provide a number of inputs (in this case, encrypted queries) to the proxy equal to the number of files to be searched. Furthermore, the fact that in our construction the user-generated encrypted query can be used an unlimited number of times enables a subscription service in which the proxy holds the encrypted query and periodically informs the user of the arrival of matched emails, without any further communication from the user to the proxy. Again, we are unaware of how to implement this functionality using generic garbled-circuit techniques.

An ingredient of our protocols in Chapter 3 and Chapter 5 is a two-party sharing of the data owner’s file-decryption key between a client holding the search query and the server holding the encrypted file. By two-party secret-sharing the file-decryption key and using this to compute on encrypted data, our protocols are related to those of Choi et al. [24]. This work developed a protocol based on garbled circuits by which two parties can evaluate a general function after a private decryption key has been shared between them. This protocol can be used to solve the problem we propose, but inherits the aforementioned limitations of garbled circuits.

2.2 Specialized Protocols for DFA Evaluation


protocol of Blanton and Aliasgari [13] is relevant; they adapted the Troncoso-Pastoriza et al. protocol to an “outsourcing” model in which the client and server secret-share the DFA and file, respectively, between two additional hosts that interactively evaluate the DFA on the file without reconstructing either one. While our protocols utilize secret sharing as well (in our case, of the file owner’s file-decryption key), our protocol shares much less data and does not share the client’s DFA (or thus require two parties between which to share it) at all. Furthermore, their protocol does not support the asymmetric encryption of the file, which in the encrypted email application we consider in Chapter 5 is the predominant method for preparing a private email for its intended recipient.

2.3 Specialized Protocols for Searching on Encrypted Data

Specialized protocols for performing searches on encrypted files or database relations have also been developed. For example, searchable encryption [67, 36, 16, 22, 28, 6, 19, 9, 49, 62, 47] enables a party holding a file-decryption key to search for attribute values in the ciphertext file stored at an untrusted server. These techniques have been generalized to support more complex queries, notably conjunctive [19], disjunctive [49] and range queries [65] and inner products [49]. Searchable encryption schemes typically achieve non-interactive queries on encrypted files, in part by attaching “tag” information to the ciphertext of each file to enable the query operation. However, broadening the supported search attributes typically requires expanding the tags, and so the sizes of the tags are determined by the richness of the supported queries. In contrast, in our work the file ciphertexts are independent of the DFA(s) to be evaluated (assuming a fixed alphabet $\Sigma$ over which the DFAs are defined), and the computation is performed interactively between the two parties.

Richer forms of pattern-matching and search (though still not encompassing DFA evaluation) have also been studied in the two-party setting, e.g., by Jha et al. [45], Hazay and Lindell [41], Katz and Malka [48], and Hazay and Toft [42]. Again, these works input the plaintext file to one party and so do not directly apply to our setting.


usually heuristic, without formal definitions and proofs, and we are unaware of any designed to support DFA searches.

2.4 Input Authenticity in Secure Computation

Most work in the area of secure computation generally does not consider the authenticity of the inputs to the protocol. Indeed, the standard definition of security against arbitrarily malicious adversaries for general two-party protocols provides no restrictions on what input a malicious party may use in the protocol as long as he does not deviate from the protocol. The protocol we present in Chapter 4 allows the client to tell whether the server actually uses the authentic encrypted data of the data owner as input, in addition to the ability to detect any misbehavior by the server. In this sense, our protocol provides an authenticated evaluation result to the client. To our knowledge, ours is the first protocol to consider secure computation on authenticated data in the context of private DFA evaluation. The main area of specialized protocols in which input authenticity has previously been treated has been private intersection of certified sets [20, 27, 26, 68], in which the set elements of each party must be certified by a trusted third party for use in performing the intersection.

2.5 Implementations of Systems That Allow Searching on Encrypted Data


CHAPTER 3

Third-Party Private DFA Evaluation on Encrypted Files in the Cloud

Motivated by the need to outsource file storage to untrusted clouds while still permitting limited use of that data by third parties, in this chapter, we present practical protocols by which a client can evaluate a DFA on an encrypted file stored at a cloud server, once authorized to do so by the file owner. Our protocols provably protect the privacy of the DFA and the file contents from a malicious server and the privacy of the file contents (except for the result of the evaluation) from an honest-but-curious client (and, heuristically, from a malicious client). We introduce our main protocol in Section 3.2 and an improved protocol in Section 3.3. We further present simple techniques to detect client or server misbehavior in Section 3.4. Before that, we first define the studied problem in Section 3.1.

3.1 Problem Description

A deterministic finite automaton $M$ is a tuple $\langle Q, \Sigma, \delta, q_{\mathrm{init}} \rangle$ where $Q$ is a set of $|Q| = n$ states; $\Sigma$ is a set (alphabet) of $|\Sigma| = m$ symbols; $\delta : Q \times \Sigma \rightarrow Q$ is a transition function; and $q_{\mathrm{init}}$ is the initial state. (A DFA can also specify a function $\Delta : Q \rightarrow \{0, 1\}$, for which $\Delta(q) = 1$ indicates that $q$ is an accepting state. We will discuss extensions of our protocols to this case.)
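For reference, the plaintext functionality that the protocols in this chapter compute obliviously can be sketched as follows. This is a minimal illustration on unencrypted data, not part of any protocol; the function name `evaluate_dfa`, the dictionary representation of $\delta$, and the three-state example DFA (which tracks whether the substring "AC" has occurred) are our own assumptions.

```python
from typing import Dict, Sequence, Tuple


def evaluate_dfa(delta: Dict[Tuple[int, str], int], q_init: int,
                 symbols: Sequence[str]) -> int:
    """Return the final state reached from q_init after reading symbols."""
    state = q_init
    for sym in symbols:
        state = delta[(state, sym)]
    return state


# Hypothetical DFA over the DNA alphabet with n = 3 states and m = 4 symbols.
# State 0: no progress; state 1: just read "A"; state 2: "AC" has occurred.
SIGMA = "ACGT"
delta = {}
for q in range(3):
    for s in SIGMA:
        if q == 2:
            nxt = 2          # state 2 is absorbing
        elif s == "A":
            nxt = 1
        elif q == 1 and s == "C":
            nxt = 2
        else:
            nxt = 0
        delta[(q, s)] = nxt

final_match = evaluate_dfa(delta, 0, "GGACT")     # contains "AC"
final_no_match = evaluate_dfa(delta, 0, "GATTA")  # does not contain "AC"
```

In this example the final state alone tells the evaluator whether the pattern occurred, which mirrors the role of the final state as the only output revealed to the client.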

Our goal is to enable a client holding a DFA $M$ to interact with a server holding the ciphertext of a file to evaluate $M$ on the file plaintext. More specifically, the client should output the final state to which the file plaintext drives the DFA; i.e., if the plaintext file is a sequence $\langle \sigma_k \rangle_{k \in [\ell]}$ where $[\ell]$

and the number of states $n$ in the client’s DFA.¹ The client should learn nothing else about the file, however, and the server should learn nothing else about the file or the client’s DFA.

Because the file exists in the system only in encrypted form, some private-key information must be injected into the protocol to enable a DFA to be evaluated on the file plaintext. Since (only) the data owner holds the private key, one approach would be to involve the data owner in the protocol. However, in keeping with the goals of cloud outsourcing, our protocols require the data owner only to authorize the client to perform DFA evaluations with the server, but not to participate in those evaluations herself. In our protocols, this authorization occurs by the data owner sharing the private file-decryption key between the client and server. As a result, a client and server that collude could pool their information to decrypt the file. Here we assume no such collusion, however, for two reasons. First, we are primarily motivated by scenarios in which the client represents a partially trusted service provider or customer, and so even if the cloud server were to be compromised, we presume this party would not be the cause. So, we prove security against only a client or server acting in isolation and with primary attention to only an honest-but-curious client (though we also heuristically justify the security of our protocol against an arbitrary client). Second, even without sharing the file-decryption key between the client and server, the functionality offered by our protocol (i.e., evaluating a DFA on the file) would enable a colluding client and server to evaluate arbitrary (and arbitrarily many) DFAs on the file, eventually permitting its decryption anyway. The only defense against collusion that we see would be to involve the data owner in the protocol; again, we do not explore this possibility here.

Another potential form of collusion that we do not explicitly consider here is collusion between the data owner and the server, presumably to learn the DFA used by the client. In our protocol, however, the protection of DFA privacy does not depend on the security of the data owner’s file-decryption key. Since the data owner is not involved in the protocol, it does not offer the server any additional leverage in learning the client’s DFA.

¹ Since exposing the final state reduces file entropy by $\log_2 n$ bits, presumably the server should learn $n$ so as to

Our protocols do not retrieve the file based on the DFA evaluation results, e.g., in a way that hides from the server what file is being retrieved. However, once the client learns the final state of the DFA evaluation, it can employ various techniques to retrieve the file privately (e.g., [35]). Moreover, some of our motivating scenarios in Chapter 1, e.g., malware scans of cloud-resident files by a third party, may not require file retrieval but only that matches be reported to the file owner.

3.2 A Secure DFA Evaluation Protocol

In this section we present a protocol that meets the goals described in Section 3.1. We give the construction in Section 3.2.1, and then we define and prove security against server and client adversaries in Section 3.2.2 and Section 3.2.3, respectively.

3.2.1 Construction

Let “$\leftarrow$” denote assignment and “$s \stackrel{\$}{\leftarrow} S$” denote the assignment to $s$ of a randomly chosen element of set $S$. Let $\kappa$ denote a security parameter.

Encryption scheme. Our scheme is built using an additively homomorphic encryption scheme with plaintext space $R$, where $\langle R, +_R, \cdot_R \rangle$ denotes a commutative ring. Specifically, an encryption scheme $\mathcal{E}$ includes algorithms $\mathsf{Gen}$, $\mathsf{Enc}$, and $\mathsf{Dec}$ where: $\mathsf{Gen}$ is a randomized algorithm that on input $1^\kappa$ outputs a public-key/private-key pair $(pk, sk) \leftarrow \mathsf{Gen}(1^\kappa)$; $\mathsf{Enc}$ is a randomized algorithm that on input public key $pk$ and plaintext $m \in R$ (where $R$ can be determined as a function of $pk$) produces a ciphertext $c \leftarrow \mathsf{Enc}_{pk}(m)$, where $c \in C_{pk}$ and $C_{pk}$ is the ciphertext space determined by $pk$; and $\mathsf{Dec}$ is a deterministic algorithm that on input a private key $sk$ and ciphertext $c \in C_{pk}$ produces a plaintext $m \leftarrow \mathsf{Dec}_{sk}(c)$ where $m \in R$. In addition, $\mathcal{E}$ supports an operation $+_{pk}$ on ciphertexts such that for any public-key/private-key pair $(pk, sk)$, $\mathsf{Dec}_{sk}(\mathsf{Enc}_{pk}(m_1) +_{pk} \mathsf{Enc}_{pk}(m_2)) = m_1 +_R m_2$. Using $+_{pk}$, it is possible to implement $\cdot_{pk}$ for which $\mathsf{Dec}_{sk}(m_2 \cdot_{pk} \mathsf{Enc}_{pk}(m_1)) = m_1 \cdot_R m_2$.

We also require $\mathcal{E}$ to support two-party decryption. Specifically, we assume there is an efficient randomized algorithm $\mathsf{Share}$ that on input a private key $sk$ outputs shares $(sk_1, sk_2) \leftarrow \mathsf{Share}(sk)$, and that there are efficient deterministic algorithms $\mathsf{Dec}^1$ and $\mathsf{Dec}^2$ such that $\mathsf{Dec}_{sk}(c) = \mathsf{Dec}^2_{sk_2}(c,$

An example of an encryption scheme $\mathcal{E}$ that meets the above requirements is due to Paillier [59] with modifications by Damgård and Jurik [29]; we henceforth refer to this scheme as “Pai”. In this scheme, the ring $R$ is $\mathbb{Z}_N$ where $N = pp'$ and $p$, $p'$ are primes, and the ciphertext space $C_{pk}$ is $\mathbb{Z}^*_{N^2}$.
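To make the homomorphic operations $+_{pk}$ and $\cdot_{pk}$ concrete, the following is a minimal sketch of textbook Paillier encryption with generator $1 + N$. It is an illustration under stated assumptions only: it omits the Damgård and Jurik modifications and the two-party decryption required above, the primes are toy-sized and insecure, and the function names (`keygen`, `enc`, `dec`, `add_pk`, `mul_pk`) are ours rather than the dissertation's.

```python
import math
import secrets


def keygen(p: int, q: int):
    """Build a toy key pair from two given primes (far too small to be secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # inverse of lam modulo n (requires Python 3.8+)
    return n, (lam, mu)


def enc(pk: int, m: int) -> int:
    """Encrypt m in Z_N as (1+N)^m * r^N mod N^2 for a random unit r."""
    n2 = pk * pk
    while True:
        r = secrets.randbelow(pk - 1) + 1
        if math.gcd(r, pk) == 1:
            break
    return pow(1 + pk, m % pk, n2) * pow(r, pk, n2) % n2


def dec(pk: int, sk, c: int) -> int:
    """Decrypt c using the private key (lam, mu)."""
    lam, mu = sk
    u = pow(c, lam, pk * pk)
    return (u - 1) // pk * mu % pk


def add_pk(pk: int, c1: int, c2: int) -> int:
    """Homomorphic addition: multiply ciphertexts modulo N^2."""
    return c1 * c2 % (pk * pk)


def mul_pk(pk: int, k: int, c: int) -> int:
    """Multiply the plaintext under c by the known scalar k (repeated +_pk)."""
    return pow(c, k % pk, pk * pk)


pk, sk = keygen(1009, 1013)  # hypothetical toy primes
c1, c2 = enc(pk, 20), enc(pk, 22)
sum_plain = dec(pk, sk, add_pk(pk, c1, c2))   # decrypts to 20 + 22
prod_plain = dec(pk, sk, mul_pk(pk, 3, c1))   # decrypts to 3 * 20
```

Here scalar multiplication is realized by exponentiating the ciphertext, which is the usual square-and-multiply shortcut for applying $+_{pk}$ repeatedly.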

We use ${}^{pk}\!\sum$ to denote summation using $+_{pk}$; ${}^{R}\!\sum$ to denote summation using $+_R$; and ${}^{R}\!\prod$ to denote the product using $\cdot_R$ of a sequence. For any operation $\mathsf{op}$, we use $t_{\mathsf{op}}$ to denote the time required to perform $\mathsf{op}$; e.g., $t_{\mathsf{Dec}}$ is the time to perform a $\mathsf{Dec}$ operation.

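Encoding $\delta$ in a bivariate polynomial over $R$. A second ingredient for our protocol is a method for encoding a DFA $\langle Q, \Sigma, \delta, q_{\mathrm{init}} \rangle$, and specifically the transition function $\delta$, as a bivariate polynomial $f(x, y)$ over $R$ where $x$ is the variable representing a DFA state and $y$ is the variable representing an input symbol. That is, if we treat each state $q \in Q$ and each $\sigma \in \Sigma$ as distinct elements of $R$, then we would like $f(q, \sigma) = \delta(q, \sigma)$. We can achieve this by choosing $f$ to be the interpolation polynomial

$$f(x, y) = {}^{R}\!\sum_{\sigma \in \Sigma} \left( f_\sigma(x) \cdot_R \Lambda_\sigma(y) \right) \quad \text{where} \quad \Lambda_\sigma(y) = {}^{R}\!\prod_{\substack{\sigma' \in \Sigma \\ \sigma' \neq \sigma}} \frac{y -_R \sigma'}{\sigma -_R \sigma'} \tag{3.1}$$

is a Lagrange basis polynomial and $f_\sigma(q) = \delta(q, \sigma)$ for each $q \in Q$. Note that $\Lambda_\sigma(\sigma) = 1$ and $\Lambda_\sigma(\sigma') = 0$ for any $\sigma' \in \Sigma \setminus \{\sigma\}$.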

Calculating Eqn. 3.1 requires taking multiplicative inverses in $R$. While not every element of a ring has a multiplicative inverse in the ring, fortunately the ring $\mathbb{Z}_N$ used in Paillier encryption, for example, has negligibly few elements with no inverses, and so there is little risk of encountering an element with no inverse. Using Eqn. 3.1, we can calculate coefficients $\langle \lambda_{\sigma j} \rangle_{j \in [m]}$ so that $\Lambda_\sigma(y) = {}^{R}\!\sum_{j=0}^{m-1} \lambda_{\sigma j} \cdot_R y^j$. For our algorithm descriptions, we encapsulate this calculation in the procedure $\langle \lambda_{\sigma j} \rangle_{\sigma \in \Sigma, j \in [m]} \leftarrow \mathsf{Lagrange}(\Sigma)$.

Each $f_\sigma$ needed to compute $f(x, y)$ can again be determined as a Lagrange interpolating polynomial and then expressed as $f_\sigma(x) = {}^{R}\!\sum_{i=0}^{n-1} a_{\sigma i} \cdot_R x^i$. In our pseudocode, we encapsulate this calculation as $\langle a_{\sigma i} \rangle_{\sigma \in \Sigma, i \in [n]} \leftarrow \mathsf{ToPoly}(Q, \Sigma, \delta)$.
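The two procedures just described can be sketched in Python as follows. This is an illustrative sketch only: it works over a small prime field $\mathbb{Z}_P$ (an assumption made so that every nonzero element is invertible, whereas the text uses $\mathbb{Z}_N$), and the function names `lagrange`, `to_poly`, and the helper routines are hypothetical stand-ins for $\mathsf{Lagrange}$ and $\mathsf{ToPoly}$.

```python
P = 10007  # small prime modulus standing in for the ring R (assumption)

def poly_mul_linear(coeffs, root, p):
    """Multiply a polynomial (coefficients low-to-high) by (x - root) mod p."""
    out = [0] * (len(coeffs) + 1)
    for i, c in enumerate(coeffs):
        out[i] = (out[i] - root * c) % p
        out[i + 1] = (out[i + 1] + c) % p
    return out

def interpolate(points, values, p):
    """Coefficients of the polynomial of degree < len(points) through the given values."""
    result = [0] * len(points)
    for k, xk in enumerate(points):
        basis, denom = [1], 1
        for j, xj in enumerate(points):
            if j != k:
                basis = poly_mul_linear(basis, xj, p)
                denom = denom * (xk - xj) % p
        scale = values[k] * pow(denom, -1, p) % p
        for i, c in enumerate(basis):
            result[i] = (result[i] + scale * c) % p
    return result

def lagrange(sigma, p):
    """lam[s][j] is the coefficient of y^j in the basis polynomial Lambda_s(y)."""
    return {s: interpolate(sigma, [1 if t == s else 0 for t in sigma], p)
            for s in sigma}

def to_poly(states, sigma, delta, p):
    """a[s][i] is the coefficient of x^i in f_s, where f_s(q) = delta[(q, s)]."""
    return {s: interpolate(states, [delta[(q, s)] for q in states], p)
            for s in sigma}

def evaluate(coeffs, x, p):
    """Horner evaluation of a polynomial mod p."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

def f(a, lam, x, y, p):
    """f(x, y) = sum over s of f_s(x) * Lambda_s(y), as in Eqn. 3.1."""
    return sum(evaluate(a[s], x, p) * evaluate(lam[s], y, p) for s in a) % p

# Hypothetical three-state DFA over a two-symbol alphabet, encoded as ring elements.
states, sigma = [0, 1, 2], [5, 6]
delta = {(0, 5): 1, (0, 6): 0, (1, 5): 2, (1, 6): 0, (2, 5): 2, (2, 6): 1}
lam, a = lagrange(sigma, P), to_poly(states, sigma, delta, P)
assert all(f(a, lam, q, s, P) == delta[(q, s)] for q in states for s in sigma)
```

The final assertion checks the defining property $f(q, \sigma) = \delta(q, \sigma)$ on every state and symbol of the example DFA.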

Protocol steps Our protocol, denoted Π1(E), is shown in Fig. 3.1. Pseudocode for the client is


and labeled m101–m106. The client receives as input a public key $pk$ under which the file (at the server) is encrypted; a share $sk_1$ of the private key $sk$ corresponding to $pk$; another public key $pk'$; and the DFA $\langle Q, \Sigma, \delta, q_{init} \rangle$. The server receives as input the public key $pk$; a share $sk_2$ of the private key $sk$; the alphabet $\Sigma$; and ciphertexts $c_{kj} \leftarrow \mathsf{Enc}_{pk}((\sigma_k)^j)$ of the $k$-th file symbol $\sigma_k$, for each $j \in [m]$ and for each $k \in [\ell]$ where $\ell$ denotes the file length in symbols. We assume that $sk_1$ and $sk_2$ were generated as $(sk_1, sk_2) \leftarrow \mathsf{Share}(sk)$. Note that no information about $sk'$ (the private key corresponding to $pk'$) is given to either party, and so $pk'$ ciphertexts ($\rho$ created in c107 and c115 and sent in m103 and m105, respectively) are indecipherable and ignored in the protocol. These ciphertexts are included to simplify the proof of privacy against client adversaries (Section 3.2.3) and can be elided in practice. We do not discuss these values further in this section.

The protocol is structured as matching for loops executed by the client (c105–c113) and server (s103–s111). The client begins the $k$-th loop iteration with an encryption $\alpha$ of the current DFA state after being blinded by a random injection $\pi_1 : Q \to R$ it chose in the $(k-1)$-th loop at line c109 (or, if $k = 0$, then in line c103), where $\mathsf{Injs}(Q \to R)$ denotes the set of injections from $Q$ to $R$. The client uses its share $sk_1$ of $sk$ to create the “partial decryption” $\beta$ of $\alpha$ (c106) and sends $\alpha$, $\beta$ to the server (m103). The server uses its share $sk_2$ to complete the decryption of $\alpha$ to obtain the blinded state $\gamma$ (s104). We stress that because $\gamma$ is blinded by $\pi_1$, $\gamma$ reveals no information about the current DFA state to the server. The server then computes, for each $\sigma \in \Sigma$ (s105), a value $\Psi_\sigma$ such that $\Lambda_\sigma(\sigma_k) = \mathsf{Dec}_{sk}(\Psi_\sigma)$ (s106) by utilizing coefficients $\langle \lambda_{\sigma j} \rangle_{\sigma \in \Sigma, j \in [m]}$ output from $\mathsf{Lagrange}$ (s102). The server then returns (in m104) values $\langle \mu_{\sigma i} \rangle_{\sigma \in \Sigma, i \in [n]}$ created so that $\mathsf{Dec}_{sk}(\mu_{\sigma i}) = \gamma^i \cdot_R \Lambda_\sigma(\sigma_k)$ (s108).

Meanwhile, the client selects a new random injection $\pi_1 \xleftarrow{\$} \mathsf{Injs}(Q \to R)$ (c109). The client then constructs a new DFA transition function $\delta'$ reflecting the injection it chose in the last round (now denoted $\pi_0$, see line c108) and the new injection $\pi_1$ it chose for this round. Specifically, it creates a new DFA state transition function $\delta'$ defined as $\delta'(q, \sigma) = \pi_1(\delta(\pi_0^{-1}(q), \sigma))$ for all $\sigma \in \Sigma$ and $q \in \pi_0(Q)$ where $\pi_0(Q) = \{\pi_0(q)\}_{q \in Q}$; we denote this step as $\delta' \leftarrow \mathsf{Blind}(\delta, \pi_0, \pi_1)$ in line c110. That is, $\delta'$ “undoes” the previous injection $\pi_0$, applies $\delta$, and then applies the new injection $\pi_1$. The client then interpolates a bivariate polynomial $f(x, y)$ such that $f(q, \sigma) = \delta'(q, \sigma)$


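client($pk, sk_1, pk', \langle Q, \Sigma, \delta, q_{init} \rangle$)
server($pk, sk_2, \Sigma, \langle c_{kj} \rangle_{k \in [\ell], j \in [m]}$)

c101. $n \leftarrow |Q|,\ m \leftarrow |\Sigma|$
c102. $\pi_0 \leftarrow I$
c103. $\pi_1 \xleftarrow{\$} \mathsf{Injs}(Q \to R)$
c104. $\alpha \leftarrow \mathsf{Enc}_{pk}(\pi_1(q_{init}))$
s101. $m \leftarrow |\Sigma|$
s102. $\langle \lambda_{\sigma j} \rangle_{\sigma \in \Sigma, j \in [m]} \leftarrow \mathsf{Lagrange}(\Sigma)$
m101. client → server: $n$
m102. server → client: $\ell$
c105. for $k \leftarrow 0 \ldots \ell - 1$
s103. for $k \leftarrow 0 \ldots \ell - 1$
c106.     $\beta \leftarrow \mathsf{Dec1}_{sk_1}(\alpha)$
c107.     $\rho \leftarrow \mathsf{Enc}_{pk'}(\pi_1)$
m103.     client → server: $\alpha, \beta, \rho$
s104.     $\gamma \leftarrow \mathsf{Dec2}_{sk_2}(\alpha, \beta)$
c108.     $\pi_0 \leftarrow \pi_1$
c109.     $\pi_1 \xleftarrow{\$} \mathsf{Injs}(Q \to R)$
c110.     $\delta' \leftarrow \mathsf{Blind}(\delta, \pi_0, \pi_1)$
c111.     $\langle a_{\sigma i} \rangle_{\sigma \in \Sigma, i \in [n]} \leftarrow \mathsf{ToPoly}(Q, \Sigma, \delta')$
s105.     for $\sigma \in \Sigma$
s106.         $\Psi_\sigma \leftarrow {}^{pk}\!\sum_{j=0}^{m-1} \lambda_{\sigma j} \cdot_{pk} c_{kj}$
s107.         for $i \in [n]$
s108.             $\mu_{\sigma i} \leftarrow \gamma^i \cdot_{pk} \Psi_\sigma$
s109.         endfor
s110.     endfor
m104.     server → client: $\langle \mu_{\sigma i} \rangle_{\sigma \in \Sigma, i \in [n]}$
c112.     $\alpha \leftarrow {}^{pk}\!\sum_{\sigma \in \Sigma} {}^{pk}\!\sum_{i=0}^{n-1} a_{\sigma i} \cdot_{pk} \mu_{\sigma i}$
c113. endfor
s111. endfor
c114. $\beta \leftarrow \mathsf{Dec1}_{sk_1}(\alpha)$
m105. client → server: $\alpha, \beta, \rho$
s112. $\gamma^* \leftarrow \mathsf{Dec2}_{sk_2}(\alpha, \beta)$
m106. server → client: $\gamma^*$
c116. return $\pi_1^{-1}(\gamma^*)$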


$\langle \mu_{\sigma i} \rangle_{\sigma \in \Sigma, i \in [n]}$ sent from the server (message m104) to assemble a ciphertext $\alpha$ of the new DFA state under the injection $\pi_1$ (c112).

After $\ell$ loop iterations, the client interacts with the server once more to decrypt the final state. It sends $\alpha$ and its partial decryption $\beta$ to the server (m105), for which the server completes the decryption (s112) and returns the result (m106).
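The re-blinding performed in each round can be illustrated on plaintext values alone. The sketch below is a simplification under stated assumptions: it omits all encryption, evaluates $\delta'$ by table lookup rather than through the interpolated polynomial, works over a small prime-sized range standing in for $R$, and uses Python's `random` module (which is not cryptographically secure) purely for reproducibility. The names `random_injection`, `blind`, and `run_blinded` are hypothetical.

```python
import random

P = 10007  # size of the stand-in ring R (assumption)

def random_injection(states, p, rng):
    """Pick a random injection Q -> R as a dict from states to distinct ring elements."""
    images = rng.sample(range(p), len(states))
    return dict(zip(states, images))

def blind(delta, pi0, pi1, sigma):
    """delta'(pi0(q), s) = pi1(delta(q, s)): undo pi0, apply delta, apply pi1."""
    return {(pi0[q], s): pi1[delta[(q, s)]] for q in pi0 for s in sigma}

def run_blinded(states, sigma, delta, q_init, file_symbols, seed=0):
    """Return the final DFA state plus the blinded values a server would observe."""
    rng = random.Random(seed)
    pi1 = random_injection(states, p=P, rng=rng)
    gamma = pi1[q_init]          # plaintext of alpha in the first round
    observed = [gamma]
    for symbol in file_symbols:
        pi0 = pi1                # corresponds to c108
        pi1 = random_injection(states, p=P, rng=rng)   # corresponds to c109
        delta_blinded = blind(delta, pi0, pi1, sigma)  # corresponds to c110
        gamma = delta_blinded[(gamma, symbol)]
        observed.append(gamma)
    inverse = {image: q for q, image in pi1.items()}
    return inverse[gamma], observed

# Hypothetical two-state DFA: symbol 5 toggles the state, symbol 6 keeps it.
states, sigma = [0, 1], [5, 6]
delta = {(0, 5): 1, (0, 6): 0, (1, 5): 0, (1, 6): 1}
final_state, observed = run_blinded(states, sigma, delta, 0, [5, 6, 5, 5])
assert final_state == 1
```

Each entry of `observed` is the image of the current state under a freshly chosen injection, which is why the sequence by itself does not identify the underlying states.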

Protocol $\Pi_1(E)$ can be modified to return only a binary indication of whether the DFA’s final state is an accepting one, if the DFA specifies a function $\Delta$ indicating whether a state is an accepting state. Specifically, the client can construct a polynomial $F(x)$ such that $F(q) = 1$ if $\Delta(q) = 1$ and $F(q) = 0$ otherwise, for $q \in Q$. Then, rather than interacting with the server to decrypt the final state, the client can interact with the server once to evaluate $F(x)$ on the (unknown) final state and again to decrypt this result.
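One way to obtain such an $F$ is as a sum of Lagrange basis polynomials over the accepting states. The sketch below evaluates $F$ at a point in the clear, over a small prime field chosen for illustration; the function name and the example state encodings are hypothetical, and the homomorphic evaluation on an encrypted state is not shown.

```python
P = 10007  # small prime modulus standing in for the ring R (assumption)

def accept_indicator(state_points, accepting_points, x, p=P):
    """Evaluate F(x), where F is 1 on accepting states and 0 on the other states."""
    total = 0
    for qa in accepting_points:
        num, den = 1, 1
        for q in state_points:
            if q != qa:
                num = num * (x - q) % p
                den = den * (qa - q) % p
        total = (total + num * pow(den, -1, p)) % p
    return total

# Hypothetical encodings of three states, one of which is accepting.
points = [17, 403, 9001]
assert accept_indicator(points, [403], 403) == 1
assert accept_indicator(points, [403], 17) == 0
```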

For brevity, Fig. 3.1 omits numerous checks that the client and server should perform to confirm that the values each receives are well-formed. For example, the client should confirm that $\mu_{\sigma i} \in C_{pk}$ for each $\sigma \in \Sigma$ and $i \in [n]$, upon receiving these in m104. The server should similarly confirm the well-formedness of the values it receives.

An Alternative Using Fully Homomorphic Encryption. Our technique of encoding the DFA transition function $\delta$ using a bivariate polynomial $f(x, y)$ over $R$ could also be used with fully homomorphic encryption [33, 71] to create a noninteractive protocol. The client could encrypt each coefficient $a_{\sigma i}$ of $f$ under the public key $pk$ and send these ciphertexts to the server, enabling the server to perform the computation in c112 by itself. At the end, the server could send a half-decrypted final state back to the client, who would complete the decryption to obtain the result. This protocol achieves communication costs of $O(nm)$, which is independent of the file length. That said, existing fully homomorphic schemes are far less efficient than additively homomorphic schemes, and so the resulting protocol will be less communication-efficient than $\Pi_1(E)$ for many practical file lengths and DFA sizes.

3.2.2 Security Against Server Adversaries


file in its possession. That is, we show only the privacy of the file and DFA inputs against server adversaries. In this section, we are not concerned with showing that a client can detect server misbehavior, a property often called correctness. $\Pi_1(E)$ could be augmented using standard tools to enforce correctness, with an impact on performance; we do not explore this here. Instead, in Section 3.4 we describe novel extensions to $\Pi_1(E)$ that could be used to detect server misbehavior.

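We formalize our claims against server compromise by defining two separate server adversaries. The first server adversary $S = (S_1, S_2)$ attacks the DFA $M = \langle Q, \Sigma, \delta, q_{init} \rangle$ held by the client, as described in experiment $\mathbf{Expt}^{\text{s-dfa}}_{\Pi_1(E)}$ in Fig. 3.2. $S_1$ first generates a file $\langle \sigma_k \rangle_{k \in [\ell]}$ and two DFAs $M_0$, $M_1$. (Note that we use, e.g., “$M_0.Q$” and “$M_1.Q$” to disambiguate their state sets.) $S_2$ then receives the ciphertexts $\langle c_{kj} \rangle_{k \in [\ell], j \in [m]}$ of its file, information $\phi$ created for it by $S_1$, and oracle access to $\mathsf{clientOr}(pk, sk_1, pk', M_b)$ for $b$ chosen randomly.

Experiment $\mathbf{Expt}^{\text{s-dfa}}_{\Pi_1(E)}(S_1, S_2)$
    $(pk, sk) \leftarrow \mathsf{Gen}(1^\kappa)$
    $(sk_1, sk_2) \leftarrow \mathsf{Share}(sk)$
    $(pk', sk') \leftarrow \mathsf{Gen}(1^\kappa)$
    $(\ell, \langle \sigma_k \rangle_{k \in [\ell]}, M_0, M_1, \phi) \leftarrow S_1(pk, sk_2)$
    if $M_0.Q \neq M_1.Q$ or $M_0.\Sigma \neq M_1.\Sigma$ then return 0
    $b \xleftarrow{\$} \{0, 1\}$
    $m \leftarrow |M_b.\Sigma|$
    for $k \in [\ell]$, $j \in [m]$
        $c_{kj} \leftarrow \mathsf{Enc}_{pk}((\sigma_k)^j)$
    $b' \leftarrow S_2^{\mathsf{clientOr}(pk, sk_1, pk', M_b)}(\phi, \langle c_{kj} \rangle_{k \in [\ell], j \in [m]})$
    if $b' = b$ then return 1 else return 0

Figure 3.2: Experiments for proving DFA privacy of $\Pi_1(E)$ against server adversaries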

$\mathsf{clientOr}$ responds to queries from $S_2$ as follows, ignoring malformed queries. The first query (say, consisting of simply “start”) causes $\mathsf{clientOr}$ to begin the protocol; $\mathsf{clientOr}$ responds with a message of the form $n$ (i.e., of the form of m101). The second invocation by $S_2$ must include a single integer $\ell$ (i.e., of the form of m102); $\mathsf{clientOr}$ responds with a message of the form $\alpha, \beta, \rho$, i.e., three values as in m103. The next $\ell - 1$ queries by $S_2$ must contain $nm$ elements of $C_{pk}$, i.e., $\langle \mu_{\sigma i} \rangle_{\sigma \in \Sigma, i \in [n]}$ as in m104, to which $\mathsf{clientOr}$ responds with three values as in message m103.


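Experiment $\mathbf{Expt}^{\text{ind-cpa}}_{E}(U)$
    $(\widehat{pk}, \widehat{sk}) \leftarrow \mathsf{Gen}(1^\kappa)$
    $\hat{b} \xleftarrow{\$} \{0, 1\}$
    $\hat{b}' \leftarrow U^{\mathsf{Enc}^{\hat{b}}_{\widehat{pk}}(\cdot, \cdot)}(\widehat{pk})$
    if $\hat{b}' = \hat{b}$

Figure 3.3: Experiment for IND-CPA security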

responds with three values as in m105. The next (and last) query by $S_2$ can consist simply of a value in $R$, as in message m106.

Eventually $S_2$ outputs a bit $b'$, and $\mathbf{Expt}^{\text{s-dfa}}_{\Pi_1(E)}(S) = 1$ only if $b' = b$. We say the advantage of $S$ is $\mathbf{Adv}^{\text{s-dfa}}_{\Pi_1(E)}(S) = 2 \cdot \mathbb{P}\bigl(\mathbf{Expt}^{\text{s-dfa}}_{\Pi_1(E)}(S) = 1\bigr) - 1$ and define $\mathbf{Adv}^{\text{s-dfa}}_{\Pi_1(E)}(t, \ell, n, m) = \max_S \mathbf{Adv}^{\text{s-dfa}}_{\Pi_1(E)}(S)$ where the maximum is taken over all adversaries $S$ taking time $t$ and selecting a file of length $\ell$ and DFAs containing $n$ states and an alphabet of $m$ symbols.

We reduce DFA privacy against server attacks to the IND-CPA [10] security of the encryption scheme. IND-CPA security is defined using the experiment in Fig. 3.3, in which an adversary $U$ is provided a public key $\widehat{pk}$ and access to an oracle $\mathsf{Enc}^{\hat{b}}_{\widehat{pk}}(\cdot, \cdot)$ that consistently encrypts either the first of its two inputs (if $\hat{b} = 0$) or the second of those inputs (if $\hat{b} = 1$). Eventually $U$ outputs a guess $\hat{b}'$ at $\hat{b}$, and $\mathbf{Expt}^{\text{ind-cpa}}_{E}(U) = 1$ only if $\hat{b}' = \hat{b}$. The IND-CPA advantage of $U$ is defined as $\mathbf{Adv}^{\text{ind-cpa}}_{E}(U) = 2 \cdot \mathbb{P}\bigl(\mathbf{Expt}^{\text{ind-cpa}}_{E}(U) = 1\bigr) - 1$. Then, $\mathbf{Adv}^{\text{ind-cpa}}_{E}(t, w) = \max_U \mathbf{Adv}^{\text{ind-cpa}}_{E}(U)$ where the maximum is taken over all adversaries $U$ executing in time $t$ and making $w$ queries to $\mathsf{Enc}^{\hat{b}}_{\widehat{pk}}(\cdot, \cdot)$.
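The left-or-right experiment just described can be sketched as a small test harness. The code below is an illustration only: `ind_cpa_experiment` and `estimate_advantage` are hypothetical names, the advantage is estimated empirically over repeated runs rather than computed exactly, and the “scheme” used in the demonstration is a deliberately insecure deterministic toy, chosen so that the distinguishing adversary is easy to follow.

```python
import secrets

def ind_cpa_experiment(gen, enc, adversary):
    """One run of the experiment; returns 1 iff the adversary guesses the hidden bit."""
    pk, _sk = gen()
    b = secrets.randbelow(2)

    def oracle(m0, m1):
        # Consistently encrypts the first input if b == 0, the second if b == 1.
        return enc(pk, m1 if b else m0)

    guess = adversary(pk, oracle)
    return 1 if guess == b else 0

def estimate_advantage(gen, enc, adversary, trials=1000):
    """Empirical estimate of 2 * P(experiment = 1) - 1."""
    wins = sum(ind_cpa_experiment(gen, enc, adversary) for _ in range(trials))
    return 2 * wins / trials - 1

# Deliberately broken toy scheme: encryption is deterministic, so it cannot be IND-CPA.
def toy_gen():
    return 7, None

def toy_enc(pk, m):
    return m * pk % 101

def distinguisher(pk, oracle):
    # Re-encrypt a known plaintext and compare with the oracle's answer.
    return 0 if oracle(0, 1) == toy_enc(pk, 0) else 1

assert estimate_advantage(toy_gen, toy_enc, distinguisher, trials=200) == 1.0
```

Against a randomized scheme such as Paillier, the same re-encryption test would fail to match, which is why randomized encryption is a precondition for IND-CPA security.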

Our theorem statements throughout this paper omit terms that are negligible as a function of the security parameter $\kappa$.

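Theorem 1. For $t' = t + t_{\mathsf{Gen}} + t_{\mathsf{Share}} + \ell m \cdot t_{\mathsf{Enc}}$,

$$\mathbf{Adv}^{\text{s-dfa}}_{\Pi_1(E)}(t, \ell, n, m) \leq 2\,\mathbf{Adv}^{\text{ind-cpa}}_{E}(t', \ell + 1)$$

Proof. Let $S$ be an adversary meeting the parameters $t$, $\ell$, $n$, and $m$. Consider a simulation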


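Experiment $\mathbf{Expt}^{\text{s-file}}_{\Pi_1(E)}(S_1, S_2)$
    $(pk, sk) \leftarrow \mathsf{Gen}(1^\kappa)$
    $(sk_1, sk_2) \leftarrow \mathsf{Share}(sk)$
    $(pk', sk') \leftarrow \mathsf{Gen}(1^\kappa)$
    $(\ell, \langle \sigma_{0k} \rangle_{k \in [\ell]}, \langle \sigma_{1k} \rangle_{k \in [\ell]}, M, \phi) \leftarrow S_1(pk, sk_2)$
    $b \xleftarrow{\$} \{0, 1\}$
    $m \leftarrow |M.\Sigma|$
    for $k \in [\ell]$, $j \in [m]$
        $c_{kj} \leftarrow \mathsf{Enc}_{pk}((\sigma_{bk})^j)$
    $b' \leftarrow S_2^{\mathsf{clientOr}(pk, sk_1, pk', M)}(\phi, \langle c_{kj} \rangle_{k \in [\ell], j \in [m]})$
    if $b' = b$

Figure 3.4: Experiments for proving file privacy of $\Pi_1(E)$ against server adversaries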

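c109 (i.e., $\rho \leftarrow \mathsf{Enc}_{pk'}(\pi)$, $\pi \xleftarrow{\$} \mathsf{Injs}(Q \to R)$ in c107 and c115). Then $b$ is hidden information-theoretically from $S$ in $\mathbf{Sim}^{\text{s-dfa}}_{\Pi_1(E)}$, since $\gamma$ is a random element of $R$ in s104 and since $\gamma^*$ is a random element of $R$ (see c109). As a result, $\mathbb{P}\bigl(\mathbf{Sim}^{\text{s-dfa}}_{\Pi_1(E)}(S) = 1\bigr) = \frac{1}{2}$ and for $\mathbf{Adv}^{\text{s-dfa}}_{\Pi_1(E)}(S)$ to be nonzero, $S$ must distinguish $\mathbf{Sim}^{\text{s-dfa}}_{\Pi_1(E)}$ from $\mathbf{Expt}^{\text{s-dfa}}_{\Pi_1(E)}$.

We construct an IND-CPA adversary $U$ that, on input $\widehat{pk}$, sets $pk' \leftarrow \widehat{pk}$ and uses its own oracle $\mathsf{Enc}^{\hat{b}}_{\widehat{pk}}$ to choose between running $\mathbf{Expt}^{\text{s-dfa}}_{\Pi_1(E)}$ and $\mathbf{Sim}^{\text{s-dfa}}_{\Pi_1(E)}$ for $S$ by setting $\rho \leftarrow \mathsf{Enc}^{\hat{b}}_{\widehat{pk}}(0, r)$ in c107 and c115. (Aside from this, $U$ performs $\mathbf{Expt}^{\text{s-dfa}}_{\Pi_1(E)}$ faithfully, using $(pk, sk) \leftarrow \mathsf{Gen}(1^\kappa)$ and $(sk_1, sk_2) \leftarrow \mathsf{Share}(sk)$ it generates itself.) $U$ then returns $\hat{b}' = 1$ if $S_2$ outputs $b' = b$ and $\hat{b}' = 0$, otherwise. Then,

$$\begin{aligned}
\mathbb{P}\bigl(\mathbf{Expt}^{\text{ind-cpa}}_{E}(U) = 1\bigr)
&= \frac{1}{2}\,\mathbb{P}\bigl(\mathbf{Expt}^{\text{s-dfa}}_{\Pi_1(E)}(S) = 1\bigr) + \frac{1}{2}\,\mathbb{P}\bigl(\mathbf{Sim}^{\text{s-dfa}}_{\Pi_1(E)}(S) = 0\bigr) \\
&= \frac{1}{2}\left(\frac{1}{2} + \frac{1}{2}\,\mathbf{Adv}^{\text{s-dfa}}_{\Pi_1(E)}(S)\right) + \frac{1}{4} \\
&= \frac{1}{2} + \frac{1}{4}\,\mathbf{Adv}^{\text{s-dfa}}_{\Pi_1(E)}(S)
\end{aligned}$$

and so $\mathbf{Adv}^{\text{ind-cpa}}_{E}(U) = \frac{1}{2}\,\mathbf{Adv}^{\text{s-dfa}}_{\Pi_1(E)}(S)$.

Note that $U$ makes $\ell + 1$ oracle queries and runs in time $t' = t + t$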


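The second server adversary $S = (S_1, S_2)$ attacks the file ciphertexts $\langle c_{kj} \rangle_{k \in [\ell], j \in [m]}$ as in experiment $\mathbf{Expt}^{\text{s-file}}_{\Pi_1(E)}$ shown in Fig. 3.4. $S_1$ produces two equal-length plaintext files $\langle \sigma_{0k} \rangle_{k \in [\ell]}$, $\langle \sigma_{1k} \rangle_{k \in [\ell]}$ and a DFA $M$. $S_2$ receives the ciphertexts $\langle c_{kj} \rangle_{k \in [\ell], j \in [m]}$ for file $\langle \sigma_{bk} \rangle_{k \in [\ell]}$ where $b$ is chosen randomly. $S_2$ is also given oracle access to $\mathsf{clientOr}(pk, sk_1, pk', M)$. Eventually $S_2$ outputs a bit $b'$, and $\mathbf{Expt}^{\text{s-file}}_{\Pi_1(E)}(S) = 1$ iff $b' = b$. We say the advantage of $S$ is $\mathbf{Adv}^{\text{s-file}}_{\Pi_1(E)}(S) = 2 \cdot \mathbb{P}\bigl(\mathbf{Expt}^{\text{s-file}}_{\Pi_1(E)}(S) = 1\bigr) - 1$ and then $\mathbf{Adv}^{\text{s-file}}_{\Pi_1(E)}(t, \ell, n, m) = \max_S \mathbf{Adv}^{\text{s-file}}_{\Pi_1(E)}(S)$ where the maximum is taken over all adversaries $S = (S_1, S_2)$ taking time $t$ and producing (from $S_1$) files of $\ell$ symbols and a DFA of $n$ states and alphabet of size $m$. We prove the following theorem:

Theorem 2. For $t' = t + t_{\mathsf{Gen}} + t_{\mathsf{Share}} + \ell m \cdot t_{\mathsf{Enc}}$,

$$\mathbf{Adv}^{\text{s-file}}_{\Pi_1(\mathsf{Pai})}(t, \ell, n, m) \leq 2\,\mathbf{Adv}^{\text{ind-cpa}}_{\mathsf{Pai}}(t', \ell + 1) + \mathbf{Adv}^{\text{ind-cpa}}_{\mathsf{Pai}}(t', \ell m)$$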

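Proof. Let $\mathbf{Expt}^{\text{s-file-0}}_{\Pi_1(\mathsf{Pai})}$ denote experiment $\mathbf{Expt}^{\text{s-file}}_{\Pi_1(\mathsf{Pai})}$ with $b$ fixed at $b = 0$, and let $\mathbf{Expt}^{\text{s-file-1}}_{\Pi_1(\mathsf{Pai})}$ denote the experiment $\mathbf{Expt}^{\text{s-file}}_{\Pi_1(\mathsf{Pai})}$ with $b$ fixed at $b = 1$. Consider a simulation $\mathbf{Sim}^{\text{s-file-0}}_{\Pi_1(\mathsf{Pai})}$ for $\mathbf{Expt}^{\text{s-file-0}}_{\Pi_1(\mathsf{Pai})}$ that differs only by simulating $\mathsf{clientOr}$ so as to substitute all ciphertexts produced with $pk'$ with encryptions of a random injection $\pi$ independent of $\pi_1$ it chose as in c109 (i.e., $\rho \leftarrow \mathsf{Enc}_{pk'}(\pi)$, $\pi \xleftarrow{\$} \mathsf{Injs}(Q \to R)$ in c107 and c115). Proceeding as in the proof of Theorem 1, we construct an IND-CPA adversary $U_0$ that uses its own oracle $\mathsf{Enc}^{\hat{b}}_{\widehat{pk}}$ to choose between running $\mathbf{Expt}^{\text{s-file-0}}_{\Pi_1(\mathsf{Pai})}$ and $\mathbf{Sim}^{\text{s-file-0}}_{\Pi_1(\mathsf{Pai})}$ for $S$, i.e., by setting $pk' \leftarrow \widehat{pk}$ and $\rho \leftarrow \mathsf{Enc}^{\hat{b}}_{\widehat{pk}}(\pi_1, \pi)$ in c107 and c115. (Aside from this, $U_0$ performs $\mathbf{Expt}^{\text{s-file-0}}_{\Pi_1(\mathsf{Pai})}$ faithfully, using $(pk, sk) \leftarrow \mathsf{Gen}(1^\kappa)$ and $(sk_1, sk_2) \leftarrow \mathsf{Share}(sk)$ it generates itself.) $U_0$ returns $\hat{b}' = 0$ if $b' = b$ and $\hat{b}' = 1$, otherwise. Then,

$$\begin{aligned}
1 + \mathbf{Adv}^{\text{ind-cpa}}_{\mathsf{Pai}}(U_0)
&= 2 \cdot \mathbb{P}\bigl(\mathbf{Expt}^{\text{ind-cpa}}_{\mathsf{Pai}}(U_0) = 1\bigr) \\
&= \mathbb{P}\bigl(\mathbf{Expt}^{\text{s-file-0}}_{\Pi_1(\mathsf{Pai})}(S) = 1\bigr) + \mathbb{P}\bigl(\mathbf{Sim}^{\text{s-file-0}}_{\Pi_1(\mathsf{Pai})}(S) = 0\bigr)
\end{aligned} \tag{3.2}$$

Now consider a simulation $\mathbf{Sim}^{\text{s-file-1}}_{\Pi_1(\mathsf{Pai})}$ for $\mathbf{Expt}^{\text{s-file-1}}_{\Pi_1(\mathsf{Pai})}$ that again differs only by simulating $\mathsf{clientOr}$ so as to substitute all ciphertexts produced with $pk'$ with encryptions of a random injection. As above, we construct an IND-CPA adversary $U_1$ that uses its own oracle $\mathsf{Enc}^{\hat{b}}_{\widehat{pk}}$ to choose between running $\mathbf{Expt}^{\text{s-file-1}}_{\Pi_1(\mathsf{Pai})}$ and $\mathbf{Sim}^{\text{s-file-1}}_{\Pi_1(\mathsf{Pai})}$ for $S$, i.e., by setting $pk' \leftarrow \widehat{pk}$ and $\rho \leftarrow \mathsf{Enc}^{\hat{b}}_{\widehat{pk}}$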


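where $\pi \xleftarrow{\$} \mathsf{Injs}(Q \to R)$ in c107 and c115. $U_1$ returns $\hat{b}' = 1$ if $b' = b$ and $\hat{b}' = 0$, otherwise. Then,

$$\begin{aligned}
1 + \mathbf{Adv}^{\text{ind-cpa}}_{\mathsf{Pai}}(U_1)
&= 2 \cdot \mathbb{P}\bigl(\mathbf{Expt}^{\text{ind-cpa}}_{\mathsf{Pai}}(U_1) = 1\bigr) \\
&= \mathbb{P}\bigl(\mathbf{Sim}^{\text{s-file-1}}_{\Pi_1(\mathsf{Pai})}(S) = 0\bigr) + \mathbb{P}\bigl(\mathbf{Expt}^{\text{s-file-1}}_{\Pi_1(\mathsf{Pai})}(S) = 1\bigr)
\end{aligned} \tag{3.3}$$

Finally, consider an adversary $U$ that uses its oracle $\mathsf{Enc}^{\hat{b}}_{\widehat{pk}}$ to choose between running $\mathbf{Sim}^{\text{s-file-0}}_{\Pi_1(\mathsf{Pai})}$ and $\mathbf{Sim}^{\text{s-file-1}}_{\Pi_1(\mathsf{Pai})}$ for $S$. Specifically, on input $\widehat{pk} = \langle N, g \rangle$, $U$ generates $d_2 \xleftarrow{\$} \mathbb{Z}_{N^2}$ and invokes $S_1(\widehat{pk}, sk_2)$ where $sk_2 = \langle N, g, d_2 \rangle$. Upon receiving $\langle \sigma_{0k} \rangle_{k \in [\ell]}$ and $\langle \sigma_{1k} \rangle_{k \in [\ell]}$ from $S_1$, $U$ sets $c_{kj} \leftarrow \mathsf{Enc}^{\hat{b}}_{\widehat{pk}}((\sigma_{0k})^j, (\sigma_{1k})^j)$. Additionally, in the simulation of $\mathsf{clientOr}$, $U$ selects $r \xleftarrow{\$} R$ and sets $\alpha \leftarrow \mathsf{Enc}_{\widehat{pk}}(r)$ in c104 and c112 and $\beta \leftarrow g^r \alpha^{-d_2} \bmod N^2$ in c106 and c114, so that $\alpha^{d_2} \beta \equiv g^r \bmod N^2$. ($U$ also generates $pk'$ itself and constructs all encryptions for $pk'$ as encryptions of a random injection.) When $S_2$ outputs $b'$, $U$ outputs $b'$ as $\hat{b}'$. Then,

$$\begin{aligned}
1 + \mathbf{Adv}^{\text{ind-cpa}}_{\mathsf{Pai}}(U)
&= 2 \cdot \mathbb{P}\bigl(\mathbf{Expt}^{\text{ind-cpa}}_{\mathsf{Pai}}(U) = 1\bigr) \\
&= 2 \cdot \mathbb{P}\bigl(\mathbf{Sim}^{\text{s-file}}_{\Pi_1(\mathsf{Pai})}(S) = 1\bigr) \\
&= \mathbb{P}\bigl(\mathbf{Sim}^{\text{s-file-0}}_{\Pi_1(\mathsf{Pai})}(S) = 1\bigr) + \mathbb{P}\bigl(\mathbf{Sim}^{\text{s-file-1}}_{\Pi_1(\mathsf{Pai})}(S) = 1\bigr)
\end{aligned} \tag{3.4}$$

Adding (3.2), (3.3) and (3.4), we get

$$\begin{aligned}
3 &+ \mathbf{Adv}^{\text{ind-cpa}}_{\mathsf{Pai}}(U_0) + \mathbf{Adv}^{\text{ind-cpa}}_{\mathsf{Pai}}(U) + \mathbf{Adv}^{\text{ind-cpa}}_{\mathsf{Pai}}(U_1) \\
&= \mathbb{P}\bigl(\mathbf{Expt}^{\text{s-file-0}}_{\Pi_1(\mathsf{Pai})}(S) = 1\bigr) + \mathbb{P}\bigl(\mathbf{Sim}^{\text{s-file-0}}_{\Pi_1(\mathsf{Pai})}(S) = 0\bigr) \\
&\quad + \mathbb{P}\bigl(\mathbf{Sim}^{\text{s-file-0}}_{\Pi_1(\mathsf{Pai})}(S) = 1\bigr) + \mathbb{P}\bigl(\mathbf{Sim}^{\text{s-file-1}}_{\Pi_1(\mathsf{Pai})}(S) = 1\bigr) \\
&\quad + \mathbb{P}\bigl(\mathbf{Sim}^{\text{s-file-1}}_{\Pi_1(\mathsf{Pai})}(S) = 0\bigr) + \mathbb{P}\bigl(\mathbf{Expt}^{\text{s-file-1}}_{\Pi_1(\mathsf{Pai})}(S) = 1\bigr) \\
&= 2 \cdot \mathbb{P}\bigl(\mathbf{Expt}^{\text{s-file}}_{\Pi_1(\mathsf{Pai})}(S) = 1\bigr) + 2 \\
&= 3 + \mathbf{Adv}^{\text{s-file}}_{\Pi_1(\mathsf{Pai})}(S)
\end{aligned}$$

The result then follows because each of $U_0$ and $U_1$ makes $\ell + 1$ oracle queries and runs in time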


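Experiment $\mathbf{Expt}^{\text{c-file}}_{\Pi_1(E)}(C_1, C_2)$
    $(pk, sk) \leftarrow \mathsf{Gen}(1^\kappa)$
    $(sk_1, sk_2) \leftarrow \mathsf{Share}(sk)$
    $(pk', sk') \leftarrow \mathsf{Gen}(1^{\kappa + 2})$
    $(\ell, \langle \sigma_{0k} \rangle_{k \in [\ell]}, \langle \sigma_{1k} \rangle_{k \in [\ell]}, M, \phi) \leftarrow C_1(pk, sk_1, pk')$
    if $M(\langle \sigma_{0k} \rangle_{k \in [\ell]}) \neq M(\langle \sigma_{1k} \rangle_{k \in [\ell]})$ then return 0
    $b \xleftarrow{\$} \{0, 1\}$
    $m \leftarrow |M.\Sigma|$
    for $k \in [\ell]$, $j \in [m]$
        $c_{kj} \leftarrow \mathsf{Enc}_{pk}((\sigma_{bk})^j)$
    $b' \leftarrow C_2^{\mathsf{serverOr}(pk, sk_2, M.\Sigma, \langle c_{kj} \rangle_{k \in [\ell], j \in [m]})}(\phi)$
    if $b' = b$

Figure 3.5: Experiment for proving file privacy of $\Pi_1(E)$ against client adversaries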

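ciphertexts $\langle c_{kj} \rangle_{k \in [\ell], j \in [m]}$. $U$ makes $\ell m$ oracle queries and runs in time $t + t_{\mathsf{Gen}} + t_{\mathsf{Share}} + \ell m \cdot t_{\mathsf{Enc}}$ for the same reason.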

3.2.3 Security Against Client Adversaries

In this section we show security of $\Pi_1(E)$ against honest-but-curious client adversaries and heuristically justify its security against malicious ones. Since the client has the DFA in its possession, privacy of the DFA against a client adversary is not a concern. The proof of security against the client therefore is concerned with the privacy of only the file. However, by the nature of what the protocol computes for the client (i.e., the final state of a DFA match on the file), the client can easily distinguish two files of its choosing simply by running the protocol correctly using a DFA that distinguishes between the two files it chose.

For this reason, we adapt the notion of indistinguishability to apply only to files that produce the same final state for the client’s DFA. So, in the experiment $\mathbf{Expt}^{\text{c-file}}_{\Pi_1(E)}$ (Fig. 3.5) that we use to define file security against client adversaries, the adversary $C = (C_1, C_2)$ succeeds (i.e., $\mathbf{Expt}^{\text{c-file}}_{\Pi_1(E)}(C)$ returns 1) only if the two files $\langle \sigma_{0k} \rangle_{k \in [\ell]}$ and $\langle \sigma_{1k} \rangle_{k \in [\ell]}$ output by $C_1$ both drive the DFA $M$, also output by $C_1$, to the same final state (denoted $M(\langle \sigma_{0k} \rangle_{k \in [\ell]}) = M(\langle \sigma_{1k} \rangle_{k \in [\ell]})$).

This caveat aside, the experiment is straightforward: $C_1$ receives public key $pk$, private-key


$\langle \sigma_{0k} \rangle_{k \in [\ell]}$ and $\langle \sigma_{1k} \rangle_{k \in [\ell]}$ and a DFA $M$. Depending on how $b$ is then chosen, one of these files is encrypted using $pk$ and then provided to the server, to which $C_2$ is given oracle access (denoted $\mathsf{serverOr}(pk, sk_2, M.\Sigma, \langle c_{kj} \rangle_{k \in [\ell], j \in [m]})$).

Adversary $C_2$ can invoke $\mathsf{serverOr}$ first with a message containing an integer $n$ (i.e., with a message of the form m101), to which $\mathsf{serverOr}$ returns $\ell$ (m102). $C_2$ can then invoke $\mathsf{serverOr}$ up to $\ell + 1$ times. The first $\ell$ such invocations take the form $\alpha, \beta, \rho$ and correspond to messages of the form m103. Each such invocation elicits a response $\langle \mu_{\sigma i} \rangle_{\sigma \in \Sigma, i \in [n]}$ (i.e., of the form m104). The last invocation is of the form $\alpha, \beta, \rho$ and corresponds to m105. This invocation elicits a response $\gamma^*$ (i.e., m106). Malformed or extra queries are rejected by $\mathsf{serverOr}$.

We show file privacy against honest-but-curious client adversaries $C = (C_1, C_2)$, i.e., $C_2$ invokes $\mathsf{serverOr}$ exactly as $\Pi_1(E)$ prescribes, using DFA $M$ output by $C_1$. We define the advantage of $C$ to be $\mathbf{hbcAdv}^{\text{c-file}}_{\Pi_1(E)}(C) = 2 \cdot \mathbb{P}\bigl(\mathbf{Expt}^{\text{c-file}}_{\Pi_1(E)}(C) = 1\bigr) - 1$ and $\mathbf{hbcAdv}^{\text{c-file}}_{\Pi_1(E)}(t, \ell, n, m) = \max_C \mathbf{hbcAdv}^{\text{c-file}}_{\Pi_1(E)}(C)$ where the maximum is taken over honest-but-curious client adversaries $C$ running in total time $t$ and producing files of length $\ell$ and a DFA of $n$ states over an alphabet of $m$ symbols. We prove:

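Theorem 3. For $t' = t + t_{\mathsf{Gen}} + \ell m \cdot t_{\mathsf{Enc}} + (\ell + 1) \cdot t_{\mathsf{Dec}}$,

$$\mathbf{hbcAdv}^{\text{c-file}}_{\Pi_1(\mathsf{Pai})}(t, \ell, n, m) \leq \mathbf{Adv}^{\text{ind-cpa}}_{\mathsf{Pai}}(t', \ell m (1 + n))$$

Proof. Given an adversary $C = (C_1, C_2)$ running in time $t$ and selecting files of length $\ell$ symbols and a DFA of $n$ states over an alphabet of $m$ symbols, we construct an IND-CPA adversary $U$ that demonstrates the theorem as follows. On input $\widehat{pk} = \langle N, g \rangle$, $U$ generates $(pk', sk') \leftarrow \mathsf{Gen}(1^{\kappa + 2})$ and $d_1 \xleftarrow{\$} \mathbb{Z}_{N^2}$, and invokes $C_1(\widehat{pk}, sk_1, pk')$ where $sk_1 = \langle N, g, d_1 \rangle$ to obtain $(\ell, \langle \sigma_{0k} \rangle_{k \in [\ell]}, \langle \sigma_{1k} \rangle_{k \in [\ell]}, M, \phi)$, where $M = \langle Q, \Sigma, q_{init}, \delta \rangle$ is a DFA. Note that $d_1$ is chosen from a distribution that is statistically indistinguishable from that from which $d_1$ is chosen in the real system. For $k \in [\ell]$ and $j \in [m]$, $U$ sets $c_{kj} \leftarrow \mathsf{Enc}^{\hat{b}}_{\widehat{pk}}((\sigma_{0k})^j, (\sigma_{1k})^j)$.

$U$ then invokes $C_2(\phi)$ and simulates responses to $C_2$’s queries to $\mathsf{serverOr}$ as follows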


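$\gamma_1 \leftarrow \pi(q_1)$, and then sets $\mu_{\sigma i} \leftarrow \mathsf{Enc}^{\hat{b}}_{\widehat{pk}}\bigl((\gamma_0)^i \cdot_R \Lambda_\sigma(\sigma_{0k}),\ (\gamma_1)^i \cdot_R \Lambda_\sigma(\sigma_{1k})\bigr)$ for $\sigma \in \Sigma$ and $i \in [n]$. After this, $U$ updates $q_0 \leftarrow \delta(q_0, \sigma_{0k})$ and $q_1 \leftarrow \delta(q_1, \sigma_{1k})$, and returns $\langle \mu_{\sigma i} \rangle_{\sigma \in \Sigma, i \in [n]}$ to $C_2$. For the last query $\alpha, \beta, \rho$, adversary $U$ computes $\pi \leftarrow \mathsf{Dec}_{sk'}(\rho)$ and returns $\gamma^* = \pi(q_0)$ $(= \pi(q_1))$ to $C_2$. When $C_2$ outputs $b'$, $U$ outputs $b'$, as well.

This simulation is statistically indistinguishable from the real system provided that $C$ is honest-but-curious, and so ignoring terms that are negligible in $\kappa$, $\mathbf{hbcAdv}^{\text{c-file}}_{\Pi_1(\mathsf{Pai})}(C) = \mathbf{Adv}^{\text{ind-cpa}}_{\mathsf{Pai}}(U)$. Note that $U$ runs in $t' = t + t_{\mathsf{Gen}} + \ell m \cdot t_{\mathsf{Enc}} + (\ell + 1) \cdot t_{\mathsf{Dec}}$ due to the need to generate $(pk', sk')$ and $sk_1$, to create the file ciphertexts $\langle c_{kj} \rangle_{k \in [\ell], j \in [m]}$, and to perform $\ell + 1$ Pai decryptions in the simulation. $U$ makes $nm$ oracle queries in order to respond to each of the $\ell$ oracle queries following the first, plus an additional $\ell m$ queries to create $\langle c_{kj} \rangle_{k \in [\ell], j \in [m]}$, for $\ell m (1 + n)$ queries in total.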

We have found extending this result to fully malicious client adversaries to be difficult for two reasons. First, $\mathbf{Expt}^{\text{c-file}}_{\Pi_1(E)}$ does not make sense for a malicious client, since $C_2$ is not bound to use the DFA $M$ output by $C_1$. As such, $C_2$ can use a different DFA, in particular one that enables it to distinguish between the files output by $C_1$. Second, even ignoring the final state $\gamma^*$ sent back to the client, we have been unable to reduce the ability of the client adversary to distinguish between two files on the basis of m104 messages to breaking the IND-CPA security of $E$; intuitively, the difficulty derives from the simulator’s inability to decrypt $\alpha$ values provided by $C_2$. (The ciphertext $\rho$ enables the simulator to “track” the plaintext of $\alpha$ in the honest-but-curious case, but $\rho$ might contain useless information in the malicious case.)


3.3 An Alternative Protocol

The second protocol we present has the same goals as $\Pi_1(E)$ but incurs lower communication cost. Specifically, whereas the communication cost of $\Pi_1(E)$ is $O(\kappa \ell n m)$ bits, the protocol we present in this section, called $\Pi_2(E)$, sends only $O(\kappa \ell (n + m))$ bits. $\Pi_2(E)$ accomplishes this in part by exploiting a cryptosystem that is additively homomorphic and that offers the ability to homomorphically “multiply” ciphertexts once. That is, the cryptosystem supports a new operator $\odot_{pk}$ that satisfies $\mathsf{Dec}_{sk}(\mathsf{Enc}_{pk}(m_1) \odot_{pk} \mathsf{Enc}_{pk}(m_2)) = m_1 \cdot_R m_2$, but the result of a $\odot_{pk}$ operation (or any other ciphertext resulting from $+_{pk}$ or $\cdot_{pk}$ operations in which it is used) cannot be used in a $\odot_{pk}$ operation. After we present our protocol, we will discuss various options for instantiating this encryption scheme within it.

Protocol $\Pi_2(E)$ is shown in Fig. 3.6. Note that the input arguments to both the client and the server are identical to those in $\Pi_1(E)$. The structure of the protocol is also very similar to $\Pi_1(E)$, with the only differences being in how the server performs each loop iteration (s204–s212) and how the client forms the new encrypted DFA state $\alpha$ (c212–c216). We now summarize the primary innovations represented by these differences.

After the $k$-th m203 message, the server constructs an encryption $\Psi_\sigma$ of $\Lambda_\sigma(\sigma_k)$ (s206). Rather than computing $\mu_{\sigma i} \leftarrow \gamma^i \cdot_{pk} \Psi_\sigma$, however, the server sends $\langle \Psi_\sigma \rangle_{\sigma \in \Sigma}$ to the client in m204. Each $\mu_{\sigma i}$ is then built at the client, instead (c212–c214), which is the main reason we get better communication efficiency.
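The difference in per-round traffic can be made concrete by counting the values carried by the messages in each loop iteration, as read off Fig. 3.1 and Fig. 3.6. The following sketch is only a back-of-the-envelope count of values, not of bits (it treats ciphertexts, partial decryptions, and the plaintext $\gamma$ alike); the function names are hypothetical.

```python
def pi1_values_per_round(n: int, m: int) -> int:
    """Values exchanged in one loop iteration of Pi_1."""
    client_to_server = 3          # m103: alpha, beta, rho
    server_to_client = n * m      # m104: one mu value per (symbol, power) pair
    return client_to_server + server_to_client

def pi2_values_per_round(n: int, m: int) -> int:
    """Values exchanged in one loop iteration of Pi_2."""
    client_to_server = 3          # m203: alpha, beta, rho
    server_to_client = 1 + m + n  # m204: gamma, m values Psi, n values nu
    return client_to_server + server_to_client

# Example with a hypothetical 50-state DFA over a 256-symbol alphabet.
assert pi1_values_per_round(50, 256) == 12803
assert pi2_values_per_round(50, 256) == 310
```

Multiplying by the file length $\ell$ and the ciphertext size recovers the $O(\kappa \ell n m)$ versus $O(\kappa \ell (n + m))$ comparison stated for the two protocols.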

Since each $\mu_{\sigma i}$ is built at the client, the server must send $\gamma$ in m204. To hide the current DFA state from the client, the server blinds $\gamma$ with a random $r \in R$ (s208–s209) before returning it. So, the client needs to accommodate $r$ without knowing it when performing the DFA state transition. The client cannot perform the polynomial evaluation using the $f(x, y)$ it constructed (c211) on the $\langle \mu_{\sigma i} \rangle_{\sigma \in \Sigma, i \in [n]}$ as in $\Pi_1(E)$ since $f(x, y)$ is designed for an input $q \in \pi_0(Q)$, not $q + r$. To overcome this, the client constructs a shifted polynomial $f'(x, y)$ such that $f'(q + r, \sigma) = f(q, \sigma)$ for all $q \in \pi_0(Q)$, and so $f'(x, y)$ will correctly translate the blinded input to the next DFA state. What is


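client($pk, sk_1, pk', \langle Q, \Sigma, \delta, q_{init} \rangle$)
server($pk, sk_2, \Sigma, \langle c_{kj} \rangle_{k \in [\ell], j \in [m]}$)

c201. $n \leftarrow |Q|,\ m \leftarrow |\Sigma|$
c202. $\pi_0 \leftarrow I$
c203. $\pi_1 \xleftarrow{\$} \mathsf{Injs}(Q \to R)$
c204. $\alpha \leftarrow \mathsf{Enc}_{pk}(\pi_1(q_{init}))$
s201. $m \leftarrow |\Sigma|$
s202. $\langle \lambda_{\sigma j} \rangle_{\sigma \in \Sigma, j \in [m]} \leftarrow \mathsf{Lagrange}(\Sigma)$
m201. client → server: $n$
m202. server → client: $\ell$
c205. for $k \leftarrow 0 \ldots \ell - 1$
s203. for $k \leftarrow 0 \ldots \ell - 1$
c206.     $\beta \leftarrow \mathsf{Dec1}_{sk_1}(\alpha)$
c207.     $\rho \leftarrow \mathsf{Enc}_{pk'}(\pi_1)$
m203.     client → server: $\alpha, \beta, \rho$
s204.     $\gamma \leftarrow \mathsf{Dec2}_{sk_2}(\alpha, \beta)$
c208.     $\pi_0 \leftarrow \pi_1$
c209.     $\pi_1 \xleftarrow{\$} \mathsf{Injs}(Q \to R)$
c210.     $\delta' \leftarrow \mathsf{Blind}(\delta, \pi_0, \pi_1)$
c211.     $\langle a_{\sigma i} \rangle_{\sigma \in \Sigma, i \in [n]} \leftarrow \mathsf{ToPoly}(Q, \Sigma, \delta')$
s205.     for $\sigma \in \Sigma$
s206.         $\Psi_\sigma \leftarrow {}^{pk}\!\sum_{j=0}^{m-1} \lambda_{\sigma j} \cdot_{pk} c_{kj}$
s207.     endfor
s208.     $r \xleftarrow{\$} R$
s209.     $\gamma \leftarrow \gamma +_R r$
s210.     for $i \in [n]$
s211.         $\nu_i \leftarrow \mathsf{Enc}_{pk}(r^i)$
s212.     endfor
m204.     server → client: $\gamma, \langle \Psi_\sigma \rangle_{\sigma \in \Sigma}, \langle \nu_i \rangle_{i \in [n]}$
c212.     for $\sigma \in \Sigma$, $i \in [n]$
c213.         $\mu_{\sigma i} \leftarrow \gamma^i \cdot_{pk} \Psi_\sigma$
c214.     endfor
c215.     $\langle \hat{a}'_{\sigma i} \rangle_{\sigma \in \Sigma, i \in [n]} \leftarrow \mathsf{Shift}(\langle \nu_i \rangle_{i \in [n]}, \langle a_{\sigma i} \rangle_{\sigma \in \Sigma, i \in [n]})$
c216.     $\alpha \leftarrow {}^{pk}\!\sum_{\sigma \in \Sigma} {}^{pk}\!\sum_{i=0}^{n-1} \hat{a}'_{\sigma i} \odot_{pk} \mu_{\sigma i}$
c217. endfor
s213. endfor
c218. $\beta \leftarrow \mathsf{Dec1}_{sk_1}(\alpha)$
m205. client → server: $\alpha, \beta, \rho$
s214. $\gamma^* \leftarrow \mathsf{Dec2}_{sk_2}(\alpha, \beta)$
m206. server → client: $\gamma^*$
c220. return $\pi_1^{-1}(\gamma^*)$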


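If we set $f'(x, y) = {}^{R}\!\sum_{\sigma \in \Sigma} (f'_\sigma(x) \cdot_R \Lambda_\sigma(y))$ where $f'_\sigma(x) = {}^{R}\!\sum_{i=0}^{n-1} a'_{\sigma i} \cdot_R x^i$, then it suffices if $f'_\sigma(x +_R r) = f_\sigma(x)$ for all $\sigma \in \Sigma$. Note that

$$\begin{aligned}
f_\sigma(x -_R r) &= {}^{R}\!\!\sum_{i=0}^{n-1} a_{\sigma i} \cdot_R (x -_R r)^i
= {}^{R}\!\!\sum_{i=0}^{n-1} a_{\sigma i} \cdot_R {}^{R}\!\!\sum_{i'=0}^{i} \binom{i}{i'} \cdot_R x^{i - i'} \cdot_R (-_R r)^{i'} \qquad (3.5) \\
&= {}^{R}\!\!\sum_{i=0}^{n-1} \left( {}^{R}\!\!\sum_{i'=0}^{n-1-i} a_{\sigma(i+i')} \cdot_R \binom{i + i'}{i'} \cdot_R (-_R r)^{i'} \right) \cdot_R x^i
\end{aligned}$$

where Eqn. 3.5 follows from the binomial theorem. Therefore, setting

$$a'_{\sigma i} \leftarrow {}^{R}\!\!\sum_{i'=0}^{n-1-i} a_{\sigma(i+i')} \cdot_R \binom{i + i'}{i'} \cdot_R (-_R 1)^{i'} \cdot_R r^{i'} \tag{3.6}$$

ensures $f'_\sigma(x +_R r) = f_\sigma(x)$ and so $f'(x +_R r, \sigma) = f(x, \sigma)$. The client knows all the terms in Eqn. 3.6 except $r^{i'}$. That is exactly the reason the server sends in m204 the ciphertext $\nu_i$ of $r^i$, for each $i \in [n]$ (see s211). The client can then calculate a ciphertext $\hat{a}'_{\sigma i}$ of the coefficient of $x^i$ in $f'_\sigma$ by using the additive homomorphic property of the encryption scheme:

$$\hat{a}'_{\sigma i} \leftarrow {}^{pk}\!\!\sum_{i'=0}^{n-1-i} \left( a_{\sigma(i+i')} \cdot_R \binom{i + i'}{i'} \cdot_R (-_R 1)^{i'} \right) \cdot_{pk} \nu_{i'} \tag{3.7}$$

In our pseudocode, the calculations in Eqn. 3.7 are encapsulated within the operation $\langle \hat{a}'_{\sigma i} \rangle_{\sigma \in \Sigma, i \in [n]} \leftarrow \mathsf{Shift}(\langle \nu_i \rangle_{i \in [n]}, \langle a_{\sigma i} \rangle_{\sigma \in \Sigma, i \in [n]})$ on line c215.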
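Eqn. 3.6 can be checked numerically on plaintext values. The sketch below computes the shifted coefficients for a single $f_\sigma$ with $r$ known in the clear and confirms that $f'_\sigma(x + r) = f_\sigma(x)$; in the protocol itself the client never sees $r$ and instead performs the same linear combination over the ciphertexts $\nu_{i'}$, as in Eqn. 3.7. The modulus and the function names are assumptions made for illustration.

```python
from math import comb

P = 10007  # small prime modulus standing in for the ring R (assumption)

def shift_coeffs(a, r, p=P):
    """Coefficients a' of f'(x) = f(x - r), following Eqn. 3.6 (plaintext version)."""
    n = len(a)
    return [
        sum(a[i + k] * comb(i + k, k) * (-1) ** k * pow(r, k, p)
            for k in range(n - i)) % p
        for i in range(n)
    ]

def evaluate(coeffs, x, p=P):
    """Horner evaluation of a polynomial mod p."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

# Hypothetical coefficients of one f_sigma and a hypothetical blinding value r.
a = [3, 1, 4, 1, 5]
r = 1234
a_shifted = shift_coeffs(a, r)
assert all(evaluate(a_shifted, (x + r) % P) == evaluate(a, x) for x in range(50))
```

Because each $a'_{\sigma i}$ is a linear combination of the powers $r^{i'}$ with coefficients the client already knows, the additive homomorphism alone suffices for the client to compute its encryption.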

After the client obtains $\langle \hat{a}'_{\sigma i} \rangle_{\sigma \in \Sigma, i \in [n]}$ and $\langle \mu_{\sigma i} \rangle_{\sigma \in \Sigma, i \in [n]}$, it performs polynomial evaluation at step c216 to assemble the ciphertext of the next DFA state by taking advantage of the one-multiplication homomorphism of the cryptosystem. This is where the additional homomorphism helps to achieve much better communication complexity.

The privacy of the file and DFA from server adversaries and the privacy of the file from client adversaries can be proved for $\Pi_2(E)$ very similarly to how they are proved for $\Pi_1(E)$. In fact, Theorems 1–3 hold for $\Pi_2(E)$ unchanged, once instantiated with a suitable encryption scheme $E$.


Instantiating $E$. Protocol $\Pi_2(E)$ requires an additively homomorphic encryption scheme $E$ that also supports the “one time” homomorphic multiplication operator $\odot_{pk}$. Perhaps the most well-known such cryptosystem is due to Boneh, Goh and Nissim [17], and moreover, this cryptosystem also supports two-party decryption with a cost comparable to regular decryption [17]. The primary difficulty in instantiating $E$ with this cryptosystem, however, is that decryption (and specifically in $\Pi_2(E)$, the operation $\mathsf{Dec2}_{sk_2}$) requires computing a discrete logarithm in a large group, which is generally intractable. That said, if the ciphertext is known to encode one of a small number of possible plaintexts, then $\mathsf{Dec2}_{sk_2}$ can be adapted to test the ciphertext for each of these plaintexts efficiently. As such, to adapt $\Pi_2(E)$ to employ this cryptosystem, we can augment messages m203 and m205 with $\pi_1(Q)$ (listed in random order), for the injection $\pi_1$ at the time the message is sent. This would permit the server to perform $\mathsf{Dec2}_{sk_2}(\alpha, \beta)$ in lines s204 and s214 by testing for these $n$ possible plaintexts. It does, however, have the unfortunate side effect of enabling our proofs for the analogs of Theorems 1 and 2 for $\Pi_2(E)$ to go through only for honest-but-curious server adversaries. $\Pi_2(E)$ instantiated in this way still appears to be secure even against malicious server adversaries, though at this point we can claim this only heuristically.
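The idea of replacing a general discrete logarithm by a test against a short candidate list can be sketched as follows. This is a generic illustration in a small multiplicative group modulo a prime, not the Boneh–Goh–Nissim decryption procedure, which operates in a composite-order pairing group; the parameters, the base `g`, and the function name are all assumptions.

```python
def recover_from_candidates(h, g, p, candidates):
    """Return the candidate m with g^m = h (mod p), or None if no candidate matches."""
    for m in candidates:
        if pow(g, m, p) == h:
            return m
    return None

# Hypothetical parameters: a small prime modulus and base, with three candidate plaintexts
# playing the role of the blinded states pi_1(Q) attached to the message.
p, g = 10007, 5
candidates = [17, 403, 9001]
h = pow(g, 403, p)  # value whose exponent must be recovered
assert recover_from_candidates(h, g, p, candidates) == 403
```

The cost is one exponentiation per candidate, so with $n$ candidate plaintexts the test is linear in the number of DFA states rather than dependent on the size of the group.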

Two other possibilities for instantiating $E$ in $\Pi_2(E)$ are due to Gentry, Halevi and Vaikuntanathan [34]² and Lauter, Naehrig, and Vaikuntanathan [51]. The primary challenge posed by these cryptosystems is that two-party decryption algorithms for them have not been investigated. Each of these schemes is amenable to sharing its private key securely, after which decryption can be performed using generic two-party computation [74, 7]. These instantiations retain $\Pi_2(E)$’s provable security against malicious server adversaries (i.e., the analogs of Theorems 1 and 2), but $\Pi_2(E)$ instantiated this way may be less cost-efficient than $\Pi_1(\mathsf{Pai})$ for many values of $n$ and $m$.³ Of course, customized two-party decryption algorithms for these cryptosystems could restore the efficiency of $\Pi_2(E)$, suggesting a useful open problem for the community.

² Because we require the plaintext ring to be commutative, we would restrict the plaintext space of the Gentry et al. cryptosystem to diagonal square matrices, versus the arbitrary square matrices over which it is defined.

³ For example, for the Gentry et al. scheme, a “garbled” arithmetic circuit [7] for secure two-party decryption using