Privacy-Preserving Regular Expression Evaluation on Encrypted Data
Lei Wei
A dissertation submitted to the faculty of the University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Computer Science.
Chapel Hill 2013
Approved by:
Christian Cachin
Philip Mackenzie
Fabian Monrose
Michael Reiter, Chair
©2013 Lei Wei
ABSTRACT
LEI WEI: Privacy-preserving Regular-Expression Evaluation on Encrypted Data (Under the direction of Michael K. Reiter)
Motivated by the need to outsource file storage to untrusted clouds while still permitting controlled use of that data by authorized third parties, in this dissertation we present a family of protocols by which a client can evaluate a regular expression on an encrypted file stored at a server (the cloud), once authorized to do so by the file owner. We present a protocol that provably protects the privacy of the regular expression and the file contents from a malicious server and the privacy of the file contents (except for the evaluation result) from an honest-but-curious client. We then extend this protocol in two primary directions. In one direction, we develop a strengthened protocol that enables the client to detect any misbehavior of the server; in particular, the client can verify that the result of its regular-expression evaluation is based on the authentic file stored there by the data owner, and in this sense the file and evaluation result are authenticated to the client.
ACKNOWLEDGEMENTS
The completion of this dissertation would not have been possible without the guidance of my advisor, Prof. Mike Reiter, whom I feel extremely fortunate to have worked with and to whom I owe all my gratitude. Ever since I accidentally stepped into this field as a newcomer six years ago, he has been instrumental in my personal development and has helped shape who I am today. Throughout the years, I have been immensely impressed by and benefited from his masterful understanding of the field, endless source of wisdom, careful direction and advice, and sometimes brutal demands to lift me up to his high standards. There is still so much I can learn from him every day, but I feel it is time for me to carry forward the qualities I learned from him, hopefully to have a positive impact on others.
I would also like to thank Dr. Christian Cachin, Dr. Fabian Monrose, Dr. Philip Mackenzie and Dr. Gene Tsudik for serving on my dissertation committee. I am very grateful to all of them for taking time from their busy schedules to hold meetings with me and for providing invaluable feedback, especially considering the challenge of scheduling across 9 time zones.
Special thanks to Fabian, who is always available for encouragement, a fun chat or simply a game of foosball or pingpong to distract myself from the stress in graduate school.
I would also like to thank my friend, Hao Xu, for his generosity in providing many insightful discussions and assistance in fighting bugs in my code from time to time. I would also like to thank all my friends in the security lab, from whom I benefited through inspiring discussions, helpful critiques of my work, and feedback that helped me improve my presentation skills.
My gratitude goes to my girlfriend, Shih Ku, who has accompanied and supported me throughout my time in graduate school, not always physically close, but always close in heart.
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
1 Introduction
1.1 Third-Party Private DFA Evaluation on Encrypted Files in the Cloud
1.2 Ensuring File Authenticity in Private DFA Evaluation on Encrypted Files in the Cloud
1.3 Toward Practical Encrypted Email That Supports Private, Regular-Expression Searches
1.4 Contributions
2 Related Work
2.1 General Techniques for Secure Computation
2.2 Specialized Protocols for DFA Evaluation
2.3 Specialized Protocols for Searching on Encrypted Data
2.4 Input Authenticity in Secure Computation
2.5 Implementations of Systems That Allow Searching on Encrypted Data
3 Third-Party Private DFA Evaluation on Encrypted Files in the Cloud
3.1 Problem Description
3.2 A Secure DFA Evaluation Protocol
3.2.1 Construction
3.2.2 Security Against Server Adversaries
3.2.3 Security Against Client Adversaries
3.4 Heuristics to Detect Misbehavior
4 Ensuring File Authenticity in Private DFA Evaluation on Encrypted Files in the Cloud
4.1 Goals
4.2 Private DFA Evaluation on Signed and Encrypted Data
4.2.1 Preliminaries
4.2.2 Initial Construction Without File Encryption
4.2.3 Adding File Encryption
4.2.4 Complexity
4.2.5 Security Against Server Adversaries
4.2.6 Security Against Client Adversaries
4.3 On File Updates
4.4 Extensions
5 Toward Practical Encrypted Email That Supports Private, Regular-Expression Searches
5.1 Protocol Design
5.1.1 Our Starting Point
5.1.2 Our Initial Construction
5.2 Optimizations
5.2.1 File Representation
5.2.2 Pairing Operations
5.2.3 Shifting
5.2.4 Packing the Result Ciphertexts
5.3 Protocol Security
5.3.1 Security Against Server Adversaries
5.3.2 Security Against Proxy Adversaries
5.4 Performance Evaluation
5.4.1 Implementation
5.4.3 Case Study: Regular Expression Search on Encrypted Emails
5.4.3.1 Header Information
5.4.3.2 Encoding
5.4.3.3 Evaluations
6 Conclusion
LIST OF TABLES
5.1 Average time spent per email in seconds (numbers in braces are when
LIST OF FIGURES
1.1 Overall framework: data owner stores encrypted data at the server, with which an authorized client performs private searches
3.1 Protocol $\Pi_1(E)$, described in Section 3.2
3.2 Experiments for proving DFA privacy of $\Pi_1(E)$ against server adversaries
3.3 Experiment for IND-CPA security
3.4 Experiments for proving file privacy of $\Pi_1(E)$ against server adversaries
3.5 Experiment for proving file privacy of $\Pi_1(E)$ against client adversaries
3.6 Protocol $\Pi_2(E)$, described in Section 3.3
4.1 Protocol $\Pi_3(E)$, described in Section 4.2
4.2 Experiments for proving DFA and file privacy of $\Pi_3(E)$ against server adversaries
4.3 Experiment for proving result authenticity against server adversaries
4.4 Experiment for defining BCDH problem
4.5 Experiment for proving file privacy against client adversaries
5.1 Protocol $\Pi'_1(E)$, described in Section 5.1.1
5.2 Protocol $\Pi_4(E)$, described in Section 5.1.2
5.3 Optimized protocol $\Pi_5(E)$, described in Section 5.2
5.4 Experiments for proving DFA privacy of $\Pi_5(E)$ against server adversaries
5.5 Experiments for proving file privacy of $\Pi_5(E)$ against server adversaries
5.6 Experiments for proving DFA privacy of $\Pi_5(E)$ against proxy adversaries
5.7 Experiments for proving file privacy of $\Pi_5(E)$ against proxy adversaries
5.8 Time spent per file character in milliseconds, with pairing preprocessing disabled
5.9 Time spent per file character in milliseconds, with pairing preprocessing enabled
CHAPTER 1
Introduction
Outsourcing file storage to storage service providers (SSPs) and “clouds” can provide significant savings to file owners in terms of management costs and capital investments (e.g., [58]). However, because cloud storage can heighten the risk of file disclosure, prudent file owners encrypt their cloud-resident files to protect their confidentiality. This encryption, in turn, introduces difficulties in managing access to these files by third parties. For example:
• Third-party service providers who are contracted to analyze files stored in the cloud generally cannot do so if the files are encrypted. For example, periodically “scanning” files to detect new malware, as is common today for PC platforms, cannot presently be performed on encrypted files by a third party.
• With some exceptions (see Chapter 2), third-party customers generally cannot search the files if they are encrypted. Searches on genome datasets, pharmaceutical databases, document corpora, or network logs are critical for research in various fields, but the privacy constraints of these datasets may mandate their encryption, particularly when stored in the cloud.
These difficulties are compounded when the third party views its queries on the files to be sensitive, as well. New malware signatures may be sensitive since releasing them enables attackers to design malware to evade them (e.g., [75]). Customers of datasets in numerous domains (e.g., pharmaceutical research) may view their research interests, and hence their queries, as private.
[Figure 1.1 diagram; parties shown: Data Owner, Cloud Provider (Server), Service Provider (Client)]
Figure 1.1: Overall framework: data owner stores encrypted data at the server, with which an authorized client performs private searches.
motivated by the scenarios above, which in many cases involve pattern matching a file against one or more regular expressions. Regular-expression searches are a widely adopted search primitive in many languages and programming frameworks¹ (e.g., see [38]). Multi-pattern string matching is especially common in analysis of content for malware (e.g., [64, 50]) and also is commonplace in searches on genome data, for example. In fact, there are now a number of available genome databases (e.g., [2, 5]) and accompanying tools for multi-pattern matching against them (e.g., [12]).
1.1 Third-Party Private DFA Evaluation on Encrypted Files in the Cloud
With the goal of improving privacy in the applications above, in Chapter 3 we develop novel protocols to evaluate a deterministic finite automaton (DFA) of the client’s choice on the plaintext of the encrypted file and to return the final state to the client to indicate which, if any, of the patterns encoded in the DFA were matched. We stress that while there is much work on secure two-party computation, including the specific case of private DFA evaluation on a private file, very few works have anticipated the possibility that the file is available only in encrypted form. This setting will become more common as data-storage outsourcing grows.
¹ To be more precise, the term “regular expression” is used in some frameworks in a way that follows but deviates
The security properties we prove for our protocols include privacy of the DFA and file contents against arbitrary server adversaries, and privacy of the file (except what is revealed by the evaluation result) against honest-but-curious client adversaries. Though our proofs are limited to only honest-but-curious client adversaries, we also provide heuristic justification for the security of our protocols against arbitrary client adversaries. Our protocols appear to be extensible with standard techniques to provably protect file privacy against arbitrary client adversaries, but we stop short of doing so in light of the substantially greater cost it would impose and our motivating scenarios involving third parties that the file owner must authorize and so presumably trusts to some extent. We do, however, discuss efficient heuristics to detect a misbehaving client or server that highlight new opportunities in the cloud storage setting.
A central observation that facilitates our protocols is that a DFA transition function can be encoded as a bivariate polynomial over the ring of an additively homomorphic encryption scheme with which the file characters are encrypted. In our protocols, the client, who has this polynomial as input, and the server, who has the encrypted file as input, obliviously perform DFA state transitions by jointly evaluating this polynomial. Neither party learns the current state at any point of the protocol execution; instead, they share the current state at each step, requiring that the polynomial be adapted in each round to accommodate this sharing.
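To make this observation concrete, the following minimal sketch encodes a small transition function as a bivariate polynomial using Lagrange interpolation and then drives the DFA by evaluating that polynomial. It is an illustration under stated assumptions, not the dissertation's construction: it works over a prime field (the modulus `P` is an arbitrary stand-in for the plaintext ring), it operates on plaintext values with no encryption and no sharing of the state, and the names `lagrange_basis`, `encode_delta`, `evaluate`, and `run`, as well as the example DFA, are hypothetical.

```python
# Illustrative only: plaintext arithmetic over a prime field, no encryption.
P = 2_147_483_647  # prime modulus standing in for the plaintext ring (assumption)

def lagrange_basis(points, j):
    """Coefficients (lowest degree first) of the polynomial that is 1 at
    points[j] and 0 at every other point, with arithmetic mod P."""
    coeffs, denom = [1], 1
    for k, xk in enumerate(points):
        if k == j:
            continue
        # Multiply the running polynomial by (x - xk).
        nxt = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i] = (nxt[i] - xk * c) % P
            nxt[i + 1] = (nxt[i + 1] + c) % P
        coeffs = nxt
        denom = denom * (points[j] - xk) % P
    inv = pow(denom, -1, P)  # modular inverse (Python 3.8+)
    return [c * inv % P for c in coeffs]

def encode_delta(states, symbols, delta):
    """Return a[i][j] such that f(x, y) = sum a[i][j] * x^i * y^j agrees with
    delta on every (state, symbol) pair."""
    n, m = len(states), len(symbols)
    a = [[0] * m for _ in range(n)]
    for qi, q in enumerate(states):
        bx = lagrange_basis(states, qi)
        for si, s in enumerate(symbols):
            by = lagrange_basis(symbols, si)
            target = delta[(q, s)]
            for i in range(n):
                for j in range(m):
                    a[i][j] = (a[i][j] + target * bx[i] * by[j]) % P
    return a

def evaluate(a, x, y):
    """Evaluate f(x, y) mod P from its coefficient matrix."""
    return sum(a[i][j] * pow(x, i, P) * pow(y, j, P)
               for i in range(len(a)) for j in range(len(a[0]))) % P

def run(a, q_init, file_symbols):
    """Drive the DFA by repeated polynomial evaluation; return the final state."""
    state = q_init
    for y in file_symbols:
        state = evaluate(a, state, y)
    return state

# Hypothetical example: state 2 is reached once the bit string contains "11".
STATES, SYMBOLS = [0, 1, 2], [0, 1]
DELTA = {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 2, (2, 0): 2, (2, 1): 2}
COEFFS = encode_delta(STATES, SYMBOLS, DELTA)
```

Here `run(COEFFS, 0, [0, 1, 1, 0])` follows the same transitions as the table `DELTA`. The polynomial has degree at most n-1 in the state variable and m-1 in the symbol variable, so it has nm coefficients.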
We believe our protocols will be efficient enough for many practical scenarios. They support evaluation of any DFA over an alphabet $\Sigma$ on any file consisting of $\ell$ symbols drawn from $\Sigma$, and require the file to be stored using $\ell m$ ciphertexts where $m = |\Sigma|$. Since $m$ is a multiplicative factor in the storage cost, our protocols are best suited to small alphabets $\Sigma$, e.g., bits ($m = 2$), bytes ($m = 256$), alphanumeric characters ($m = 36$), or DNA nucleotides ($m = 4$ for “A”, “C”, “G”, and “T”). Specifically, in Chapter 3, we first present a protocol that leverages additively homomorphic encryption (e.g., [59]) and transmits $(nm+3)\ell+3$ ciphertexts to evaluate a DFA of $n$ states. We then leverage additively homomorphic encryption that also supports one homomorphic multiplication of ciphertexts (e.g., [17]) to construct an improved protocol that transmits only $(n+m+1)\ell+3$
noninteractive protocol with a communication cost of $O(nm)$ fully homomorphic ciphertexts and, in particular, that is independent of the file length $\ell$.
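As a rough worked example of these counts, the sketch below plugs sample parameters into the formulas quoted above. The function names and the sample values of n, m, and the file length are hypothetical, and the second communication formula is assumed to count ciphertexts in the same way as the first.

```python
def storage_ciphertexts(ell, m):
    """Ciphertexts needed to store a file of ell symbols over an alphabet of size m."""
    return ell * m

def comm_additive(n, m, ell):
    """Ciphertexts transmitted by the first protocol: (nm + 3) * ell + 3."""
    return (n * m + 3) * ell + 3

def comm_one_multiplication(n, m, ell):
    """Count for the improved protocol: (n + m + 1) * ell + 3
    (assumed here to be in ciphertexts as well)."""
    return (n + m + 1) * ell + 3

# Hypothetical parameters: a 20-state DFA over DNA nucleotides, 1000-symbol file.
n, m, ell = 20, 4, 1000
counts = (
    storage_ciphertexts(ell, m),
    comm_additive(n, m, ell),
    comm_one_multiplication(n, m, ell),
)
```

With these sample values the file occupies 4000 ciphertexts, and the two communication counts are 83003 and 25003, respectively; the gap widens as the product nm grows relative to n + m.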
1.2 Ensuring File Authenticity in Private DFA Evaluation on Encrypted Files in the Cloud
Even though our protocols provide provable privacy guarantees for both the DFA query and file content against arbitrarily malicious server adversaries, a malicious server could still try to tamper with the evaluation result by deviating from the protocol specification, or even input fraudulent encrypted files into the protocol to fool the client. Indeed, though the traditional notion of a protocol secure against an arbitrarily malicious adversary prevents any misbehavior during protocol execution, it provides no guarantees on what input a malicious party may use in the protocol. Protocols for a third-party client to perform private searches on encrypted data in the cloud, while revealing nothing to the cloud server and nothing but the search result to the client, do exist for some types of searches (e.g., [67, 28, 73]). To our knowledge, however, none also enforce that the cloud server employs the data that the data owner stored at the cloud server.
Motivated by this, in Chapter 4 we present a strengthened protocol that allows the client to detect any misbehavior of the server, and in particular, to tell whether the server input the authentic encrypted file stored there by the data owner. In that sense, the authenticity of the file input by the server and the integrity of the computation result are both enforced. At the same time, the protocol provably protects the file contents (except for the result of the computation) from an honest-but-curious client (and heuristically from even a malicious client) and provably protects both the file contents and DFA from an arbitrarily malicious server. To our knowledge, our protocol is the first example of performing secure DFA computation on both encrypted and authenticated data.
instantiate this intuition, however, it would require much higher computation and communication costs than our protocol. Instead, we introduce a new technique to enforce correct server behavior and the authenticity of the input on which it is allowed to operate, without relying on zero-knowledge proofs at all. At a high level, the protocol takes advantage of the verifiability of the computation result to check the correctness of the server behavior. The protocol is designed so that legitimate outputs are encoded in a small space known only to the client, and any malicious behavior by the server will result in the final output lying outside this space, which is then easily detected by the client. We prove this property (in the random oracle model) and the privacy of both the file and the DFA against an arbitrarily malicious server. We also prove the privacy of the file (except for the result of the DFA evaluation) against an honest-but-curious client.
1.3 Toward Practical Encrypted Email That Supports Private, Regular-Expression Searches
As a practical application of private regular expression searches on encrypted data, in Chapter 5 we report a case study of a prototype implementation using our protocol to perform private regular expression searches on encrypted emails. In particular, the protocol developed there suffices to support the search options (including Boolean combinations) offered by the Thunderbird email client for the text and numeric email fields, for example. Our system is thus able to support range queries on the date field and various types of substring queries on the source, destination, and subject fields of emails.
study of user email query patterns [40] showed that many queries that users create are only partial words, and so substring searching capability is important to provide an adequate user experience. Furthermore, very few searchable encryption schemes offer the capabilities of performing substring, conjunctive, disjunctive and range queries, and we are aware of none that offers all of them at the same time.
Our protocol for regular-expression searching gains computational efficiency by using interaction, in fact requiring data transfer between the searching client and the server holding the ciphertext of a volume larger than the searchable ciphertext itself. This obviously raises the question of whether a more suitable solution would be to download each email to the client and decrypt it there, to be searched locally. With the widespread use of volume-priced networking (i.e., over cellular data plans), however, neither design is particularly appealing. So, we instead explore a different design in which the user (e.g., via her mobile device) submits her encrypted regular expression (or suitable representation thereof) to a proxy, which then interacts with the server hosting the encrypted data using our protocol. After this interaction, the proxy reports information back to the user that permits her to determine whether there was a match, so she can retrieve the file from the server in that case. We stress that the interaction between the user and the proxy is independent of the lengths and number of ciphertexts stored at the server, and that the proxy is untrusted for the privacy of the search or the file contents (provided that it does not collaborate maliciously with the server). So, for example, the proxy could be run in a cloud distinct from that where the server is run.
Following these optimizations, we detail an implementation of our protocol and its performance when searching emails from a real-world email dataset. We show, for example, that our implementation incurs average latencies of 0.89 seconds per email for performing a 9-character substring search on the sender email address field, and 0.17 seconds per email for performing a range query spanning about 6 months on the email date field. These numbers were obtained from a proxy and server each having 8 physical cores with simultaneous multithreading enabled, yielding 16 logical cores. We also evaluate options for exploiting parallelism with our protocol, ranging from very coarse (i.e., one server thread and one proxy thread per server-proxy protocol instance, but running 16 protocol instances in parallel) to very fine (i.e., 16 server threads and 16 proxy threads in one protocol instance).
1.4 Contributions
In summary, the contributions of this dissertation are:
• We developed protocols (in Chapter 3) that enable a client having a private regular expression to evaluate it on an encrypted file stored at a server, once authorized to do so by the file owner. Our protocols improve on prior work by protecting the privacy of the file content against both server and client. More precisely, the protocols protect privacy of the query and file content against arbitrarily malicious server adversaries and honest-but-curious client adversaries.
• In Chapter 4, we present an extension of the protocol developed in Chapter 3 so that, in addition to offering the security guarantees already provided by the original protocol, the client is able to detect any misbehavior of a server adversary. Furthermore, it can even tell whether the server input the authentic encrypted file stored there by the file owner during the protocol execution. Consequently, the input and the evaluation result are both authenticated to the client. To our knowledge, this is the first published protocol that considers secure computation on both encrypted and authenticated data in the context of DFA evaluation.
CHAPTER 2
Related Work
In this chapter, we discuss research that is related to this dissertation. We discuss general techniques for secure computation in Section 2.1 and then protocols specifically tailored to private DFA evaluation in Section 2.2. Protocols specifically targeting other types of search functionality are discussed in Section 2.3. Previous work on ensuring that authentic inputs are employed in secure computation protocols is discussed in Section 2.4, and some previous implementation efforts for searching on encrypted data are briefly surveyed in Section 2.5.
2.1 General Techniques for Secure Computation
The problem we study in this dissertation (i.e., privately evaluating a regular expression on the plaintext of an encrypted file stored at a server) could be implemented with general techniques for “computing on encrypted data” [63] or two-party secure computation [74, 37]. These general techniques tend to yield less efficient protocols than those designed for a specific purpose, and our case will be no exception. In particular, the former achieves computations non-interactively using fully homomorphic encryption, for which existing implementations [33, 71, 66, 69] are dramatically more costly than the techniques we use.
requirement of the protocol is that the communication between the user and the proxy should be minimized. Using our protocol, the communication cost in the direction from the user to the proxy is only dependent on the size of the search query, and is independent of the number and size of the file ciphertexts. We are unaware of how to achieve this property using garbled circuits, however. Since the garbled circuit and its inputs are “unreusable” across different runs of the protocol, the user would need to provide a number of inputs (in this case, encrypted queries) to the proxy that equals the number of files to be searched. Furthermore, the fact that in our construction the user-generated encrypted query can be used an unlimited number of times enables a subscription service in which the proxy holds the encrypted query and periodically informs the user of the arrival of matched emails, without any further communication from the user to the proxy. Again, we are unaware of how to implement this functionality using generic garbled-circuit techniques.
An ingredient of our protocols in Chapter 3 and Chapter 5 is a two-party sharing of the data owner’s file-decryption key between a client holding the search query and the server holding the encrypted file. By two-party secret-sharing the file-decryption key and using this to compute on encrypted data, our protocols are related to those of Choi et al. [24]. This work developed a protocol based on garbled circuits by which two parties can evaluate a general function after a private decryption key has been shared between them. This protocol can be used to solve the problem we propose, but inherits the aforementioned limitations of garbled circuits.
2.2 Specialized Protocols for DFA Evaluation
protocol of Blanton and Aliasgari [13] is relevant; they adapted the Troncoso-Pastoriza et al. protocol to an “outsourcing” model in which the client and server secret-share the DFA and file, respectively, between two additional hosts that interactively evaluate the DFA on the file without reconstructing either one. While our protocols utilize secret sharing as well (in our case, of the file owner’s file-decryption key), our protocol shares much less data and does not share the client’s DFA (or thus require two parties between which to share it) at all. Furthermore, their protocol does not support the asymmetric encryption of the file, which in the encrypted email application we consider in Chapter 5 is the predominant method for preparing a private email for its intended recipient.
2.3 Specialized Protocols for Searching on Encrypted Data
Specialized protocols for performing searches on encrypted files or database relations have also been developed. For example, searchable encryption [67, 36, 16, 22, 28, 6, 19, 9, 49, 62, 47] enables a party holding a file-decryption key to search for attribute values in the ciphertext file stored at an untrusted server. These techniques have been generalized to support more complex queries, notably conjunctive [19], disjunctive [49] and range queries [65] and inner products [49]. Searchable encryption schemes typically achieve non-interactive queries on encrypted files, in part by attaching “tag” information to the ciphertext of each file to enable the query operation. However, broadening the supported search attributes typically requires expanding the tags, and so the sizes of the tags are determined by the richness of the supported queries. In contrast, in our work the file ciphertexts are independent of the DFA(s) to be evaluated (assuming a fixed alphabet $\Sigma$ over which the DFAs are defined), and the computation is performed interactively between the two parties.
Richer forms of pattern-matching and search (though still not encompassing DFA evaluation) have also been studied in the two-party setting, e.g., by Jha et al. [45], Hazay and Lindell [41], Katz and Malka [48], and Hazay and Toft [42]. Again, these works input the plaintext file to one party and so do not directly apply to our setting.
usually heuristic, without formal definitions and proofs, and we are unaware of any designed to support DFA searches.
2.4 Input Authenticity in Secure Computation
Most work in the area of secure computation generally does not consider the authenticity of the inputs to the protocol. Indeed, the standard definition of security against arbitrarily malicious adversaries for general two-party protocols provides no restrictions on what input a malicious party may use in the protocol as long as he does not deviate from the protocol. The protocol we present in Chapter 4 allows the client to tell whether the server actually uses the authentic encrypted data of the data owner as input, in addition to the ability to detect any misbehavior by the server. In this sense, our protocol provides an authenticated evaluation result to the client. To our knowledge, ours is the first protocol to consider secure computation on authenticated data in the context of private DFA evaluation. The main area of specialized protocols in which input authenticity has previously been treated has been private intersection of certified sets [20, 27, 26, 68], in which the set elements of each party must be certified by a trusted third party for use in performing the intersection.
2.5 Implementations of Systems That Allow Searching on Encrypted Data
CHAPTER 3
Third-Party Private DFA Evaluation on Encrypted Files in the Cloud
Motivated by the need to outsource file storage to untrusted clouds while still permitting limited use of that data by third parties, in this chapter, we present practical protocols by which a client can evaluate a DFA on an encrypted file stored at a cloud server, once authorized to do so by the file owner. Our protocols provably protect the privacy of the DFA and the file contents from a malicious server and the privacy of the file contents (except for the result of the evaluation) from an honest-but-curious client (and, heuristically, from a malicious client). We introduce our main protocol in Section 3.2 and an improved protocol in Section 3.3. We further present simple techniques to detect client or server misbehavior in Section 3.4. Before that, we first define the studied problem in Section 3.1.
3.1 Problem Description
A deterministic finite automaton $M$ is a tuple $\langle Q, \Sigma, \delta, q_{\mathrm{init}} \rangle$ where $Q$ is a set of $|Q| = n$ states; $\Sigma$ is a set (alphabet) of $|\Sigma| = m$ symbols; $\delta : Q \times \Sigma \rightarrow Q$ is a transition function; and $q_{\mathrm{init}}$ is the initial state. (A DFA can also specify a function $\Delta : Q \rightarrow \{0,1\}$, for which $\Delta(q) = 1$ indicates that $q$ is an accepting state. We will discuss extensions of our protocols to this case.)
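As a plaintext reference point for this definition, the sketch below represents a DFA as the tuple of states, alphabet, transition function, and initial state, and computes the final state reached on a file. It involves no encryption and is not part of the protocols; the class `DFA`, the function `final_state`, and the example automaton (which reaches state 2 once the substring "AC" has appeared) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class DFA:
    states: frozenset   # Q, with |Q| = n
    alphabet: frozenset  # Sigma, with |Sigma| = m
    delta: dict          # transition function: (state, symbol) -> state
    q_init: int          # initial state

def final_state(dfa, file_symbols):
    """Return the state to which the symbol sequence drives the DFA."""
    state = dfa.q_init
    for sym in file_symbols:
        if sym not in dfa.alphabet:
            raise ValueError(f"symbol {sym!r} is not in the alphabet")
        state = dfa.delta[(state, sym)]
    return state

# Hypothetical example over DNA nucleotides (m = 4, n = 3):
# state 0 = no progress, state 1 = just saw "A", state 2 = "AC" has occurred.
ALPHABET = "ACGT"
delta = {}
for s in ALPHABET:
    delta[(0, s)] = 1 if s == "A" else 0
    delta[(1, s)] = 1 if s == "A" else (2 if s == "C" else 0)
    delta[(2, s)] = 2  # absorbing: the match is remembered

AC_MATCHER = DFA(frozenset({0, 1, 2}), frozenset(ALPHABET), delta, 0)
```

For instance, `final_state(AC_MATCHER, "GGACT")` ends in state 2, whereas `final_state(AC_MATCHER, "GGAT")` ends in state 0. Reporting the final state rather than a single accept bit is what lets one automaton signal which of several patterns matched.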
Our goal is to enable a client holding a DFA $M$ to interact with a server holding the ciphertext of a file to evaluate $M$ on the file plaintext. More specifically, the client should output the final state to which the file plaintext drives the DFA; i.e., if the plaintext file is a sequence $\langle \sigma_k \rangle_{k \in [\ell]}$ where $[\ell]$
and the number of states $n$ in the client’s DFA.¹ The client should learn nothing else about the file, however, and the server should learn nothing else about the file or the client’s DFA.
Because the file exists in the system only in encrypted form, some private-key information must be injected into the protocol to enable a DFA to be evaluated on the file plaintext. Since (only) the data owner holds the private key, one approach would be to involve the data owner in the protocol. However, in keeping with the goals of cloud outsourcing, our protocols require the data owner only to authorize the client to perform DFA evaluations with the server, but not to participate in those evaluations herself. In our protocols, this authorization occurs by the data owner sharing the private file-decryption key between the client and server. As a result, a client and server that collude could pool their information to decrypt the file. Here we assume no such collusion, however, for two reasons. First, we are primarily motivated by scenarios in which the client represents a partially trusted service provider or customer, and so even if the cloud server were to be compromised, we presume this party would not be the cause. So, we prove security against only a client or server acting in isolation and with primary attention to only an honest-but-curious client (though we also heuristically justify the security of our protocol against an arbitrary client). Second, even without sharing the file decryption key between the client and server, the functionality offered by our protocol (i.e., evaluating a DFA on the file) would enable a colluding client and server to evaluate arbitrary (and arbitrarily many) DFAs on the file, eventually permitting its decryption anyway. The only defense against collusion that we see would be to involve the data owner in the protocol; again, we do not explore this possibility here.
Another potential form of collusion that we do not explicitly consider here is collusion between the data owner and the server, presumably to learn the DFA used by the client. In our protocol, however, the protection of DFA privacy does not depend on the security of the data owner’s file-decryption key. Since the data owner is not involved in the protocol, it does not offer the server any additional leverage in learning the client’s DFA.
¹ Since exposing the final state reduces file entropy by $\log_2 n$ bits, presumably the server should learn $n$ so as to
Our protocols do not retrieve the file based on the DFA evaluation results, e.g., in a way that hides from the server what file is being retrieved. However, once the client learns the final state of the DFA evaluation, it can employ various techniques to retrieve the file privately (e.g., [35]). Moreover, some of our motivating scenarios in Chapter 1, e.g., malware scans of cloud-resident files by a third party, may not require file retrieval but only that matches be reported to the file owner.
3.2 A Secure DFA Evaluation Protocol
In this section we present a protocol that meets the goals described in Section 3.1. We give the construction in Section 3.2.1, and then we define and prove security against server and client adversaries in Section 3.2.2 and Section 3.2.3, respectively.
3.2.1 Construction
Let “$\leftarrow$” denote assignment and “$s \stackrel{\$}{\leftarrow} S$” denote the assignment to $s$ of a randomly chosen element of set $S$. Let $\kappa$ denote a security parameter.
Encryption scheme. Our scheme is built using an additively homomorphic encryption scheme with plaintext space $R$, where $\langle R, +_R, \cdot_R \rangle$ denotes a commutative ring. Specifically, an encryption scheme $\mathcal{E}$ includes algorithms $\mathsf{Gen}$, $\mathsf{Enc}$, and $\mathsf{Dec}$ where: $\mathsf{Gen}$ is a randomized algorithm that on input $1^\kappa$ outputs a public-key/private-key pair $(pk, sk) \leftarrow \mathsf{Gen}(1^\kappa)$; $\mathsf{Enc}$ is a randomized algorithm that on input public key $pk$ and plaintext $m \in R$ (where $R$ can be determined as a function of $pk$) produces a ciphertext $c \leftarrow \mathsf{Enc}_{pk}(m)$, where $c \in C_{pk}$ and $C_{pk}$ is the ciphertext space determined by $pk$; and $\mathsf{Dec}$ is a deterministic algorithm that on input a private key $sk$ and ciphertext $c \in C_{pk}$ produces a plaintext $m \leftarrow \mathsf{Dec}_{sk}(c)$ where $m \in R$. In addition, $\mathcal{E}$ supports an operation $+_{pk}$ on ciphertexts such that for any public-key/private-key pair $(pk, sk)$, $\mathsf{Dec}_{sk}(\mathsf{Enc}_{pk}(m_1) +_{pk} \mathsf{Enc}_{pk}(m_2)) = m_1 +_R m_2$. Using $+_{pk}$, it is possible to implement $\cdot_{pk}$ for which $\mathsf{Dec}_{sk}(m_2 \cdot_{pk} \mathsf{Enc}_{pk}(m_1)) = m_1 \cdot_R m_2$.
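We also require $\mathcal{E}$ to support two-party decryption. Specifically, we assume there is an efficient randomized algorithm $\mathsf{Share}$ that on input a private key $sk$ outputs shares $(sk_1, sk_2) \leftarrow \mathsf{Share}(sk)$, and that there are efficient deterministic algorithms $\mathsf{Dec}^1$ and $\mathsf{Dec}^2$ such that $\mathsf{Dec}_{sk}(c) = \mathsf{Dec}^2_{sk_2}(c, \mathsf{Dec}^1_{sk_1}(c))$.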
An example of an encryption scheme $\mathcal{E}$ that meets the above requirements is due to Paillier [59] with modifications by Damgård and Jurik [29]; we henceforth refer to this scheme as “Pai”. In this scheme, the ring $R$ is $\mathbb{Z}_N$ where $N = pp'$ and $p$, $p'$ are primes, and the ciphertext space $C_{pk}$ is $\mathbb{Z}^*_{N^2}$.
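To make the interface above concrete, the following is a minimal sketch of a Paillier-style scheme with additive homomorphism and two-party decryption. It is an illustration under stated assumptions, not the construction used in the dissertation: the primes are tiny and insecure, the generator is fixed to $1 + N$, the decryption exponent $d$ satisfies $d \equiv 0 \pmod{\lambda}$ and $d \equiv 1 \pmod{N}$, and the function names (`gen`, `enc`, `dec`, `add`, `scalar_mul`, `share`, `dec1`, `dec2`) are hypothetical.

```python
import math
import random

def gen(p=1009, q=1013):
    """Toy key generation with tiny, insecure primes (illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    # Choose d with d = 0 (mod lam) and d = 1 (mod n), so that
    # c^d = (1 + n)^m (mod n^2) for any ciphertext c of m.
    d = lam * pow(lam, -1, n)
    return n, (n, lam, d)

def enc(n, m):
    """Encrypt m in Z_n as (1 + n)^m * r^n mod n^2 for a random unit r."""
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return pow(1 + n, m, n * n) * pow(r, n, n * n) % (n * n)

def dec(sk, c):
    """Single-party decryption: c^d = 1 + m*n (mod n^2)."""
    n, _, d = sk
    return (pow(c, d, n * n) - 1) // n

def add(n, c1, c2):
    """The ciphertext operation +_pk (plaintexts add modulo n)."""
    return c1 * c2 % (n * n)

def scalar_mul(n, k, c):
    """The operation k ._pk c (plaintext is multiplied by the known scalar k)."""
    return pow(c, k, n * n)

def share(sk):
    """Split d additively modulo n*lam, a multiple of every unit's order."""
    n, lam, d = sk
    d2 = random.randrange(n * lam)
    d1 = (d - d2) % (n * lam)
    return d1, d2

def dec1(n, d1, c):
    """Partial decryption by the holder of the first share."""
    return pow(c, d1, n * n)

def dec2(n, d2, c, beta):
    """Completion of the decryption by the holder of the second share."""
    return (pow(c, d2, n * n) * beta % (n * n) - 1) // n
```

Neither share alone determines $d$ in this sketch, which mirrors the requirement that $\mathsf{Dec}^2$ needs the output of $\mathsf{Dec}^1$ to recover the plaintext.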
We use ${}^{pk}\!\sum$ to denote summation using $+_{pk}$; ${}^{R}\!\sum$ to denote summation using $+_R$; and ${}^{R}\!\prod$ to denote the product using $\cdot_R$ of a sequence. For any operation $\mathsf{op}$, we use $t_{\mathsf{op}}$ to denote the time required to perform $\mathsf{op}$; e.g., $t_{\mathsf{Dec}}$ is the time to perform a $\mathsf{Dec}$ operation.
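Encoding $\delta$ in a Bivariate Polynomial over $R$. A second ingredient for our protocol is a method for encoding a DFA $\langle Q, \Sigma, \delta, q_{\mathrm{init}} \rangle$, and specifically the transition function $\delta$, as a bivariate polynomial $f(x, y)$ over $R$ where $x$ is the variable representing a DFA state and $y$ is the variable representing an input symbol. That is, if we treat each state $q \in Q$ and each $\sigma \in \Sigma$ as distinct elements of $R$, then we would like $f(q, \sigma) = \delta(q, \sigma)$. We can achieve this by choosing $f$ to be the interpolation polynomial

$$f(x, y) = {}^{R}\!\sum_{\sigma \in \Sigma} \bigl(f_\sigma(x) \cdot_R \Lambda_\sigma(y)\bigr) \quad \text{where} \quad \Lambda_\sigma(y) = {}^{R}\!\!\prod_{\substack{\sigma' \in \Sigma \\ \sigma' \neq \sigma}} \frac{y -_R \sigma'}{\sigma -_R \sigma'} \tag{3.1}$$

is a Lagrange basis polynomial and $f_\sigma(q) = \delta(q, \sigma)$ for each $q \in Q$. Note that $\Lambda_\sigma(\sigma) = 1$ and $\Lambda_\sigma(\sigma') = 0$ for any $\sigma' \in \Sigma \setminus \{\sigma\}$.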
Calculating Eqn. 3.1 requires taking multiplicative inverses in $R$. While not every element of a ring has a multiplicative inverse in the ring, fortunately the ring $\mathbb{Z}_N$ used in Paillier encryption, for example, has negligibly few elements with no inverses, and so there is little risk of encountering an element with no inverse. Using Eqn. 3.1, we can calculate coefficients $\langle \lambda_{\sigma j} \rangle_{j \in [m]}$ so that $\Lambda_\sigma(y) = {}^{R}\!\sum_{j=0}^{m-1} \lambda_{\sigma j} \cdot_R y^j$. For our algorithm descriptions, we encapsulate this calculation in the procedure $\langle \lambda_{\sigma j} \rangle_{\sigma \in \Sigma, j \in [m]} \leftarrow \mathsf{Lagrange}(\Sigma)$.

Each $f_\sigma$ needed to compute $f(x, y)$ can again be determined as a Lagrange interpolating polynomial and then expressed as $f_\sigma(x) = {}^{R}\!\sum_{i=0}^{n-1} a_{\sigma i} \cdot_R x^i$. In our pseudocode, we encapsulate this calculation as $\langle a_{\sigma i} \rangle_{\sigma \in \Sigma, i \in [n]} \leftarrow \mathsf{ToPoly}(Q, \Sigma, \delta)$.
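The two procedures can be sketched in plaintext as follows. This is an illustrative sketch only: it assumes a small prime field $\mathbb{Z}_P$ in place of the ring $R$ (so every nonzero element is invertible), and the helper names `lagrange_basis`, `interpolate`, `evaluate`, and `f` are hypothetical. `lagrange_basis` plays the role of $\mathsf{Lagrange}$ for one symbol, and `interpolate` plays the role of $\mathsf{ToPoly}$ for one $f_\sigma$.

```python
P = 10007  # small prime modulus standing in for the ring R (assumption)

def poly_mul(a, b):
    """Multiply two coefficient lists (lowest degree first) modulo P."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % P
    return out

def lagrange_basis(points, k):
    """Coefficients of the polynomial that is 1 at points[k], 0 at the others."""
    coeffs, denom = [1], 1
    for j, xj in enumerate(points):
        if j != k:
            coeffs = poly_mul(coeffs, [(-xj) % P, 1])
            denom = denom * (points[k] - xj) % P
    inv = pow(denom, -1, P)
    return [c * inv % P for c in coeffs]

def interpolate(xs, ys):
    """Coefficients of the unique polynomial of degree < len(xs) through (xs, ys)."""
    out = [0] * len(xs)
    for k in range(len(xs)):
        basis = lagrange_basis(xs, k)
        for i in range(len(xs)):
            out[i] = (out[i] + ys[k] * basis[i]) % P
    return out

def evaluate(coeffs, x):
    """Horner evaluation modulo P."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def f(delta, states, alphabet, x, y):
    """Evaluate the bivariate encoding of Eqn. 3.1 at (x, y)."""
    total = 0
    for idx, sigma in enumerate(alphabet):
        a_sigma = interpolate(states, [delta[(q, sigma)] for q in states])
        lam_sigma = lagrange_basis(alphabet, idx)
        total = (total + evaluate(a_sigma, x) * evaluate(lam_sigma, y)) % P
    return total
```

With states and symbols encoded as distinct elements of $\mathbb{Z}_P$, `f(delta, states, alphabet, q, sigma)` returns the encoded next state for every state and symbol of the DFA.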
Protocol steps. Our protocol, denoted $\Pi_1(\mathcal{E})$, is shown in Fig. 3.1. Pseudocode for the client is labeled c101–c116, pseudocode for the server is labeled s101–s112, and the messages exchanged between them are labeled m101–m106. The client receives as input a public key $pk$ under which the file (at the server) is encrypted; a share $sk_1$ of the private key $sk$ corresponding to $pk$; another public key $pk'$; and the DFA $\langle Q, \Sigma, \delta, q_{\mathrm{init}} \rangle$. The server receives as input the public key $pk$; a share $sk_2$ of the private key $sk$; the alphabet $\Sigma$; and ciphertexts $c_{kj} \leftarrow \mathsf{Enc}_{pk}((\sigma_k)^j)$ of the $k$-th file symbol $\sigma_k$, for each $j \in [m]$ and for each $k \in [\ell]$, where $\ell$ denotes the file length in symbols. We assume that $sk_1$ and $sk_2$ were generated as $(sk_1, sk_2) \leftarrow \mathsf{Share}(sk)$. Note that no information about $sk'$ (the private key corresponding to $pk'$) is given to either party, and so $pk'$ ciphertexts ($\rho$ created in c107 and c115 and sent in m103 and m105, respectively) are indecipherable and ignored in the protocol. These ciphertexts are included to simplify the proof of privacy against client adversaries (Section 3.2.3) and can be elided in practice. We do not discuss these values further in this section.
The protocol is structured as matching for loops executed by the client (c105–c113) and server (s103–s111). The client begins the $k$-th loop iteration with an encryption $\alpha$ of the current DFA state after being blinded by a random injection $\pi_1 : Q \to R$ it chose in the $(k-1)$-th loop at line c109 (or, if $k = 0$, then in line c103), where $\mathsf{Injs}(Q \to R)$ denotes the set of injections from $Q$ to $R$. The client uses its share $sk_1$ of $sk$ to create the “partial decryption” $\beta$ of $\alpha$ (c106) and sends $\alpha$, $\beta$ to the server (m103). The server uses its share $sk_2$ to complete the decryption of $\alpha$ to obtain the blinded state $\gamma$ (s104). We stress that because $\gamma$ is blinded by $\pi_1$, $\gamma$ reveals no information about the current DFA state to the server. The server then computes, for each $\sigma \in \Sigma$ (s105), a value $\Psi_\sigma$ such that $\Lambda_\sigma(\sigma_k) = \mathsf{Dec}_{sk}(\Psi_\sigma)$ (s106) by utilizing coefficients $\langle \lambda_{\sigma j} \rangle_{\sigma \in \Sigma, j \in [m]}$ output from $\mathsf{Lagrange}$ (s102). The server then returns (in m104) values $\langle \mu_{\sigma i} \rangle_{\sigma \in \Sigma, i \in [n]}$ created so that $\mathsf{Dec}_{sk}(\mu_{\sigma i}) = \gamma^i \cdot_R \Lambda_\sigma(\sigma_k)$ (s108).
Meanwhile, the client selects a new random injection $\pi_1 \xleftarrow{\$} \mathsf{Injs}(Q \to R)$ (c109). The client then constructs a new DFA transition function $\delta'$ reflecting the injection it chose in the last round (now denoted $\pi_0$, see line c108) and the new injection $\pi_1$ it chose for this round. Specifically, $\delta'$ is defined as $\delta'(q, \sigma) = \pi_1(\delta(\pi_0^{-1}(q), \sigma))$ for all $\sigma \in \Sigma$ and $q \in \pi_0(Q)$ where $\pi_0(Q) = \{\pi_0(q)\}_{q \in Q}$; we denote this step as $\delta' \leftarrow \mathsf{Blind}(\delta, \pi_0, \pi_1)$ in line c110. That is, $\delta'$ “undoes” the previous injection $\pi_0$, applies $\delta$, and then applies the new injection $\pi_1$. The client then interpolates a bivariate polynomial $f(x, y)$ such that $f(q, \sigma) = \delta'(q, \sigma)$
client(pk, sk1, pk′, ⟨Q, Σ, δ, q_init⟩)                 server(pk, sk2, Σ, ⟨c_kj⟩_{k∈[ℓ], j∈[m]})

c101. n ← |Q|, m ← |Σ|                                   s101. m ← |Σ|
c102. π0 ← I                                             s102. ⟨λ_σj⟩_{σ∈Σ, j∈[m]} ← Lagrange(Σ)
c103. π1 ←$ Injs(Q → R)
c104. α ← Enc_pk(π1(q_init))
                          m101.   n   →
                          m102.   ←   ℓ
c105. for k ← 0 … ℓ−1                                    s103. for k ← 0 … ℓ−1
c106.   β ← Dec1_sk1(α)
c107.   ρ ← Enc_pk′(π1)
                          m103.   α, β, ρ   →
                                                         s104.   γ ← Dec2_sk2(α, β)
c108.   π0 ← π1                                          s105.   for σ ∈ Σ
c109.   π1 ←$ Injs(Q → R)                                s106.     Ψ_σ ← ^{pk}Σ_{j=0}^{m−1} λ_σj ·_pk c_kj
c110.   δ′ ← Blind(δ, π0, π1)                            s107.     for i ∈ [n]
c111.   ⟨a_σi⟩_{σ∈Σ, i∈[n]} ← ToPoly(Q, Σ, δ′)           s108.       µ_σi ← γ^i ·_pk Ψ_σ
                                                         s109.     endfor
                                                         s110.   endfor
                          m104.   ←   ⟨µ_σi⟩_{σ∈Σ, i∈[n]}
c112.   α ← ^{pk}Σ_{σ∈Σ} ^{pk}Σ_{i=0}^{n−1} a_σi ·_pk µ_σi
c113. endfor                                             s111. endfor
c114. β ← Dec1_sk1(α)
c115. ρ ← Enc_pk′(π1)
                          m105.   α, β, ρ   →
                                                         s112. γ* ← Dec2_sk2(α, β)
                          m106.   ←   γ*
c116. return π1^{−1}(γ*)
$\langle \mu_{\sigma i} \rangle_{\sigma \in \Sigma, i \in [n]}$ sent from the server (message m104) to assemble a ciphertext $\alpha$ of the new DFA state under the injection $\pi_1$ (c112).
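The blinding step can be illustrated in plaintext, ignoring encryption entirely. The sketch below is an assumption-laden illustration rather than the protocol itself: injections are drawn into a small integer range standing in for $R$, the transition function is a dictionary, and the names `random_injection`, `blind`, and `run_blinded` are hypothetical. It shows that evaluating the re-blinded $\delta'$ on blinded states each round, and inverting the last injection at the end, yields the same final state as running $\delta$ directly.

```python
import random

R_SIZE = 10007  # size of the stand-in for the ring R (assumption)

def random_injection(states):
    """Draw a random injection from the state set into range(R_SIZE)."""
    images = random.sample(range(R_SIZE), len(states))
    return dict(zip(states, images))

def blind(delta, alphabet, pi0, pi1):
    """delta'(pi0(q), s) = pi1(delta(q, s)): undo pi0, apply delta, apply pi1."""
    return {(pi0[q], s): pi1[delta[(q, s)]] for q in pi0 for s in alphabet}

def run_blinded(delta, states, alphabet, q_init, word):
    """Walk the DFA while only ever handling blinded state values."""
    pi1 = random_injection(states)
    blinded_state = pi1[q_init]
    for symbol in word:
        # A fresh injection is drawn in every round, as in lines c108-c110.
        pi0, pi1 = pi1, random_injection(states)
        blinded_state = blind(delta, alphabet, pi0, pi1)[(blinded_state, symbol)]
    inverse = {image: q for q, image in pi1.items()}
    return inverse[blinded_state]

def run_plain(delta, q_init, word):
    """Reference evaluation without blinding."""
    state = q_init
    for symbol in word:
        state = delta[(state, symbol)]
    return state
```

In the actual protocol the table lookup is replaced by homomorphic evaluation of the interpolated polynomial, so that the client never sees the file symbol and the server never sees an unblinded state.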
After $\ell$ loop iterations, the client interacts with the server once more to decrypt the final state. It sends $\alpha$ and its partial decryption $\beta$ to the server (m105), for which the server completes the decryption (s112) and returns the result (m106).

Protocol $\Pi_1(\mathcal{E})$ can be modified to return only a binary indication of whether the DFA’s final state is an accepting one, if the DFA specifies a function $\Delta$ indicating whether a state is an accepting state. Specifically, the client can construct a polynomial $F(x)$ such that $F(q) = 1$ if $\Delta(q) = 1$ and $F(q) = 0$ otherwise, for $q \in Q$. Then, rather than interacting with the server to decrypt the final state, the client can interact with the server once to evaluate $F(x)$ on the (unknown) final state and again to decrypt this result.
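One way to picture the acceptance polynomial $F(x)$ is as a sum of Lagrange basis polynomials over the accepting states. The following plaintext sketch assumes a small prime field in place of $R$ and integer-encoded states; the name `accept_poly_eval` is hypothetical, and the sketch evaluates $F$ directly rather than homomorphically.

```python
P = 10007  # small prime modulus standing in for the ring R (assumption)

def accept_poly_eval(states, accepting, x):
    """Evaluate F(x), where F(q) = 1 for accepting q and 0 for other states."""
    total = 0
    for q in states:
        if q not in accepting:
            continue
        num, den = 1, 1
        for other in states:
            if other != q:
                num = num * (x - other) % P
                den = den * (q - other) % P
        # Add the Lagrange basis polynomial for state q, evaluated at x.
        total = (total + num * pow(den, -1, P)) % P
    return total
```

Only the value of $F$ on elements of the state set is meaningful; for other inputs the polynomial returns an arbitrary field element.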
For brevity, Fig. 3.1 omits numerous checks that the client and server should perform to confirm that the values each receives are well-formed. For example, the client should confirm that $\mu_{\sigma i} \in C_{pk}$ for each $\sigma \in \Sigma$ and $i \in [n]$, upon receiving these in m104. The server should similarly confirm the well-formedness of the values it receives.

An Alternative Using Fully Homomorphic Encryption. Our technique of encoding the DFA transition function $\delta$ using a bivariate polynomial $f(x, y)$ over $R$ could also be used with fully homomorphic encryption [33, 71] to create a noninteractive protocol. The client could encrypt each coefficient $a_{\sigma i}$ of $f$ under the public key $pk$ and send these ciphertexts to the server, enabling the server to perform the computation in c112 by itself. At the end, the server could send a half-decrypted final state back to the client, who would complete the decryption to obtain the result. This protocol achieves communication costs of $O(nm)$, which is independent of the file length. That said, existing fully homomorphic schemes are far less efficient than additively homomorphic schemes, and so the resulting protocol will be less communication-efficient than $\Pi_1(\mathcal{E})$ for many practical file lengths and DFA sizes.
3.2.2 Security Against Server Adversaries
file in its possession. That is, we show only the privacy of the file and DFA inputs against server adversaries. In this section, we are not concerned with showing that a client can detect server misbehavior, a property often called correctness. $\Pi_1(\mathcal{E})$ could be augmented using standard tools to enforce correctness, with an impact on performance; we do not explore this here. Instead, in Section 3.4 we describe novel extensions to $\Pi_1(\mathcal{E})$ that could be used to detect server misbehavior.
We formalize our claims against server compromise by defining two separate server adversaries. The first server adversary $S = (S_1, S_2)$ attacks the DFA $M = \langle Q, \Sigma, \delta, q_{\mathrm{init}} \rangle$ held by the client, as described in experiment $\mathbf{Expt}^{\text{s-dfa}}_{\Pi_1(\mathcal{E})}$ in Fig. 3.2. $S_1$ first generates a file $\langle \sigma_k \rangle_{k \in [\ell]}$ and two DFAs $M_0$, $M_1$. (Note that we use, e.g., “$M_0.Q$” and “$M_1.Q$” to disambiguate their state sets.) $S_2$ then receives the ciphertexts $\langle c_{kj} \rangle_{k \in [\ell], j \in [m]}$ of its file, information $\phi$ created for it by $S_1$, and oracle access to $\mathsf{clientOr}(pk, sk_1, pk', M_b)$ for $b$ chosen randomly.

Experiment Expt^{s-dfa}_{Π1(E)}(S1, S2)
  (pk, sk) ← Gen(1^κ)
  (sk1, sk2) ← Share(sk)
  (pk′, sk′) ← Gen(1^κ)
  (ℓ, ⟨σ_k⟩_{k∈[ℓ]}, M0, M1, φ) ← S1(pk, sk2)
  if M0.Q ≠ M1.Q or M0.Σ ≠ M1.Σ
    then return 0
  b ←$ {0, 1}
  m ← |M_b.Σ|
  for k ∈ [ℓ], j ∈ [m]
    c_kj ← Enc_pk((σ_k)^j)
  b′ ← S2^{clientOr(pk, sk1, pk′, M_b)}(φ, ⟨c_kj⟩_{k∈[ℓ], j∈[m]})
  if b′ = b
    then return 1 else return 0

Figure 3.2: Experiments for proving DFA privacy of Π1(E) against server adversaries
$\mathsf{clientOr}$ responds to queries from $S_2$ as follows, ignoring malformed queries. The first query (say, consisting of simply “start”) causes $\mathsf{clientOr}$ to begin the protocol; $\mathsf{clientOr}$ responds with a message of the form $n$ (i.e., of the form of m101). The second invocation by $S_2$ must include a single integer $\ell$ (i.e., of the form of m102); $\mathsf{clientOr}$ responds with a message of the form $\alpha$, $\beta$, $\rho$, i.e., three values as in m103. The next $\ell - 1$ queries by $S_2$ must contain $nm$ elements of $C_{pk}$, i.e., $\langle \mu_{\sigma i} \rangle_{\sigma \in \Sigma, i \in [n]}$ as in m104, to which $\mathsf{clientOr}$ responds with three values as in message m103.
Experiment Expt^{ind-cpa}_{E}(U)
  (p̂k, ŝk) ← Gen(1^κ)
  b̂ ←$ {0, 1}
  b̂′ ← U^{Enc^{b̂}_{p̂k}(·,·)}(p̂k)
  if b̂′ = b̂
    then return 1 else return 0

Figure 3.3: Experiment for IND-CPA security
responds with three values as in m105. The next (and last) query by $S_2$ can consist simply of a value in $R$, as in message m106.
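Eventually $S_2$ outputs a bit $b'$, and $\mathbf{Expt}^{\text{s-dfa}}_{\Pi_1(\mathcal{E})}(S) = 1$ only if $b' = b$. We say the advantage of $S$ is $\mathbf{Adv}^{\text{s-dfa}}_{\Pi_1(\mathcal{E})}(S) = 2 \cdot \mathbb{P}\bigl(\mathbf{Expt}^{\text{s-dfa}}_{\Pi_1(\mathcal{E})}(S) = 1\bigr) - 1$ and define $\mathbf{Adv}^{\text{s-dfa}}_{\Pi_1(\mathcal{E})}(t, \ell, n, m) = \max_S \mathbf{Adv}^{\text{s-dfa}}_{\Pi_1(\mathcal{E})}(S)$ where the maximum is taken over all adversaries $S$ taking time $t$ and selecting a file of length $\ell$ and DFAs containing $n$ states and an alphabet of $m$ symbols.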
We reduce DFA privacy against server attacks to the IND-CPA [10] security of the encryption scheme. IND-CPA security is defined using the experiment in Fig. 3.3, in which an adversary $U$ is provided a public key $\hat{pk}$ and access to an oracle $\mathsf{Enc}^{\hat b}_{\hat{pk}}(\cdot, \cdot)$ that consistently encrypts either the first of its two inputs (if $\hat b = 0$) or the second of those inputs (if $\hat b = 1$). Eventually $U$ outputs a guess $\hat b'$ at $\hat b$, and $\mathbf{Expt}^{\text{ind-cpa}}_{\mathcal{E}}(U) = 1$ only if $\hat b' = \hat b$. The IND-CPA advantage of $U$ is defined as $\mathbf{Adv}^{\text{ind-cpa}}_{\mathcal{E}}(U) = 2 \cdot \mathbb{P}\bigl(\mathbf{Expt}^{\text{ind-cpa}}_{\mathcal{E}}(U) = 1\bigr) - 1$. Then, $\mathbf{Adv}^{\text{ind-cpa}}_{\mathcal{E}}(t, w) = \max_U \mathbf{Adv}^{\text{ind-cpa}}_{\mathcal{E}}(U)$ where the maximum is taken over all adversaries $U$ executing in time $t$ and making $w$ queries to $\mathsf{Enc}^{\hat b}_{\hat{pk}}(\cdot, \cdot)$.
Our theorem statements throughout this paper omit terms that are negligible as a function of the security parameterκ.
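Theorem 1. For $t' = t + t_{\mathsf{Gen}} + t_{\mathsf{Share}} + \ell m \cdot t_{\mathsf{Enc}}$,

$$\mathbf{Adv}^{\text{s-dfa}}_{\Pi_1(\mathcal{E})}(t, \ell, n, m) \le 2\,\mathbf{Adv}^{\text{ind-cpa}}_{\mathcal{E}}(t', \ell + 1)$$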
Proof. Let S be an adversary meeting the parameters t, ℓ, n, and m. Consider a simulation
Experiment Expt^{s-file}_{Π1(E)}(S1, S2)
  (pk, sk) ← Gen(1^κ)
  (sk1, sk2) ← Share(sk)
  (pk′, sk′) ← Gen(1^κ)
  (ℓ, ⟨σ_0k⟩_{k∈[ℓ]}, ⟨σ_1k⟩_{k∈[ℓ]}, M, φ) ← S1(pk, sk2)
  b ←$ {0, 1}
  m ← |M.Σ|
  for k ∈ [ℓ], j ∈ [m]
    c_kj ← Enc_pk((σ_bk)^j)
  b′ ← S2^{clientOr(pk, sk1, pk′, M)}(φ, ⟨c_kj⟩_{k∈[ℓ], j∈[m]})
  if b′ = b
    then return 1 else return 0

Figure 3.4: Experiments for proving file privacy of Π1(E) against server adversaries
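c109 (i.e., $\rho \leftarrow \mathsf{Enc}_{pk'}(\pi)$, $\pi \xleftarrow{\$} \mathsf{Injs}(Q \to R)$ in c107 and c115). Then $b$ is hidden information-theoretically from $S$ in $\mathbf{Sim}^{\text{s-dfa}}_{\Pi_1(\mathcal{E})}$, since $\gamma$ is a random element of $R$ in s104 and since $\gamma^*$ is a random element of $R$ (see c109). As a result, $\mathbb{P}\bigl(\mathbf{Sim}^{\text{s-dfa}}_{\Pi_1(\mathcal{E})}(S) = 1\bigr) = \frac{1}{2}$ and for $\mathbf{Adv}^{\text{s-dfa}}_{\Pi_1(\mathcal{E})}(S)$ to be nonzero, $S$ must distinguish $\mathbf{Sim}^{\text{s-dfa}}_{\Pi_1(\mathcal{E})}$ from $\mathbf{Expt}^{\text{s-dfa}}_{\Pi_1(\mathcal{E})}$.

We construct an IND-CPA adversary $U$ that, on input $\hat{pk}$, sets $pk' \leftarrow \hat{pk}$ and uses its own oracle $\mathsf{Enc}^{\hat b}_{\hat{pk}}$ to choose between running $\mathbf{Expt}^{\text{s-dfa}}_{\Pi_1(\mathcal{E})}$ and $\mathbf{Sim}^{\text{s-dfa}}_{\Pi_1(\mathcal{E})}$ for $S$ by setting $\rho \leftarrow \mathsf{Enc}^{\hat b}_{\hat{pk}}(0, r)$ in c107 and c115. (Aside from this, $U$ performs $\mathbf{Expt}^{\text{s-dfa}}_{\Pi_1(\mathcal{E})}$ faithfully, using $(pk, sk) \leftarrow \mathsf{Gen}(1^\kappa)$ and $(sk_1, sk_2) \leftarrow \mathsf{Share}(sk)$ it generates itself.) $U$ then returns $\hat b' = 1$ if $S_2$ outputs $b' = b$ and $\hat b' = 0$ otherwise. Then,

$$\begin{aligned}
\mathbb{P}\bigl(\mathbf{Expt}^{\text{ind-cpa}}_{\mathcal{E}}(U) = 1\bigr)
&= \tfrac{1}{2}\,\mathbb{P}\bigl(\mathbf{Expt}^{\text{s-dfa}}_{\Pi_1(\mathcal{E})}(S) = 1\bigr) + \tfrac{1}{2}\,\mathbb{P}\bigl(\mathbf{Sim}^{\text{s-dfa}}_{\Pi_1(\mathcal{E})}(S) = 0\bigr) \\
&= \tfrac{1}{2}\Bigl(\tfrac{1}{2} + \tfrac{1}{2}\,\mathbf{Adv}^{\text{s-dfa}}_{\Pi_1(\mathcal{E})}(S)\Bigr) + \tfrac{1}{4} \\
&= \tfrac{1}{2} + \tfrac{1}{4}\,\mathbf{Adv}^{\text{s-dfa}}_{\Pi_1(\mathcal{E})}(S)
\end{aligned}$$

and so $\mathbf{Adv}^{\text{ind-cpa}}_{\mathcal{E}}(U) = \frac{1}{2}\,\mathbf{Adv}^{\text{s-dfa}}_{\Pi_1(\mathcal{E})}(S)$.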
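Note that $U$ makes $\ell + 1$ oracle queries and runs in time $t' = t + t_{\mathsf{Gen}} + t_{\mathsf{Share}} + \ell m \cdot t_{\mathsf{Enc}}$, as in the statement of Theorem 1.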
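The second server adversary $S = (S_1, S_2)$ attacks the file ciphertexts $\langle c_{kj} \rangle_{k \in [\ell], j \in [m]}$ as in experiment $\mathbf{Expt}^{\text{s-file}}_{\Pi_1(\mathcal{E})}$ shown in Fig. 3.4. $S_1$ produces two equal-length plaintext files $\langle \sigma_{0k} \rangle_{k \in [\ell]}$, $\langle \sigma_{1k} \rangle_{k \in [\ell]}$ and a DFA $M$. $S_2$ receives the ciphertexts $\langle c_{kj} \rangle_{k \in [\ell], j \in [m]}$ for file $\langle \sigma_{bk} \rangle_{k \in [\ell]}$ where $b$ is chosen randomly. $S_2$ is also given oracle access to $\mathsf{clientOr}(pk, sk_1, pk', M)$. Eventually $S_2$ outputs a bit $b'$, and $\mathbf{Expt}^{\text{s-file}}_{\Pi_1(\mathcal{E})}(S) = 1$ iff $b' = b$. We say the advantage of $S$ is $\mathbf{Adv}^{\text{s-file}}_{\Pi_1(\mathcal{E})}(S) = 2 \cdot \mathbb{P}\bigl(\mathbf{Expt}^{\text{s-file}}_{\Pi_1(\mathcal{E})}(S) = 1\bigr) - 1$ and then $\mathbf{Adv}^{\text{s-file}}_{\Pi_1(\mathcal{E})}(t, \ell, n, m) = \max_S \mathbf{Adv}^{\text{s-file}}_{\Pi_1(\mathcal{E})}(S)$ where the maximum is taken over all adversaries $S = (S_1, S_2)$ taking time $t$ and producing (from $S_1$) files of $\ell$ symbols and a DFA of $n$ states and alphabet of size $m$. We prove the following theorem: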
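Theorem 2. For $t' = t + t_{\mathsf{Gen}} + t_{\mathsf{Share}} + \ell m \cdot t_{\mathsf{Enc}}$,

$$\mathbf{Adv}^{\text{s-file}}_{\Pi_1(\mathsf{Pai})}(t, \ell, n, m) \le 2\,\mathbf{Adv}^{\text{ind-cpa}}_{\mathsf{Pai}}(t', \ell + 1) + \mathbf{Adv}^{\text{ind-cpa}}_{\mathsf{Pai}}(t', \ell m)$$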
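Proof. Let $\mathbf{Expt}^{\text{s-file-0}}_{\Pi_1(\mathsf{Pai})}$ denote experiment $\mathbf{Expt}^{\text{s-file}}_{\Pi_1(\mathsf{Pai})}$ with $b$ fixed at $b = 0$, and let $\mathbf{Expt}^{\text{s-file-1}}_{\Pi_1(\mathsf{Pai})}$ denote the experiment $\mathbf{Expt}^{\text{s-file}}_{\Pi_1(\mathsf{Pai})}$ with $b$ fixed at $b = 1$. Consider a simulation $\mathbf{Sim}^{\text{s-file-0}}_{\Pi_1(\mathsf{Pai})}$ for $\mathbf{Expt}^{\text{s-file-0}}_{\Pi_1(\mathsf{Pai})}$ that differs only by simulating $\mathsf{clientOr}$ so as to substitute all ciphertexts produced with $pk'$ with encryptions of a random injection $\pi$ independent of $\pi_1$ it chose as in c109 (i.e., $\rho \leftarrow \mathsf{Enc}_{pk'}(\pi)$, $\pi \xleftarrow{\$} \mathsf{Injs}(Q \to R)$ in c107 and c115). Proceeding as in the proof of Theorem 1, we construct an IND-CPA adversary $U_0$ that uses its own oracle $\mathsf{Enc}^{\hat b}_{\hat{pk}}$ to choose between running $\mathbf{Expt}^{\text{s-file-0}}_{\Pi_1(\mathsf{Pai})}$ and $\mathbf{Sim}^{\text{s-file-0}}_{\Pi_1(\mathsf{Pai})}$ for $S$, i.e., by setting $pk' \leftarrow \hat{pk}$ and $\rho \leftarrow \mathsf{Enc}^{\hat b}_{\hat{pk}}(\pi_1, \pi)$ in c107 and c115. (Aside from this, $U_0$ performs $\mathbf{Expt}^{\text{s-file-0}}_{\Pi_1(\mathsf{Pai})}$ faithfully, using $(pk, sk) \leftarrow \mathsf{Gen}(1^\kappa)$ and $(sk_1, sk_2) \leftarrow \mathsf{Share}(sk)$ it generates itself.) $U_0$ returns $\hat b' = 0$ if $b' = b$ and $\hat b' = 1$ otherwise. Then,

$$1 + \mathbf{Adv}^{\text{ind-cpa}}_{\mathsf{Pai}}(U_0) = 2 \cdot \mathbb{P}\bigl(\mathbf{Expt}^{\text{ind-cpa}}_{\mathsf{Pai}}(U_0) = 1\bigr) = \mathbb{P}\bigl(\mathbf{Expt}^{\text{s-file-0}}_{\Pi_1(\mathsf{Pai})}(S) = 1\bigr) + \mathbb{P}\bigl(\mathbf{Sim}^{\text{s-file-0}}_{\Pi_1(\mathsf{Pai})}(S) = 0\bigr) \tag{3.2}$$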
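Now consider a simulation $\mathbf{Sim}^{\text{s-file-1}}_{\Pi_1(\mathsf{Pai})}$ for $\mathbf{Expt}^{\text{s-file-1}}_{\Pi_1(\mathsf{Pai})}$ that again differs only by simulating $\mathsf{clientOr}$ so as to substitute all ciphertexts produced with $pk'$ with encryptions of a random injection. As above, we construct an IND-CPA adversary $U_1$ that uses its own oracle $\mathsf{Enc}^{\hat b}_{\hat{pk}}$ to choose between running $\mathbf{Expt}^{\text{s-file-1}}_{\Pi_1(\mathsf{Pai})}$ and $\mathbf{Sim}^{\text{s-file-1}}_{\Pi_1(\mathsf{Pai})}$ for $S$, i.e., by setting $pk' \leftarrow \hat{pk}$ and $\rho \leftarrow \mathsf{Enc}^{\hat b}_{\hat{pk}}$ where $\pi \xleftarrow{\$} \mathsf{Injs}(Q \to R)$ in c107 and c115. $U_1$ returns $\hat b' = 1$ if $b' = b$ and $\hat b' = 0$ otherwise. Then,

$$1 + \mathbf{Adv}^{\text{ind-cpa}}_{\mathsf{Pai}}(U_1) = 2 \cdot \mathbb{P}\bigl(\mathbf{Expt}^{\text{ind-cpa}}_{\mathsf{Pai}}(U_1) = 1\bigr) = \mathbb{P}\bigl(\mathbf{Sim}^{\text{s-file-1}}_{\Pi_1(\mathsf{Pai})}(S) = 0\bigr) + \mathbb{P}\bigl(\mathbf{Expt}^{\text{s-file-1}}_{\Pi_1(\mathsf{Pai})}(S) = 1\bigr) \tag{3.3}$$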
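Finally, consider an adversary $U$ that uses its oracle $\mathsf{Enc}^{\hat b}_{\hat{pk}}$ to choose between running $\mathbf{Sim}^{\text{s-file-0}}_{\Pi_1(\mathsf{Pai})}$ and $\mathbf{Sim}^{\text{s-file-1}}_{\Pi_1(\mathsf{Pai})}$ for $S$. Specifically, on input $\hat{pk} = \langle N, g \rangle$, $U$ generates $d_2 \xleftarrow{\$} \mathbb{Z}_{N^2}$ and invokes $S_1(\hat{pk}, sk_2)$ where $sk_2 = \langle N, g, d_2 \rangle$. Upon receiving $\langle \sigma_{0k} \rangle_{k \in [\ell]}$ and $\langle \sigma_{1k} \rangle_{k \in [\ell]}$ from $S_1$, $U$ sets $c_{kj} \leftarrow \mathsf{Enc}^{\hat b}_{\hat{pk}}((\sigma_{0k})^j, (\sigma_{1k})^j)$. Additionally, in the simulation of $\mathsf{clientOr}$, $U$ selects $r \xleftarrow{\$} R$ and sets $\alpha \leftarrow \mathsf{Enc}_{\hat{pk}}(r)$ in c104 and c112 and $\beta \leftarrow g^r \alpha^{-d_2} \bmod N^2$ in c106 and c114, so that $\alpha^{d_2} \beta \equiv g^r \bmod N^2$. ($U$ also generates $pk'$ itself and constructs all encryptions for $pk'$ as encryptions of a random injection.) When $S_2$ outputs $b'$, $U$ outputs $b'$ as $\hat b'$. Then,

$$1 + \mathbf{Adv}^{\text{ind-cpa}}_{\mathsf{Pai}}(U) = 2 \cdot \mathbb{P}\bigl(\mathbf{Expt}^{\text{ind-cpa}}_{\mathsf{Pai}}(U) = 1\bigr) = 2 \cdot \mathbb{P}\bigl(\mathbf{Sim}^{\text{s-file}}_{\Pi_1(\mathsf{Pai})}(S) = 1\bigr) = \mathbb{P}\bigl(\mathbf{Sim}^{\text{s-file-0}}_{\Pi_1(\mathsf{Pai})}(S) = 1\bigr) + \mathbb{P}\bigl(\mathbf{Sim}^{\text{s-file-1}}_{\Pi_1(\mathsf{Pai})}(S) = 1\bigr) \tag{3.4}$$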
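Adding (3.2), (3.3) and (3.4), we get

$$\begin{aligned}
3 &+ \mathbf{Adv}^{\text{ind-cpa}}_{\mathsf{Pai}}(U_0) + \mathbf{Adv}^{\text{ind-cpa}}_{\mathsf{Pai}}(U) + \mathbf{Adv}^{\text{ind-cpa}}_{\mathsf{Pai}}(U_1) \\
&= \mathbb{P}\bigl(\mathbf{Expt}^{\text{s-file-0}}_{\Pi_1(\mathsf{Pai})}(S) = 1\bigr) + \mathbb{P}\bigl(\mathbf{Sim}^{\text{s-file-0}}_{\Pi_1(\mathsf{Pai})}(S) = 0\bigr) \\
&\quad + \mathbb{P}\bigl(\mathbf{Sim}^{\text{s-file-0}}_{\Pi_1(\mathsf{Pai})}(S) = 1\bigr) + \mathbb{P}\bigl(\mathbf{Sim}^{\text{s-file-1}}_{\Pi_1(\mathsf{Pai})}(S) = 1\bigr) \\
&\quad + \mathbb{P}\bigl(\mathbf{Sim}^{\text{s-file-1}}_{\Pi_1(\mathsf{Pai})}(S) = 0\bigr) + \mathbb{P}\bigl(\mathbf{Expt}^{\text{s-file-1}}_{\Pi_1(\mathsf{Pai})}(S) = 1\bigr) \\
&= 2 \cdot \mathbb{P}\bigl(\mathbf{Expt}^{\text{s-file}}_{\Pi_1(\mathsf{Pai})}(S) = 1\bigr) + 2 \\
&= 3 + \mathbf{Adv}^{\text{s-file}}_{\Pi_1(\mathsf{Pai})}(S)
\end{aligned}$$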
The result then follows because each of $U_0$ and $U_1$ makes $\ell + 1$ oracle queries and runs in time
Experiment Expt^{c-file}_{Π1(E)}(C1, C2)
  (pk, sk) ← Gen(1^κ)
  (sk1, sk2) ← Share(sk)
  (pk′, sk′) ← Gen(1^{κ+2})
  (ℓ, ⟨σ_0k⟩_{k∈[ℓ]}, ⟨σ_1k⟩_{k∈[ℓ]}, M, φ) ← C1(pk, sk1, pk′)
  if M(⟨σ_0k⟩_{k∈[ℓ]}) ≠ M(⟨σ_1k⟩_{k∈[ℓ]}) then return 0
  b ←$ {0, 1}
  m ← |M.Σ|
  for k ∈ [ℓ], j ∈ [m]
    c_kj ← Enc_pk((σ_bk)^j)
  b′ ← C2^{serverOr(pk, sk2, M.Σ, ⟨c_kj⟩_{k∈[ℓ], j∈[m]})}(φ)
  if b′ = b
    then return 1 else return 0

Figure 3.5: Experiment for proving file privacy of Π1(E) against client adversaries
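ciphertexts $\langle c_{kj} \rangle_{k \in [\ell], j \in [m]}$. $U$ makes $\ell m$ oracle queries and runs in time $t + t_{\mathsf{Gen}} + t_{\mathsf{Share}} + \ell m \cdot t_{\mathsf{Enc}}$ for the same reason.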
3.2.3 Security Against Client Adversaries
In this section we show security of $\Pi_1(\mathcal{E})$ against honest-but-curious client adversaries and heuristically justify its security against malicious ones. Since the client has the DFA in its possession, privacy of the DFA against a client adversary is not a concern. The proof of security against the client therefore is concerned with the privacy of only the file. However, by the nature of what the protocol computes for the client (i.e., the final state of a DFA match on the file), the client can easily distinguish two files of its choosing simply by running the protocol correctly using a DFA that distinguishes between the two files it chose.

For this reason, we adapt the notion of indistinguishability to apply only to files that produce the same final state for the client’s DFA. So, in the experiment $\mathbf{Expt}^{\text{c-file}}_{\Pi_1(\mathcal{E})}$ (Fig. 3.5) that we use to define file security against client adversaries, the adversary $C = (C_1, C_2)$ succeeds (i.e., $\mathbf{Expt}^{\text{c-file}}_{\Pi_1(\mathcal{E})}(C)$ returns 1) only if the two files $\langle \sigma_{0k} \rangle_{k \in [\ell]}$ and $\langle \sigma_{1k} \rangle_{k \in [\ell]}$ output by $C_1$ both drive the DFA $M$, also output by $C_1$, to the same final state (denoted $M(\langle \sigma_{0k} \rangle_{k \in [\ell]}) = M(\langle \sigma_{1k} \rangle_{k \in [\ell]})$).
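This caveat aside, the experiment is straightforward: $C_1$ receives public key $pk$, private-key share $sk_1$, and public key $pk'$ (see Fig. 3.5), and outputs two files $\langle \sigma_{0k} \rangle_{k \in [\ell]}$ and $\langle \sigma_{1k} \rangle_{k \in [\ell]}$ and a DFA $M$. Depending on how $b$ is then chosen, one of these files is encrypted using $pk$ and then provided to the server, to which $C_2$ is given oracle access (denoted $\mathsf{serverOr}(pk, sk_2, M.\Sigma, \langle c_{kj} \rangle_{k \in [\ell], j \in [m]})$).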
Adversary $C_2$ can invoke $\mathsf{serverOr}$ first with a message containing an integer $n$ (i.e., with a message of the form m101), to which $\mathsf{serverOr}$ returns $\ell$ (m102). $C_2$ can then invoke $\mathsf{serverOr}$ up to $\ell + 1$ times. The first $\ell$ such invocations take the form $\alpha$, $\beta$, $\rho$ and correspond to messages of the form m103. Each such invocation elicits a response $\langle \mu_{\sigma i} \rangle_{\sigma \in \Sigma, i \in [n]}$ (i.e., of the form m104). The last client invocation is of the form $\alpha$, $\beta$, $\rho$ and corresponds to m105. This invocation elicits a response $\gamma^*$ (i.e., m106). Malformed or extra queries are rejected by $\mathsf{serverOr}$.
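We show file privacy against honest-but-curious client adversaries $C = (C_1, C_2)$, i.e., $C_2$ invokes $\mathsf{serverOr}$ exactly as $\Pi_1(\mathcal{E})$ prescribes, using the DFA $M$ output by $C_1$. We define the advantage of $C$ to be $\mathbf{hbcAdv}^{\text{c-file}}_{\Pi_1(\mathcal{E})}(C) = 2 \cdot \mathbb{P}\bigl(\mathbf{Expt}^{\text{c-file}}_{\Pi_1(\mathcal{E})}(C) = 1\bigr) - 1$ and $\mathbf{hbcAdv}^{\text{c-file}}_{\Pi_1(\mathcal{E})}(t, \ell, n, m) = \max_C \mathbf{hbcAdv}^{\text{c-file}}_{\Pi_1(\mathcal{E})}(C)$ where the maximum is taken over honest-but-curious client adversaries $C$ running in total time $t$ and producing files of length $\ell$ and a DFA of $n$ states over an alphabet of $m$ symbols. We prove: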
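Theorem 3. For $t' = t + t_{\mathsf{Gen}} + \ell m \cdot t_{\mathsf{Enc}} + (\ell + 1) \cdot t_{\mathsf{Dec}}$,

$$\mathbf{hbcAdv}^{\text{c-file}}_{\Pi_1(\mathsf{Pai})}(t, \ell, n, m) \le \mathbf{Adv}^{\text{ind-cpa}}_{\mathsf{Pai}}(t', \ell m (1 + n))$$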
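Proof. Given an adversary $C = (C_1, C_2)$ running in time $t$ and selecting files of length $\ell$ symbols and a DFA of $n$ states over an alphabet of $m$ symbols, we construct an IND-CPA adversary $U$ that demonstrates the theorem as follows. On input $\hat{pk} = \langle N, g \rangle$, $U$ generates $(pk', sk') \leftarrow \mathsf{Gen}(1^{\kappa + 2})$ and $d_1 \xleftarrow{\$} \mathbb{Z}_{N^2}$, and invokes $C_1(\hat{pk}, sk_1, pk')$ where $sk_1 = \langle N, g, d_1 \rangle$ to obtain $(\ell, \langle \sigma_{0k} \rangle_{k \in [\ell]}, \langle \sigma_{1k} \rangle_{k \in [\ell]}, M, \phi)$, where $M = \langle Q, \Sigma, \delta, q_{\mathrm{init}} \rangle$ is a DFA. Note that $d_1$ is chosen from a distribution that is statistically indistinguishable from that from which $d_1$ is chosen in the real system. For $k \in [\ell]$ and $j \in [m]$, $U$ sets $c_{kj} \leftarrow \mathsf{Enc}^{\hat b}_{\hat{pk}}((\sigma_{0k})^j, (\sigma_{1k})^j)$.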
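$U$ then invokes $C_2(\phi)$ and simulates responses to $C_2$’s queries to $\mathsf{serverOr}$ as follows

$\gamma_1 \leftarrow \pi(q_1)$, and then sets $\mu_{\sigma i} \leftarrow \mathsf{Enc}^{\hat b}_{\hat{pk}}\bigl((\gamma_0)^i \cdot_R \Lambda_\sigma(\sigma_{0k}), (\gamma_1)^i \cdot_R \Lambda_\sigma(\sigma_{1k})\bigr)$ for $\sigma \in \Sigma$ and $i \in [n]$. After this, $U$ updates $q_0 \leftarrow \delta(q_0, \sigma_{0k})$ and $q_1 \leftarrow \delta(q_1, \sigma_{1k})$, and returns $\langle \mu_{\sigma i} \rangle_{\sigma \in \Sigma, i \in [n]}$ to $C_2$. For the last query $\alpha$, $\beta$, $\rho$, adversary $U$ computes $\pi \leftarrow \mathsf{Dec}_{sk'}(\rho)$ and returns $\gamma^* = \pi(q_0)$ $(= \pi(q_1))$ to $C_2$. When $C_2$ outputs $b'$, $U$ outputs $b'$, as well.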
This simulation is statistically indistinguishable from the real system provided that $C$ is honest-but-curious, and so ignoring terms that are negligible in $\kappa$, $\mathbf{hbcAdv}^{\text{c-file}}_{\Pi_1(\mathsf{Pai})}(C) = \mathbf{Adv}^{\text{ind-cpa}}_{\mathsf{Pai}}(U)$. Note that $U$ runs in $t' = t + t_{\mathsf{Gen}} + \ell m \cdot t_{\mathsf{Enc}} + (\ell + 1) \cdot t_{\mathsf{Dec}}$ due to the need to generate $(pk', sk')$ and $sk_1$, to create the file ciphertexts $\langle c_{kj} \rangle_{k \in [\ell], j \in [m]}$, and to perform $\ell + 1$ Pai decryptions in the simulation. $U$ makes $nm$ oracle queries in order to respond to each of the $\ell$ oracle queries following the first, plus an additional $\ell m$ queries to create $\langle c_{kj} \rangle_{k \in [\ell], j \in [m]}$.
We have found extending this result to fully malicious client adversaries to be difficult for two reasons. First, $\mathbf{Expt}^{\text{c-file}}_{\Pi_1(\mathcal{E})}$ does not make sense for a malicious client, since $C_2$ is not bound to use the DFA $M$ output by $C_1$. As such, $C_2$ can use a different DFA, in particular one that enables it to distinguish between the files output by $C_1$. Second, even ignoring the final state $\gamma^*$ sent back to the client, we have been unable to reduce the ability of the client adversary to distinguish between two files on the basis of m104 messages to breaking the IND-CPA security of $\mathcal{E}$; intuitively, the difficulty derives from the simulator’s inability to decrypt $\alpha$ values provided by $C_2$. (The ciphertext $\rho$ enables the simulator to “track” the plaintext of $\alpha$ in the honest-but-curious case, but $\rho$ might contain useless information in the malicious case.)
3.3 An Alternative Protocol
The second protocol we present has the same goals as $\Pi_1(\mathcal{E})$ but incurs lower communication cost. Specifically, whereas the communication cost of $\Pi_1(\mathcal{E})$ is $O(\kappa \ell n m)$ bits, the protocol we present in this section, called $\Pi_2(\mathcal{E})$, sends only $O(\kappa \ell (n + m))$ bits. $\Pi_2(\mathcal{E})$ accomplishes this in part by exploiting a cryptosystem that is additively homomorphic and that offers the ability to homomorphically “multiply” ciphertexts once. That is, the cryptosystem supports a new operator $\odot_{pk}$ that satisfies $\mathsf{Dec}_{sk}(\mathsf{Enc}_{pk}(m_1) \odot_{pk} \mathsf{Enc}_{pk}(m_2)) = m_1 \cdot_R m_2$, but the result of a $\odot_{pk}$ operation (or any other ciphertext resulting from $+_{pk}$ or $\cdot_{pk}$ operations in which it is used) cannot be used in a $\odot_{pk}$ operation. After we present our protocol, we will discuss various options for instantiating this encryption scheme within it.
Protocol $\Pi_2(\mathcal{E})$ is shown in Fig. 3.6. Note that the input arguments to both the client and the server are identical to those in $\Pi_1(\mathcal{E})$. The structure of the protocol is also very similar to $\Pi_1(\mathcal{E})$, with the only differences being in how the server performs each loop iteration (s204–s212) and how the client forms the new encrypted DFA state $\alpha$ (c212–c216). We now summarize the primary innovations represented by these differences.
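After the $k$-th m203 message, the server constructs an encryption $\Psi_\sigma$ of $\Lambda_\sigma(\sigma_k)$ (s206). Rather than computing $\mu_{\sigma i} \leftarrow \gamma^i \cdot_{pk} \Psi_\sigma$, however, the server sends $\langle \Psi_\sigma \rangle_{\sigma \in \Sigma}$ to the client in m204. Each $\mu_{\sigma i}$ is then built at the client instead (c212–c214), which is the main reason we get better communication efficiency.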
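The difference in per-round traffic can be tallied directly from the messages in Fig. 3.1 and Fig. 3.6. The sketch below is only a counting aid under simplifying assumptions: each named value is counted as one ring element or ciphertext, the $\rho$ values are omitted because the text notes they can be elided in practice, and the function names are hypothetical.

```python
def pi1_elements_per_round(n, m):
    """Values exchanged in one loop iteration of the first protocol.

    m103 carries alpha and beta; m104 carries the n*m ciphertexts mu_{sigma,i}.
    """
    return 2 + n * m

def pi2_elements_per_round(n, m):
    """Values exchanged in one loop iteration of the second protocol.

    m203 carries alpha and beta; m204 carries gamma, the m ciphertexts
    Psi_sigma, and the n ciphertexts nu_i.
    """
    return 2 + 1 + m + n

def total_elements(per_round, file_length, n, m):
    """Total over the file-length loop iterations (setup and final round ignored)."""
    return file_length * per_round(n, m)
```

For a DFA with many states and a large alphabet, the product $nm$ dominates the first count while the second grows only with $n + m$, matching the $O(\kappa \ell n m)$ versus $O(\kappa \ell (n + m))$ comparison stated above.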
Since each $\mu_{\sigma i}$ is built at the client, the server must send $\gamma$ in m204. To hide the current DFA state from the client, the server blinds $\gamma$ with a random $r \in R$ (s208–s209) before returning it. So, the client needs to accommodate $r$ without knowing it when performing the DFA state transition. The client cannot perform the polynomial evaluation using the $f(x, y)$ it constructed (c211) on the $\langle \mu_{\sigma i} \rangle_{\sigma \in \Sigma, i \in [n]}$ as in $\Pi_1(\mathcal{E})$ since $f(x, y)$ is designed for an input $q \in \pi_0(Q)$, not $q + r$. To overcome this, the client constructs a shifted polynomial $f'(x, y)$ such that $f'(q + r, \sigma) = f(q, \sigma)$ for all $q \in \pi_0(Q)$, and so $f'(x, y)$ will correctly translate the blinded input to the next DFA state. What is
client(pk, sk1, pk′, ⟨Q, Σ, δ, q_init⟩)                 server(pk, sk2, Σ, ⟨c_kj⟩_{k∈[ℓ], j∈[m]})

c201. n ← |Q|, m ← |Σ|                                   s201. m ← |Σ|
c202. π0 ← I                                             s202. ⟨λ_σj⟩_{σ∈Σ, j∈[m]} ← Lagrange(Σ)
c203. π1 ←$ Injs(Q → R)
c204. α ← Enc_pk(π1(q_init))
                          m201.   n   →
                          m202.   ←   ℓ
c205. for k ← 0 … ℓ−1                                    s203. for k ← 0 … ℓ−1
c206.   β ← Dec1_sk1(α)
c207.   ρ ← Enc_pk′(π1)
                          m203.   α, β, ρ   →
                                                         s204.   γ ← Dec2_sk2(α, β)
c208.   π0 ← π1                                          s205.   for σ ∈ Σ
c209.   π1 ←$ Injs(Q → R)                                s206.     Ψ_σ ← ^{pk}Σ_{j=0}^{m−1} λ_σj ·_pk c_kj
c210.   δ′ ← Blind(δ, π0, π1)                            s207.   endfor
c211.   ⟨a_σi⟩_{σ∈Σ, i∈[n]} ← ToPoly(Q, Σ, δ′)           s208.   r ←$ R
                                                         s209.   γ ← γ +_R r
                                                         s210.   for i ∈ [n]
                                                         s211.     ν_i ← Enc_pk(r^i)
                                                         s212.   endfor
                          m204.   ←   γ, ⟨Ψ_σ⟩_{σ∈Σ}, ⟨ν_i⟩_{i∈[n]}
c212.   for σ ∈ Σ, i ∈ [n]
c213.     µ_σi ← γ^i ·_pk Ψ_σ
c214.   endfor
c215.   ⟨â′_σi⟩_{σ∈Σ, i∈[n]} ← Shift(⟨ν_i⟩_{i∈[n]}, ⟨a_σi⟩_{σ∈Σ, i∈[n]})
c216.   α ← ^{pk}Σ_{σ∈Σ} ^{pk}Σ_{i=0}^{n−1} â′_σi ⊙_pk µ_σi
c217. endfor                                             s213. endfor
c218. β ← Dec1_sk1(α)
c219. ρ ← Enc_pk′(π1)
                          m205.   α, β, ρ   →
                                                         s214. γ* ← Dec2_sk2(α, β)
                          m206.   ←   γ*
c220. return π1^{−1}(γ*)
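If we set $f'(x, y) = {}^{R}\!\sum_{\sigma \in \Sigma} \bigl(f'_\sigma(x) \cdot_R \Lambda_\sigma(y)\bigr)$ where $f'_\sigma(x) = {}^{R}\!\sum_{i=0}^{n-1} a'_{\sigma i} \cdot_R x^i$, then it suffices if $f'_\sigma(x +_R r) = f_\sigma(x)$ for all $\sigma \in \Sigma$. Note that

$$f_\sigma(x -_R r) = {}^{R}\!\sum_{i=0}^{n-1} a_{\sigma i} \cdot_R (x -_R r)^i = {}^{R}\!\sum_{i=0}^{n-1} a_{\sigma i} \cdot_R {}^{R}\!\sum_{i'=0}^{i} \binom{i}{i'} \cdot_R x^{i - i'} \cdot_R (-_R r)^{i'} \tag{3.5}$$

$$\phantom{f_\sigma(x -_R r)} = {}^{R}\!\sum_{i=0}^{n-1} \left( {}^{R}\!\!\sum_{i'=0}^{n-1-i} a_{\sigma (i + i')} \cdot_R \binom{i + i'}{i'} \cdot_R (-_R r)^{i'} \right) \cdot_R x^i$$

where Eqn. 3.5 follows from the binomial theorem. Therefore, setting

$$a'_{\sigma i} \leftarrow {}^{R}\!\!\sum_{i'=0}^{n-1-i} a_{\sigma (i + i')} \cdot_R \binom{i + i'}{i'} \cdot_R (-_R 1)^{i'} \cdot_R r^{i'} \tag{3.6}$$

ensures $f'_\sigma(x +_R r) = f_\sigma(x)$ and so $f'(x +_R r, \sigma) = f(x, \sigma)$. The client knows all the terms in Eqn. 3.6 except $r^{i'}$. That is exactly the reason the server sends in m204 the ciphertext $\nu_i$ of $r^i$, for each $i \in [n]$ (see s211). The client can then calculate a ciphertext $\hat a'_{\sigma i}$ of the coefficient of $x^i$ in $f'_\sigma$ by using the additive homomorphic property of the encryption scheme:

$$\hat a'_{\sigma i} \leftarrow {}^{pk}\!\!\sum_{i'=0}^{n-1-i} \left( a_{\sigma (i + i')} \cdot_R \binom{i + i'}{i'} \cdot_R (-_R 1)^{i'} \right) \cdot_{pk} \nu_{i'} \tag{3.7}$$

In our pseudocode, the calculations in Eqn. 3.7 are encapsulated within the operation $\langle \hat a'_{\sigma i} \rangle_{\sigma \in \Sigma, i \in [n]} \leftarrow \mathsf{Shift}(\langle \nu_i \rangle_{i \in [n]}, \langle a_{\sigma i} \rangle_{\sigma \in \Sigma, i \in [n]})$ on line c215.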
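The coefficient shift in Eqn. 3.6 can be checked numerically in plaintext. The sketch below assumes a small prime field in place of $R$ and works with the powers of $r$ in the clear, whereas the client in the protocol sees only their encryptions $\nu_{i'}$ and applies Eqn. 3.7 instead; the names `shift` and `evaluate` are hypothetical.

```python
from math import comb

P = 10007  # small prime modulus standing in for the ring R (assumption)

def shift(a, r):
    """Coefficients a' with f'(x + r) = f(x), following Eqn. 3.6.

    a[i] is the coefficient of x^i in f; the result is for one fixed sigma.
    """
    n = len(a)
    return [
        sum(a[i + k] * comb(i + k, k) * pow(-r, k, P) for k in range(n - i)) % P
        for i in range(n)
    ]

def evaluate(coeffs, x):
    """Horner evaluation modulo P."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc
```

Evaluating the shifted coefficients at a blinded input `(x + r) % P` gives the same value as evaluating the original coefficients at `x`, which is the property the client relies on when it receives $\gamma +_R r$ rather than $\gamma$.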
After the client obtains $\langle \hat a'_{\sigma i} \rangle_{\sigma \in \Sigma, i \in [n]}$ and $\langle \mu_{\sigma i} \rangle_{\sigma \in \Sigma, i \in [n]}$, it performs polynomial evaluation at step c216 to assemble the ciphertext of the next DFA state by taking advantage of the one-multiplication homomorphism of the cryptosystem. This is where the additional homomorphism helps to achieve much better communication complexity.

The privacy of the file and DFA from server adversaries and the privacy of the file from client adversaries can be proved for $\Pi_2(\mathcal{E})$ very similarly to how they are proved for $\Pi_1(\mathcal{E})$. In fact, Theorems 1–3 hold for $\Pi_2(\mathcal{E})$ unchanged, once instantiated with a suitable encryption scheme $\mathcal{E}$.
Instantiating $\mathcal{E}$. Protocol $\Pi_2(\mathcal{E})$ requires an additively homomorphic encryption scheme $\mathcal{E}$ that also supports the “one time” homomorphic multiplication operator $\odot_{pk}$. Perhaps the most well-known such cryptosystem is due to Boneh, Goh and Nissim [17], and moreover, this cryptosystem also supports two-party decryption with a cost comparable to regular decryption [17]. The primary difficulty in instantiating $\mathcal{E}$ with this cryptosystem, however, is that decryption (and specifically in $\Pi_2(\mathcal{E})$, the operation $\mathsf{Dec}^2_{sk_2}$) requires computing a discrete logarithm in a large group, which is generally intractable. That said, if the ciphertext is known to encode one of a small number of possible plaintexts, then $\mathsf{Dec}^2_{sk_2}$ can be adapted to test the ciphertext for each of these plaintexts efficiently. As such, to adapt $\Pi_2(\mathcal{E})$ to employ this cryptosystem, we can augment messages m203 and m205 with $\pi_1(Q)$ (listed in random order), for the injection $\pi_1$ at the time the message is sent. This would permit the server to perform $\mathsf{Dec}^2_{sk_2}(\alpha, \beta)$ in lines s204, s214 by testing for these $n$ possible plaintexts. It does, however, have the unfortunate side effect of enabling our proofs for the analogs of Theorems 1 and 2 for $\Pi_2(\mathcal{E})$ to go through only for honest-but-curious server adversaries. $\Pi_2(\mathcal{E})$ instantiated in this way still appears to be secure even against malicious server adversaries, though at this point we can claim this only heuristically.
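The idea of decrypting by testing a short list of candidate plaintexts, instead of solving a general discrete logarithm, can be sketched as follows. This is a toy illustration under loud assumptions: an ordinary multiplicative group modulo a prime stands in for the pairing-based group of the Boneh-Goh-Nissim cryptosystem, the value `h` stands for the group element $g^m$ that decryption would produce, and the function name is hypothetical.

```python
def decrypt_by_candidates(g, h, p, candidates):
    """Return the candidate m with g^m = h (mod p), or None if none matches.

    The cost is one modular exponentiation per candidate, so the approach is
    practical only when the candidate list is short (n values in the text).
    """
    for m in candidates:
        if pow(g, m, p) == h:
            return m
    return None

# Toy parameters (assumptions): a Mersenne prime modulus and base 3.
TOY_P = 2**61 - 1
TOY_G = 3
```

In the adaptation described above, the candidate list would be the blinded state set $\pi_1(Q)$ that the client attaches to its message.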
Two other possibilities for instantiating $\mathcal{E}$ in $\Pi_2(\mathcal{E})$ are due to Gentry, Halevi and Vaikuntanathan [34]² and Lauter, Naehrig, and Vaikuntanathan [51]. The primary challenge posed by these cryptosystems is that two-party decryption algorithms for them have not been investigated. Each of these schemes is amenable to sharing its private key securely, after which decryption can be performed using generic two-party computation [74, 7]. These instantiations retain $\Pi_2(\mathcal{E})$’s provable security against malicious server adversaries (i.e., the analogs of Theorems 1 and 2), but $\Pi_2(\mathcal{E})$ instantiated this way may be less cost-efficient than $\Pi_1(\mathsf{Pai})$ for many values of $n$ and $m$.³ Of course, customized two-party decryption algorithms for these cryptosystems could restore the efficiency of $\Pi_2(\mathcal{E})$, suggesting a useful open problem for the community.
² Because we require the plaintext ring to be commutative, we would restrict the plaintext space of the Gentry et al. cryptosystem to diagonal square matrices, versus the arbitrary square matrices over which it is defined.
³ For example, for the Gentry et al. scheme, a “garbled” arithmetic circuit [7] for secure two-party decryption using