2.5 Searchable Encryption
2.5.1 Searchable Symmetric Encryption
Consider the one-writer, one-reader setting described above, where a single client uploads a document collection (i.e., a data set) $D$ to a server and is then able to query it. The document collection is composed of tuples $(W_i, D_i)$, where the document $D_i$ is any kind of plaintext and $W_i = (w_{i,1}, \ldots, w_{i,q})$ is a list of keywords associated with $D_i$.
A searchable symmetric encryption (SSE) scheme $S$ consists of five polynomial-time algorithms, $S = \{S.\mathrm{Gen}, S.\mathrm{Enc}, S.\mathrm{Trapdoor}, S.\mathrm{Search}, S.\mathrm{Dec}\}$. The probabilistic algorithm S.Gen is first used to provide the client with a symmetric key $k$. The client can then generate a searchable index $I$ and an encryption $C$ of the data set $D$ by running S.Enc on input the symmetric key $k$ and the data set $D$.
Afterwards, the client may want to retrieve the documents satisfying a particular predicate $f$, specified by a Boolean formula over a tuple of keywords. An example of such a predicate is $w_1 \wedge (w_2 \vee w_3)$, which is satisfied by the documents whose keyword list contains $w_1$ and at least one of $w_2$ and $w_3$. The client generates a trapdoor $T$ by running the algorithm S.Trapdoor on $f$, and sends $T$ to the server. The server can then retrieve the encrypted documents satisfying $f$ by running the public algorithm S.Search on the trapdoor $T$ and the searchable index $I$.
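As a plaintext reference point, the following Python sketch evaluates a Boolean keyword predicate such as $w_1 \wedge (w_2 \vee w_3)$ on each keyword list, i.e., it computes the set of documents that a correct scheme should return for $f$. The classes `Kw`, `And`, `Or` and the helpers `satisfies` and `matching` are hypothetical names introduced only for illustration; no encryption is involved.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical syntax tree for Boolean keyword predicates (illustration only).
@dataclass(frozen=True)
class Kw:
    word: str

@dataclass(frozen=True)
class And:
    left: "Pred"
    right: "Pred"

@dataclass(frozen=True)
class Or:
    left: "Pred"
    right: "Pred"

Pred = Union[Kw, And, Or]

def satisfies(keywords: set[str], f: Pred) -> bool:
    """True if a document's keyword set satisfies the predicate f."""
    if isinstance(f, Kw):
        return f.word in keywords
    if isinstance(f, And):
        return satisfies(keywords, f.left) and satisfies(keywords, f.right)
    if isinstance(f, Or):
        return satisfies(keywords, f.left) or satisfies(keywords, f.right)
    raise TypeError(f"unknown predicate node: {f!r}")

def matching(collection, f: Pred) -> list:
    """Plaintext semantics D(f): documents whose keyword list satisfies f."""
    return [doc for (kws, doc) in collection if satisfies(set(kws), f)]

# w1 AND (w2 OR w3)
f = And(Kw("w1"), Or(Kw("w2"), Kw("w3")))
D = [(["w1", "w2"], "doc A"), (["w1"], "doc B"),
     (["w3", "w1"], "doc C"), (["w2", "w3"], "doc D")]
result = matching(D, f)  # ['doc A', 'doc C']
```

A single-keyword scheme only needs the `Kw` case, and a conjunctive scheme only `Kw` and `And`; the query-expressiveness classes discussed in this section differ precisely in which node types S.Trapdoor accepts.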
We present below a general model for searchable symmetric encryption schemes. Note that dynamic [110] or verifiable [111] searchable encryption schemes may include additional algorithms.
Definition 2.18. A searchable symmetric encryption scheme S is comprised of five polynomial-time algorithms:
$S.\mathrm{Gen}(\lambda)$:
Probabilistic algorithm run by the client that, given a security parameter $\lambda$, returns a secret key $k$ and the public parameters params of the scheme, including a symmetric-key encryption scheme (Gen, Enc, Dec).
$S.\mathrm{Enc}_k(D)$:
Probabilistic algorithm run by the client that takes as input a tuple of plaintexts $D = ((W_i, D_i))_{i=1}^{N}$, where $W_i$ is a tuple of keywords attached to the document $D_i$.
It returns an encrypted index $I$ together with a tuple of encrypted documents $C = (C_i)_{i=1}^{N} = (\mathrm{Enc}_k(D_i))_{i=1}^{N}$, where Enc is the encryption algorithm in params.
$S.\mathrm{Trapdoor}_k(f)$:
Algorithm run by the client that takes as input a predicate $f$ on keywords and returns a trapdoor $T$ associated with $f$.
$S.\mathrm{Search}(I, C, T)$:
Deterministic algorithm run by the server that takes as input an encrypted index $I$, a tuple of ciphertexts $C$ and a trapdoor $T$. It returns a tuple of elements of $C$.
$S.\mathrm{Dec}_k(C)$:
Deterministic algorithm run by the client that takes an encrypted document $C$ as input and returns $\mathrm{Dec}_k(C)$, where Dec is the decryption algorithm in params.
We say that $S$ is correct if for every security parameter $\lambda$, every key $k$ output by S.Gen, every document collection $D$, every predicate $f$ supported by S.Trapdoor and every $(I, C)$ output by $S.\mathrm{Enc}_k(D)$, it holds with all but negligible probability that
$$D(f) = \{S.\mathrm{Dec}_k(c) : c \in S.\mathrm{Search}(I, C, S.\mathrm{Trapdoor}_k(f))\},$$
where $D(f)$ denotes the set of all documents in $D$ whose attached keyword tuple satisfies $f$.
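The interface of Definition 2.18 and the correctness condition can be sketched in Python as below, restricted to single-keyword predicates. The `SSEScheme` protocol, its method names and the `PlaintextToy` class are assumptions made for illustration; `PlaintextToy` performs no encryption at all and only serves to exercise the correctness check. The helper `is_correct_on` checks the equality for one run, whereas the definition asks for it with all but negligible probability.

```python
from typing import Any, Protocol, Sequence, runtime_checkable

Collection = Sequence[tuple[Sequence[str], bytes]]  # ((W_i, D_i)) pairs

@runtime_checkable
class SSEScheme(Protocol):
    """Hypothetical rendering of Definition 2.18 for single-keyword predicates."""
    def gen(self, lam: int) -> Any: ...                            # client, probabilistic
    def enc(self, k: Any, D: Collection) -> tuple[Any, list]: ...  # client, returns (I, C)
    def trapdoor(self, k: Any, w: str) -> Any: ...                 # client
    def search(self, I: Any, C: list, T: Any) -> list: ...         # server, deterministic
    def dec(self, k: Any, c: Any) -> bytes: ...                    # client, deterministic

class PlaintextToy:
    """INSECURE placeholder: stores everything in the clear."""
    def gen(self, lam: int) -> Any:
        return None
    def enc(self, k: Any, D: Collection) -> tuple[dict, list]:
        index: dict[str, list[int]] = {}
        for i, (W, _) in enumerate(D):
            for w in W:
                index.setdefault(w, []).append(i)
        return index, [doc for (_, doc) in D]
    def trapdoor(self, k: Any, w: str) -> str:
        return w
    def search(self, I: dict, C: list, T: str) -> list:
        return [C[i] for i in I.get(T, [])]
    def dec(self, k: Any, c: bytes) -> bytes:
        return c

def is_correct_on(S: SSEScheme, D: Collection, w: str, lam: int = 128) -> bool:
    """One run of: D(w) == {S.Dec_k(c) : c in S.Search(I, C, S.Trapdoor_k(w))}."""
    k = S.gen(lam)
    I, C = S.enc(k, D)
    expected = {doc for (W, doc) in D if w in W}
    got = {S.dec(k, c) for c in S.search(I, C, S.trapdoor(k, w))}
    return expected == got
```

For example, `is_correct_on(PlaintextToy(), [(["a", "b"], b"d1"), (["b"], b"d2")], "b")` returns `True`. Correctness says nothing about security; the trace and leakage notions introduced next capture what the server learns.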
Searchable symmetric encryption schemes can be classified according to their query expressiveness, i.e., the class of predicates supported by the S.Trapdoor algorithm. If S.Trapdoor supports only single keywords as predicates, we say that $S$ is a single-keyword SSE scheme. If S.Trapdoor supports arbitrary conjunctions of keywords, we say that $S$ is a conjunctive SSE scheme. If S.Trapdoor supports arbitrary Boolean formulas, we say that $S$ is a Boolean SSE scheme. The searchable encryption literature also considers several other types of predicates, such as range [65, 93], substring [93] and subset [65] queries; we comment further on these in Chapter 3.
The first provably secure symmetric-key searchable encryption scheme was proposed by Song, Wagner and Perrig in 2000 [25]. Following a series of advances [112, 113], Curtmola, Garay, Kamara and Ostrovsky introduced in their foundational 2006 work [37] several rigorous and strong security definitions for searchable encryption, along with a scheme satisfying them. Their definitions have become standard in the searchable encryption literature, and their scheme is the basis for many other efficient SSE schemes.
Before describing the stronger of the two security definitions presented in [37], we define some concepts that were informally introduced in the previous subsection.
Definition 2.19 (History). Let $D = ((W_i, D_i))_{i=1}^{N}$ be a document collection. A $q$-query history over $D$ is a tuple $H = (D, w)$ that includes the document collection $D$ and a list of $q$ keywords $w = (w_1, \ldots, w_q)$.
The concept of a $q$-query history formalizes the information outsourced to the cloud service provider, in the form of documents and single-keyword queries. As explained above, the access pattern captures which documents are accessed when processing each query, and the search pattern captures whether any two single-keyword trapdoors encode the same keyword.
Definition 2.20 (Access Pattern). The access pattern induced by a $q$-query history $H = (D, w)$ is the tuple $\alpha(H) = (D(w_1), \ldots, D(w_q))$, where $D(w_i)$ denotes the set of all documents in $D$ matched by the keyword $w_i$.
Definition 2.21 (Search Pattern). The search pattern induced by a $q$-query history $H = (D, w)$ is the $q \times q$ symmetric binary matrix $\sigma(H)$ whose $(i, j)$-th component $\sigma(H)_{i,j}$ is 1 if $w_i = w_j$ and 0 otherwise.
The notion used in the security definitions of [37] is that of the trace of a history. The trace consists of the information about the history that searchable encryption schemes typically leak.
Definition 2.22 (Trace). The trace induced by a $q$-query history $H = (D, w)$ is the sequence $\tau(H) = (|D_1|, \ldots, |D_N|, \alpha(H), \sigma(H))$, comprising the lengths of the documents in $D$ and the access and search patterns induced by $H$. If $w$ is empty, we write $\tau(D)$ for $\tau(D, w)$.
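A minimal Python sketch of Definitions 2.20 to 2.22 follows. It assumes that documents are byte strings (so $|D_i|$ is a byte length) and that a document is identified by its 0-based position in the collection; the function names are hypothetical.

```python
from typing import Sequence

Doc = tuple[tuple[str, ...], bytes]  # (W_i, D_i)

def access_pattern(D: Sequence[Doc], w: Sequence[str]) -> list[frozenset[int]]:
    """alpha(H): for each query w_i, the positions of the documents matched by w_i."""
    return [frozenset(i for i, (W, _) in enumerate(D) if wi in W) for wi in w]

def search_pattern(w: Sequence[str]) -> list[list[int]]:
    """sigma(H): q x q symmetric 0/1 matrix, entry (i, j) is 1 iff w_i == w_j."""
    return [[1 if wi == wj else 0 for wj in w] for wi in w]

def trace(D: Sequence[Doc], w: Sequence[str] = ()):
    """tau(H) = (|D_1|, ..., |D_N|, alpha(H), sigma(H)); with no queries this is tau(D)."""
    lengths = tuple(len(doc) for (_, doc) in D)
    return lengths, access_pattern(D, w), search_pattern(w)

D = [(("flu", "fever"), b"abc"), (("cold",), b"hello"), (("flu",), b"x")]
lengths, alpha, sigma = trace(D, ["flu", "cold", "flu"])
# lengths == (3, 5, 1)
# alpha   == [frozenset({0, 2}), frozenset({1}), frozenset({0, 2})]
# sigma   == [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
```

In this example the first and third queries are linked through $\sigma$ even if their trapdoors look unrelated, and $\alpha$ reveals that both touch the same two documents.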
Next, we describe the adaptive semantic security definition presented in [37]. In this definition, Curtmola et al. consider an adaptive stateful adversary, that is, an adversary $A$ that chooses how to interact with the server by taking into account all the information obtained so far, which it keeps in an internal state variable $st_A$.
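Definition 2.23 (Adaptive Semantic Security for SSE [37]). Let $S = \{S.\mathrm{Gen}, S.\mathrm{Enc}, S.\mathrm{Trapdoor}, S.\mathrm{Search}, S.\mathrm{Dec}\}$ be a searchable symmetric encryption scheme. Let $\lambda$ be the security parameter, and let $A = (A_0, \ldots, A_q)$ and $B = (B_0, \ldots, B_q)$ be tuples of PPT algorithms defining the adversary and the simulator, respectively, where $q$ is polynomial in $\lambda$. Consider the following probabilistic experiments $\mathrm{Real}_{S,A}(\lambda)$ and $\mathrm{Sim}_{S,A,B}(\lambda)$:
$$
\begin{array}{l}
\mathrm{Real}_{S,A}(\lambda):\\
\quad k \leftarrow S.\mathrm{Gen}(\lambda)\\
\quad (D, st_A) \leftarrow A_0(\lambda)\\
\quad (I, C) \leftarrow S.\mathrm{Enc}_k(D)\\
\quad (w_1, st_A) \leftarrow A_1(st_A, I, C)\\
\quad T_1 \leftarrow S.\mathrm{Trapdoor}_k(w_1)\\
\quad \text{for } 2 \le i \le q:\\
\qquad (w_i, st_A) \leftarrow A_i(st_A, I, C, T_1, \ldots, T_{i-1})\\
\qquad T_i \leftarrow S.\mathrm{Trapdoor}_k(w_i)\\
\quad \text{let } T = (T_1, \ldots, T_q)\\
\quad \text{output } v = (I, C, T) \text{ and } st_A
\end{array}
$$
$$
\begin{array}{l}
\mathrm{Sim}_{S,A,B}(\lambda):\\
\quad (D, st_A) \leftarrow A_0(\lambda)\\
\quad (I, C, st_B) \leftarrow B_0(\tau(D))\\
\quad (w_1, st_A) \leftarrow A_1(st_A, I, C)\\
\quad (T_1, st_B) \leftarrow B_1(st_B, \tau(D, (w_1)))\\
\quad \text{for } 2 \le i \le q:\\
\qquad (w_i, st_A) \leftarrow A_i(st_A, I, C, T_1, \ldots, T_{i-1})\\
\qquad (T_i, st_B) \leftarrow B_i(st_B, \tau(D, (w_1, \ldots, w_i)))\\
\quad \text{let } T = (T_1, \ldots, T_q)\\
\quad \text{output } v = (I, C, T) \text{ and } st_A
\end{array}
$$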
We say that $S$ is adaptively semantically secure if for all PPT adversaries $A$ there exists a PPT simulator $B$ such that for every PPT distinguisher $\mathcal{D}$,
$$\left|\Pr\left[\mathcal{D}(v, st_A) = 1 : (v, st_A) \leftarrow \mathrm{Real}_{S,A}(\lambda)\right] - \Pr\left[\mathcal{D}(v, st_A) = 1 : (v, st_A) \leftarrow \mathrm{Sim}_{S,A,B}(\lambda)\right]\right|$$
is negligible in $\lambda$, where the probabilities are taken over the random coins of S.Gen, S.Enc, $A$ and $B$.
The adaptive semantic security definition for SSE states that the only information potentially leaked by the scheme is the trace induced by the outsourced data set and the queried keywords. However, the leakage of searchable encryption schemes varies with the scheme and with the adversary model considered, and more complex schemes may need a different description of their leakage. To capture this, Chase and Kamara [114] introduced the notion of a leakage function. Leakage functions generalize the notion of trace: given a history as input, a leakage function outputs an upper bound on the information that the scheme leaks about that history.
We next introduce the $\mathcal{L}$-semantic security against adaptive attacks definition presented by Cash et al. [93], which is the one used in this thesis. This security definition is parameterized by a predefined leakage function $\mathcal{L}$, and it was designed for the OXT Boolean SSE scheme defined in [93]. It is conceptually equivalent to adaptive semantic security when $\mathcal{L} = \tau$. We modify the original formulation to fit the notation of Definitions 2.18 and 2.23.
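Definition 2.24 ($\mathcal{L}$-Semantic Security against Adaptive Attacks [93]). Let $S = \{S.\mathrm{Gen}, S.\mathrm{Enc}, S.\mathrm{Trapdoor}, S.\mathrm{Search}, S.\mathrm{Dec}\}$ be a searchable symmetric encryption scheme. Let $\lambda$ be the security parameter, and let $A = (A_0, \ldots, A_{q+1})$ and $B = (B_0, \ldots, B_q)$ be tuples of PPT algorithms defining the adversary and the simulator, respectively, where $q$ is polynomial in $\lambda$. Consider the following probabilistic experiments $\mathrm{Real}_{S,A}(\lambda)$ and $\mathrm{Sim}_{S,A,B}(\lambda)$:
$$
\begin{array}{l}
\mathrm{Real}_{S,A}(\lambda):\\
\quad k \leftarrow S.\mathrm{Gen}(\lambda)\\
\quad (D, st_A) \leftarrow A_0(\lambda)\\
\quad (I, C) \leftarrow S.\mathrm{Enc}_k(D)\\
\quad (w_1, st_A) \leftarrow A_1(st_A, I, C)\\
\quad T_1 \leftarrow S.\mathrm{Trapdoor}_k(w_1)\\
\quad \text{for } 2 \le i \le q:\\
\qquad (w_i, st_A) \leftarrow A_i(st_A, I, C, T_1, \ldots, T_{i-1})\\
\qquad T_i \leftarrow S.\mathrm{Trapdoor}_k(w_i)\\
\quad \text{let } T = (T_1, \ldots, T_q)\\
\quad b \leftarrow A_{q+1}(st_A, I, C, T)\\
\quad \text{output } b
\end{array}
$$
$$
\begin{array}{l}
\mathrm{Sim}_{S,A,B}(\lambda):\\
\quad (D, st_A) \leftarrow A_0(\lambda)\\
\quad (I, C, st_B) \leftarrow B_0(\mathcal{L}(D))\\
\quad (w_1, st_A) \leftarrow A_1(st_A, I, C)\\
\quad (T_1, st_B) \leftarrow B_1(st_B, \mathcal{L}(D, (w_1)))\\
\quad \text{for } 2 \le i \le q:\\
\qquad (w_i, st_A) \leftarrow A_i(st_A, I, C, T_1, \ldots, T_{i-1})\\
\qquad (T_i, st_B) \leftarrow B_i(st_B, \mathcal{L}(D, (w_1, \ldots, w_i)))\\
\quad \text{let } T = (T_1, \ldots, T_q)\\
\quad b \leftarrow A_{q+1}(st_A, I, C, T)\\
\quad \text{output } b
\end{array}
$$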
We say that $S$ is $\mathcal{L}$-semantically secure against adaptive attacks if for all PPT adversaries $A$ there exists a PPT simulator $B$ such that
$$\left|\Pr\left[\mathrm{Real}_{S,A}(\lambda) = 1\right] - \Pr\left[\mathrm{Sim}_{S,A,B}(\lambda) = 1\right]\right|$$
is negligible in $\lambda$, where the probabilities are taken over the random coins of S.Gen, S.Enc, $A$ and $B$.
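The control flow of the two experiments in Definition 2.24 can be sketched in Python as follows. The adversary and the simulator are passed as tuples of callables and the leakage function `L` is a parameter; all names, as well as the deliberately insecure `KeywordAsTrapdoor` scheme used in the example, are hypothetical. The example adversary picks a random keyword at run time that never appears in the leakage, so it tells the experiments apart whenever a trapdoor reveals its keyword.

```python
import secrets
from typing import Any, Callable, Sequence

def real_experiment(S: Any, A: Sequence[Callable], lam: int) -> int:
    """Real_{S,A}(lam) with A = (A_0, ..., A_{q+1}), hence q = len(A) - 2."""
    q = len(A) - 2
    k = S.gen(lam)
    D, st = A[0](lam)
    I, C = S.enc(k, D)
    T: list = []
    for i in range(1, q + 1):
        w_i, st = A[i](st, I, C, tuple(T))  # adaptive: w_i may depend on T_1..T_{i-1}
        T.append(S.trapdoor(k, w_i))
    return A[q + 1](st, I, C, tuple(T))

def sim_experiment(B: Sequence[Callable], A: Sequence[Callable],
                   L: Callable, lam: int) -> int:
    """Sim_{S,A,B}(lam): B never sees D or the keywords, only L(D, (w_1..w_i))."""
    q = len(A) - 2
    D, st = A[0](lam)
    I, C, st_B = B[0](L(D, ()))
    T: list = []
    ws: list = []
    for i in range(1, q + 1):
        w_i, st = A[i](st, I, C, tuple(T))
        ws.append(w_i)
        T_i, st_B = B[i](st_B, L(D, tuple(ws)))
        T.append(T_i)
    return A[q + 1](st, I, C, tuple(T))

class KeywordAsTrapdoor:
    """INSECURE toy scheme: the trapdoor is the keyword itself."""
    def gen(self, lam): return None
    def enc(self, k, D): return {}, [doc for (_, doc) in D]
    def trapdoor(self, k, w): return w

def leakage(D, ws):
    """Trace-style leakage: document lengths and the search pattern."""
    return tuple(len(doc) for (_, doc) in D), [[int(a == b) for b in ws] for a in ws]

def A0(lam):
    w = secrets.token_hex(16)            # random keyword, kept in the adversary's state
    return [((w,), b"doc")], w
def A1(st, I, C, T):
    return st, st                        # query that random keyword
def A2(st, I, C, T):
    return int(T[0] == st)               # output 1 iff T_1 reveals the keyword

def B0(leak):
    lengths, _ = leak
    return {}, [bytes(n) for n in lengths], None   # dummy ciphertexts of the right length
def B1(st_B, leak):
    return "opaque-token", st_B

A = (A0, A1, A2)
b_real = real_experiment(KeywordAsTrapdoor(), A, 128)
b_sim = sim_experiment((B0, B1), A, leakage, 128)
```

Here `b_real` is 1 and `b_sim` is 0. Since this leakage contains only lengths and the search pattern, any simulator would have to guess the random 128-bit keyword to make `A2` output 1, so the toy scheme is not secure with respect to this leakage function.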
To conclude this subsection, we describe the second, and more secure, of the two searchable encryption schemes introduced by Curtmola, Garay, Kamara and Ostrovsky in [37]. This single-keyword static scheme, although less efficient than the first scheme in [37], is the first scheme to satisfy the strong notion of adaptive semantic security.
The main insight behind the schemes presented in [37] is the use of an inverted index, which is the conceptual basis for nearly all efficient searchable symmetric encryption solutions. The use of inverted indexes in [37] initiated a recurrent trend in searchable encryption, where cryptographic primitives are used on top of structured data to enable efficient searching over encrypted data.
A forward index can be understood as a table in which every row contains a different document $D$ together with all the keywords $w_1, \ldots, w_n$ attached to it. An inverted index can also be understood as a table, in which every row contains a different keyword $w$ together with all the documents $D(w) = \{D_1, \ldots, D_m\}$ to which it is attached. Searching a forward index for all documents attached to a particular keyword $w$ requires a sequential scan and takes time linear in the number of documents, $O(|D|)$. In contrast, the same search takes $O(|D(w)|)$ time with an inverted index stored in an FKS dictionary [115], which supports constant-time lookups.
Before stating the definition, we follow [37] and introduce some notation. Given an ordered set $A = (a_1, \ldots, a_n)$, we denote by $A[i]$ the $i$-th element $a_i$ of $A$. Given an element $b \in A$, and assuming that $A$ has no repeated elements, we denote by $\mathrm{addr}_A(b)$ the index $i$ such that $A[i] = b$. Therefore, $A[\mathrm{addr}_A(b)] = b$.
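The contrast between forward and inverted indexes, together with the $\mathrm{addr}$ notation, can be sketched in Python as follows. A Python `dict` gives expected constant-time lookups through hashing rather than the worst-case guarantee of an FKS dictionary; the function names are hypothetical.

```python
from collections import defaultdict

def inverted_index(forward: dict[str, list[str]]) -> dict[str, list[str]]:
    """Turn a forward index (doc id -> keywords) into an inverted index
    (keyword -> doc ids), keeping documents in forward-index order."""
    inv: dict[str, list[str]] = defaultdict(list)
    for doc_id, keywords in forward.items():
        for w in dict.fromkeys(keywords):   # distinct keywords, order preserved
            inv[w].append(doc_id)
    return dict(inv)

def search_forward(forward: dict[str, list[str]], w: str) -> list[str]:
    """Sequential scan over every row of the forward index: O(|D|) rows."""
    return [doc_id for doc_id, kws in forward.items() if w in kws]

def search_inverted(inv: dict[str, list[str]], w: str) -> list[str]:
    """One dictionary lookup, then O(|D(w)|) work to read the row."""
    return inv.get(w, [])

def addr(A: list, b) -> int:
    """addr_A(b): the 1-based position i with A[i] = b (A has no repeats)."""
    return A.index(b) + 1

forward = {"d1": ["flu", "fever"], "d2": ["cold"], "d3": ["flu"]}
inv = inverted_index(forward)   # {'flu': ['d1', 'd3'], 'fever': ['d1'], 'cold': ['d2']}
```

With this index, `addr(inv["flu"], "d3")` is 2, which is the kind of position that the scheme in Definition 2.25 feeds, together with the keyword, into a pseudorandom permutation.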
Definition 2.25. Let $n$ be the total number of documents to encrypt and let $MAX$ be the bit size of the largest document. Let $W$ denote the keyword space, and assume that the bit size of the keywords in the document collection to encrypt ranges between $m$ and $\ell$, and that the collection includes a keyword of bit size exactly $m$. Set $max = \min\left(\max\{i : \sum_{j=1}^{i} 2^{m} j j < MAX\},\ |W|\right)$ and $s = max \cdot n$.
Define an SSE scheme $S$ by the following five polynomial-time algorithms:
$S.\mathrm{Gen}(\lambda)$:
Choose parameters $k$ and $\ell$ that are non-constant polynomials in $\lambda$. Let $W$ be the keyword space. Let $\mathcal{T} = (\mathcal{T}.\mathrm{Gen}, \mathcal{T}.\mathrm{Enc}, \mathcal{T}.\mathrm{Dec})$ be an IND-PCPA-secure symmetric encryption scheme. Instantiate a PRP $\pi$ with key space $\{0,1\}^k$,
$$\pi : \{0,1\}^k \times \{0,1\}^{\ell + \log_2(n + max)} \to \{0,1\}^{\ell + \log_2(n + max)}.$$
Sample $K_1$ uniformly at random from $\{0,1\}^k$ and generate $K_2 = \mathcal{T}.\mathrm{Gen}(\lambda)$. Output the public parameters params $= \{W, k, \ell, m, max, n, s, \mathcal{T}, \pi\}$ and the secret key $K = (K_1, K_2)$.
$S.\mathrm{Enc}_K(D)$:
Let $D = ((W_i, D_i))_{i=1}^{n}$, where $W_i$ is a tuple of keywords attached to the document $D_i$. For every document $D_i$ in $D$, denote by $\mathrm{id}(D_i)$ a unique identifier of $D_i$. Define $\delta(D) = \bigcup_i W_i$ as the ordered set of distinct keywords in $D$. For every $w \in \delta(D)$, let $D(w)$ denote the ordered set of documents attached to keyword $w$, in the order induced by $D$.
Compute $C = (C_1, \ldots, C_n)$, where $C_i = \mathcal{T}.\mathrm{Enc}_{K_2}(D_i)$.
To prepare the index $I$, for every $w \in \delta(D)$ and every $D_i \in D(w)$, set $I[\pi_{K_1}(w \,\|\, \mathrm{addr}_{D(w)}(D_i))] = \mathrm{id}(D_i)$.
Now let $s' = \sum_{w \in \delta(D)} |D(w)|$, and let $c_i$ denote the number of entries of $I$ that contain $\mathrm{id}(D_i)$. For security purposes, if $s' < s$, modify $I$ as follows: for every document $D_i$ in $D$ and every $1 \le l \le max - c_i$, set $I[\pi_{K_1}(0^{\ell} \,\|\, n + l)] = \mathrm{id}(D_i)$.
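Output $I, C$.
$S.\mathrm{Trapdoor}_K(w)$:
Output $T = (\pi_{K_1}(w \,\|\, 1), \ldots, \pi_{K_1}(w \,\|\, n))$.
$S.\mathrm{Search}(I, C, T)$:
Let $X$ be an empty set. For every $t \in T$ such that $I[t]$ exists, add $C[I[t]]$ to $X$. Output $X$.
$S.\mathrm{Dec}_K(C)$:
Output $D = \mathcal{T}.\mathrm{Dec}_{K_2}(C)$.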
This searchable encryption scheme is proven in [37] to be adaptively semantically secure, under the assumptions that $\pi$ is a pseudorandom permutation and that $\mathcal{T}$ is IND-PCPA-secure.
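The following Python sketch mirrors the data flow of Definition 2.25 on a toy scale. It makes several assumptions that depart from the definition: HMAC-SHA256 (a PRF) stands in for the PRP $\pi$, a toy nonce-based HMAC keystream stands in for the IND-PCPA-secure scheme $\mathcal{T}$, document identifiers are 0-based positions, the parameter `max_kw` plays the role of $max$, and padding entries use a running counter in a separate label domain so that no two padding entries share an address. It is an illustration, not a vetted implementation; a real deployment would rely on an audited cryptographic library.

```python
import hashlib
import hmac
import secrets

def prf(key: bytes, data: bytes) -> bytes:
    """HMAC-SHA256 as a PRF; stands in for the PRP pi (assumption)."""
    return hmac.new(key, data, hashlib.sha256).digest()

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    blocks = (prf(key, nonce + i.to_bytes(8, "big")) for i in range((length + 31) // 32))
    return b"".join(blocks)[:length]

def sym_enc(key: bytes, pt: bytes) -> bytes:
    """Toy randomized encryption nonce || (pt XOR keystream); stands in for T.Enc."""
    nonce = secrets.token_bytes(16)
    return nonce + bytes(a ^ b for a, b in zip(pt, _keystream(key, nonce, len(pt))))

def sym_dec(key: bytes, ct: bytes) -> bytes:
    nonce, body = ct[:16], ct[16:]
    return bytes(a ^ b for a, b in zip(body, _keystream(key, nonce, len(body))))

def label(w: str, j: int) -> bytes:
    """Unambiguous encoding of w || j for real index entries."""
    wb = w.encode()
    return b"K" + len(wb).to_bytes(4, "big") + wb + j.to_bytes(8, "big")

def pad_label(counter: int) -> bytes:
    """Separate domain standing in for 0^l || (n + l); never equals a real label."""
    return b"P" + counter.to_bytes(8, "big")

def gen(lam: int = 256) -> tuple[bytes, bytes]:
    return secrets.token_bytes(lam // 8), secrets.token_bytes(lam // 8)  # (K1, K2)

def enc(K: tuple[bytes, bytes], D: list, max_kw: int):
    """D is a list of (keywords, document bytes); id(D_i) is the position i."""
    K1, K2 = K
    C = [sym_enc(K2, doc) for (_, doc) in D]
    D_w: dict[str, list[int]] = {}                # D(w), in collection order
    for i, (W, _) in enumerate(D):
        for w in dict.fromkeys(W):                # distinct keywords of D_i
            D_w.setdefault(w, []).append(i)
    I: dict[bytes, int] = {}
    c = [0] * len(D)                              # c_i = entries holding id(D_i)
    for w, docs in D_w.items():
        for a, i in enumerate(docs, start=1):     # a = addr_{D(w)}(D_i)
            I[prf(K1, label(w, a))] = i
            c[i] += 1
    counter = len(D)
    for i, c_i in enumerate(c):
        if c_i > max_kw:
            raise ValueError(f"document {i} has more than max_kw keywords")
        for _ in range(max_kw - c_i):             # pad D_i up to max_kw entries
            counter += 1
            I[prf(K1, pad_label(counter))] = i
    # Store entries sorted by label so storage order does not group them by keyword.
    return dict(sorted(I.items())), C

def trapdoor(K: tuple[bytes, bytes], w: str, n: int) -> list[bytes]:
    return [prf(K[0], label(w, j)) for j in range(1, n + 1)]

def search(I: dict, C: list, T: list) -> list:
    return [C[I[t]] for t in T if t in I]

def dec(K: tuple[bytes, bytes], ct: bytes) -> bytes:
    return sym_dec(K[1], ct)
```

For instance, with `K = gen()`, `D = [(("flu", "fever"), b"patient 1"), (("cold",), b"patient 2"), (("flu",), b"patient 3")]` and `I, C = enc(K, D, max_kw=2)`, the index has $n \cdot max = 6$ entries, and decrypting `search(I, C, trapdoor(K, "flu", len(D)))` yields the first and third documents. Every document identifier appears in exactly `max_kw` entries, so the index size alone does not reveal how many keywords each document carries.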