Depending on the sensitivity of the actual data to be outsourced, the data owner might be willing to accept the induced information leakage or even increase the leakage in order to trade security for better performance or additional functionality for encrypted data. Further, the practical implications heavily depend on the type and entropy of data to be outsourced.
The constructions for exact pattern matching presented in Chapter 5, for secure joins presented in Chapter 6, and for secure range queries presented in Chapter 7 achieve novel trade-offs between security and performance for fundamental functionalities that have already been supported by property-preserving encryption. Since we claim increased security for these constructions compared to property-preserving encryption, we prove it in the simulation-based proof framework. In contrast, the construction for privacy-preserving substring search presented in Chapter 8 is based on the functionality of secure range queries. Assuming that natural-language text (e.g. English) is to be outsourced, the entropy of such data is well known; therefore, we examine the practical security consequences of using property-preserving encryption in this construction for additional search performance. That is, we apply our construction and try to extract as much information as possible using the best known attacks on property-preserving encryption. We refer to Chapter 8 for a thorough discussion and a comprehensive description of the actual attack.
4.2 Performance Assessment Methodology
The second dimension along which we evaluate our protocols is performance; hence, we give a brief description of the methodology for performance assessment in this section. While we focus on the running time of a protocol, this methodology could be extended to other key performance indicators such as memory consumption or required network bandwidth. We assume the running time of the protocol to be the most important parameter for acceptance of the proposed solution by the end user. Therefore, we measure the time period from the creation of the search token until the corresponding query result is received as the overall running time of one protocol run; this corresponds to the time from the end user submitting the query until the result is retrieved. Further, we investigate this overall running time in more detail and give micro benchmarks for critical operations, e.g. query token generation, in order to identify bottlenecks and to distinguish between processing time on the client's device and in the server environment. Specific details and particular considerations depend on each construction individually and are discussed in the corresponding chapter.
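The measurement described above can be sketched as follows. This is a minimal illustration and not the benchmarking harness used in this thesis: the functions `generate_token`, `server_search` and `decrypt_result` are hypothetical placeholders for the client and server steps of a protocol, and `time.perf_counter` is assumed as the clock.

```python
import time


def generate_token(keyword):
    # Hypothetical placeholder for client-side search token generation.
    return keyword[::-1]


def server_search(token, database):
    # Hypothetical placeholder for server-side query processing.
    return [row for row in database if row[::-1] == token]


def decrypt_result(rows):
    # Hypothetical placeholder for client-side post-processing of the result.
    return list(rows)


def timed(func, *args):
    """Run func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def run_protocol(keyword, database):
    """Measure one protocol run: overall running time plus micro benchmarks."""
    start = time.perf_counter()
    token, t_token = timed(generate_token, keyword)
    rows, t_server = timed(server_search, token, database)
    result, t_client = timed(decrypt_result, rows)
    overall = time.perf_counter() - start
    timings = {
        "token_generation": t_token,      # client side
        "server_processing": t_server,    # server side
        "client_postprocessing": t_client,  # client side
        "overall": overall,               # token creation until result
    }
    return result, timings
```

In this sketch all steps run in one process, so network transfer is not part of the overall time; in a deployed setting the overall measurement would also include it, while the micro benchmarks would still separate client and server processing.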
4.2.1 Theoretical Runtime Analyses
In order to get an “awareness” of the magnitude of runtime performance, we analyze the costs of an algorithm using big O notation. Big O notation gives an upper bound on the asymptotic performance of an algorithm, i.e. on its behavior once the input size exceeds a specific threshold, and allows us to classify algorithms with respect to their runtime, as already done in Section 2.1 for PPT algorithms. Formally, big O notation is stated as follows [47]:
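Definition 12 (Big O Notation). Let $f : \mathbb{N} \to \mathbb{R}$ and $g : \mathbb{N} \to \mathbb{R}$. We say $f(x)$ is in $O(g(x))$ if there exists a constant $c > 0$ and a threshold $n_0 \in \mathbb{N}$ such that $f(n) \le c \cdot g(n)$ for all $n \in \mathbb{N}$ with $n > n_0$. We write $f(x) = O(g(x))$ for a function $f(x)$ in $O(g(x))$.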
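As a small worked instance of this definition, consider the illustrative function $f(n) = 3n + 5$ (chosen here only as an example) and $g(n) = n$. One possible choice of witnesses is $c = 4$ and $n_0 = 5$:

```latex
\begin{align*}
f(n) &= 3n + 5, \qquad g(n) = n, \\
f(n) &= 3n + 5 \le 3n + n = 4n = 4 \cdot g(n) \quad \text{for all } n \in \mathbb{N} \text{ with } n > 5.
\end{align*}
```

Hence $f(n)$ is in $O(n)$. The witnesses are not unique; any larger $c$ or $n_0$ works as well.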
Big O notation is transitive, which follows directly from the definition: given three functions $f(x)$, $g(x)$ and $h(x)$ where $f(x)$ is in $O(g(x))$ and $g(x)$ is in $O(h(x))$, it holds that $f(x)$ is in $O(h(x))$. The most important classes for algorithms are listed in the following in ascending order of growth: constant runtime denoted as $O(1)$, logarithmic runtime denoted as $O(\log n)$, linear runtime denoted as $O(n)$, polynomial runtime denoted as $O(n^c)$ for a constant¹ $c \in \mathbb{N}$, and exponential runtime denoted as $O(c^n)$ for a constant $c \in \mathbb{N}$ with $c > 1$. Any search index for fast query processing should decrease the search complexity such that it is smaller than linear in the database table size. Note that these classes express the worst-case runtime of an algorithm. While no input results in a runtime larger than this bound, there might be many inputs that result in a strictly lower runtime.
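The difference between linear and sub-linear search, and between worst-case and typical runtime, can be made concrete with a small plaintext sketch. It is not one of the encrypted search indexes of this thesis; it only counts the steps of a linear scan and of a binary search over a sorted list, and the function names are chosen for illustration.

```python
def linear_search_steps(sorted_values, target):
    """Return the number of comparisons made by a linear scan."""
    steps = 0
    for value in sorted_values:
        steps += 1
        if value == target:
            break
    return steps


def binary_search_steps(sorted_values, target):
    """Return the number of loop iterations made by a binary search."""
    low, high = 0, len(sorted_values) - 1
    steps = 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if sorted_values[mid] == target:
            break
        if sorted_values[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return steps
```

For 1024 sorted values, the linear scan needs 1024 comparisons in the worst case (target at the last position) but only one in the best case (target at the first position), whereas binary search never needs more than 11 iterations.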
In this thesis we design systems that are queried repeatedly; therefore, it is natural to ask for the runtime of a sequence of queries. One could simply multiply the worst-case runtime expressed in big O notation by the number of queries. However, for a protocol in which only a small number of possible inputs actually result in this worst-case runtime, such an assessment would yield a too pessimistic (i.e. too high) runtime estimate; thus, we follow the approach of amortized analysis as introduced by Tarjan [141]. While amortized analysis does not state explicit performance numbers for one specific query, it gives a better understanding of the average performance properties. We refer to the textbook by Cormen et al. for additional details [47].
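The gap between the two estimates can be illustrated with a standard textbook example that is unrelated to the constructions of this thesis: appending to an array that doubles its capacity whenever it is full. The class and function names below are chosen for illustration, and cost is counted as the number of element writes.

```python
class DoublingArray:
    """Array that doubles its capacity when full; append reports its cost."""

    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.slots = [None]

    def append(self, value):
        """Append value and return the number of element writes performed."""
        cost = 1  # writing the new element itself
        if self.size == self.capacity:
            # Rare expensive case: copy all existing elements to a larger array.
            self.capacity *= 2
            new_slots = [None] * self.capacity
            for i in range(self.size):
                new_slots[i] = self.slots[i]
            self.slots = new_slots
            cost += self.size
        self.slots[self.size] = value
        self.size += 1
        return cost


def append_costs(n):
    """Return the list of per-operation costs for n consecutive appends."""
    array = DoublingArray()
    return [array.append(i) for i in range(n)]
```

For `n = 1024`, the most expensive single append costs 513 writes, so multiplying the worst case by the number of operations gives 1024 · 513 writes. The actual total is only 2047 writes, i.e. fewer than two writes per append on average, which is the kind of bound amortized analysis captures.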
4.2.2 Practical Runtime Evaluation
Big O notation and amortized analysis are vital tools for estimating a program's runtime as a function of the input size. Nevertheless, practical evaluation on real input data is still essential for testing the practical feasibility of an approach and allows a comprehensive comparison of different approaches for specific use cases. In particular, the impact of large constants is omitted in big O notation and is not considered in amortized runtime analysis either, as discussed in the previous section. As a consequence, we have implemented each construction in this thesis and evaluated it in a real-world environment; that is, we model a real database, encrypt it using the proposed construction, and issue realistic privacy-preserving query sequences. This practical evaluation is performed on a testbed corresponding to a “state-of-the-art” environment, which is specified in full detail in each chapter.
¹ With $c = 1$ we get linear runtime.
5 Exact Keyword Matching
In this chapter we present protocols for exact keyword queries on encrypted data. Such functionality enables the database client to delegate exact keyword filtering to the untrusted database server. The DBMS on this server is then able to filter for all encrypted values that match the keyword specified by the client. In Section 5.1 we give an introduction, discuss the problem in more detail, and give a general interface that offers the functionality for exact pattern matching on encrypted data with the option to add and delete data entries securely. An overview of related work addressing this type of query is presented in Section 5.2. In Section 5.3 we describe the abstract idea of our solution from a high-level perspective and present a concrete construction. Based on this concrete construction, a detailed evaluation regarding both formal security and performance is given in Section 5.4. The performance evaluation is done in two ways: theoretical runtime bounds are given together with practical benchmark results. A variation of this construction that trades communication overhead for additional client storage is given in Section 5.5. Finally, Section 5.6 provides a summary of this chapter. The content of this chapter has been published in joint work with Florian Kerschbaum at CCS 2014:
• Hahn, Florian; Kerschbaum, Florian: Searchable Encryption with Secure and Efficient Updates. In: Proceedings of the Conference on Computer and Communications Security, 2014 (CCS)