To accommodate malicious users, we introduced a second server that is semi-honest. Users are excluded from all protocol operations (except for setting them in motion by requesting a recommendation) and are only required to provide input during the registration phase. In that phase, users provide their DNA material to the main server: the secret shares of the indicator collection, (Xu,i) and [(Yu,i)]U, the encrypted indicator collection [Iu,i]U, and the encrypted genome sequence [Gu]U. Users also provide the main server with a re-encryption key for the proxy server and with their rating vector and the accompanying indicator vector, both of which are split into secret shares. Even though all user data held by the main server is stored either encrypted or as secret shares of which one share is encrypted, the server holds a large amount of user data, so considerable trust is placed in it. When both servers follow the protocol, neither server learns any of the user's confidential information, since neither server has access to the unencrypted information or to both secret shares that form a piece of data. However, if the servers were to collude, all user data would be revealed: the genome sequence, the rating vector and the accompanying indicator vector.
This is no different from the security setting in which users were required to be semi-honest: there, if a user and the server were to collude, they could also retrieve the DNA material and rating vectors of the user's friends.
However, this means that the security of the malicious user model also relies on a non-collusion assumption, which limits the security it offers. The setting in which two servers are required to be semi-honest and non-colluding is stronger than the setting in which users are also semi-honest, but it would be better still to have a recommender system in which every entity could be malicious and still could not recover any information about any of the other entities. To achieve this security requirement, we would need protocols in which all operations on the users' data can be carried out without sharing the data between two servers. This is a current limitation of the recommender system, and it remains future work to study whether this security requirement can be achieved efficiently.
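The non-collusion argument above can be illustrated with plain additive secret sharing. The following is a minimal sketch only: the modulus `kModulus` and the helper names `split` and `reconstruct` are hypothetical, the random number generator is not cryptographically secure, and the encryption of one of the shares that the actual scheme applies is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <utility>

// Hypothetical share space Z_{2^32}; the message space of the actual system is not assumed here.
constexpr std::uint64_t kModulus = 1ULL << 32;

// Split `secret` into two additive shares with
// first + second = secret (mod kModulus).
// Each share on its own is uniformly distributed and reveals nothing about the secret.
std::pair<std::uint64_t, std::uint64_t> split(std::uint64_t secret, std::mt19937_64& rng) {
    assert(secret < kModulus);
    std::uniform_int_distribution<std::uint64_t> dist(0, kModulus - 1);
    const std::uint64_t first = dist(rng);                              // random mask
    const std::uint64_t second = (secret + kModulus - first) % kModulus; // masked secret
    return {first, second};
}

// Only a party (or a pair of colluding parties) holding both shares can recover the secret.
std::uint64_t reconstruct(std::uint64_t first, std::uint64_t second) {
    return (first + second) % kModulus;
}
```

In this sketch, a server holding only `first` or only `second` sees a uniformly random value, whereas two colluding servers can call `reconstruct` on their shares and obtain the original rating or indicator value.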
7 Experimental Results
To analyze the performance of our protocols, we implemented prototypes in C++. We used several libraries as building blocks for our system, especially for the somewhat homomorphic encryption and for the use of elliptic curves in the proxy re-encryption scheme.
The implementation of the system consists of prototypes for the edit distance protocol, the substitution cost protocol, the minimum-finding protocol, the Smith-Waterman distance protocol and the offline recommender protocol. All of these protocols have been implemented for both the semi-honest user model and the malicious user model, except for the minimum-finding protocol, of which only the malicious version was implemented. The minimum-finding protocol in the semi-honest model was not implemented because its technical requirements made an actual implementation impractical. When a group of integers is used to represent the messages, integers can overflow and wrap around the group. However, the protocol on which our semi-honest version of minimum-finding is based uses comparisons that give incorrect results when wraparound occurs, and wraparound would occur often. A very large message space would be needed to solve this problem, which would make the implementation impractical.
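The wraparound problem described above can be shown with a small plaintext sketch. It is an illustration under stated assumptions, not the comparison protocol itself: the helper names `to_group` and `less_than_in_group` are hypothetical, and the comparison is modelled simply as comparing the representatives in Z_n.

```cpp
#include <cassert>
#include <cstdint>

// Map a non-negative integer to its representative in Z_n = {0, ..., n-1}.
std::uint64_t to_group(std::uint64_t value, std::uint64_t n) {
    assert(n > 0);
    return value % n;
}

// Compare two true values using only their group representatives,
// which is all that a computation carried out in Z_n has available.
bool less_than_in_group(std::uint64_t a, std::uint64_t b, std::uint64_t n) {
    return to_group(a, n) < to_group(b, n);
}
```

With n = 256, the value 260 wraps around to 4, so `less_than_in_group(250, 260, 256)` returns `false` although 250 is smaller than 260. With n = 1024 neither value wraps and the comparison is correct, which reflects why a much larger message space would be needed.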
We also implemented the rating update protocol and the proxy re-encryption scheme by Ateniese et al. [AFGH06], which is used in the implementation of the recommender protocols.
All prototypes execute the steps of every party in the protocol sequentially on the same machine. All protocols have been implemented in a single thread and have been tested on a virtual machine with 2x Intel Xeon CPU at 2.33 GHz and 8 GB of RAM.
7.1 Random data for tests
For testing the similarity computation protocols, we generated random DNA sequences of different lengths, represented with the integers 1, 2, 3, 4. For testing the recommender protocols, we slightly altered the existing datasets used by Jeckmans et al. [JPH13] to fit our input requirements, so that our tests run on the same ratings that were used to evaluate the performance of their recommender system. To test the rating update protocol, we generated artificial user rating vectors in which the ratings lie between 1 and 5. This rating range was chosen out of convenience, but it does have some impact on the efficiency of the protocol: at the beginning of the protocol, the two servers compare an updated rating value to all possible rating values to check whether the supplied, encrypted rating value is legitimate (that is, does not lie outside the permitted rating boundaries). The impact on efficiency is limited, however, because this check is linear in the number of permitted rating values. This range of allowed rating values is also a good fit for most recommender systems; for example, it represents a rating system in which users give stars to items.
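The test data generation and the linear cost of the legitimacy check described above can be sketched in plaintext as follows. This is a minimal sketch under assumptions: the function names `random_dna` and `is_permitted_rating` are hypothetical, the generator is an ordinary pseudorandom generator, and the real check is performed by the two servers on encrypted values rather than on plain integers.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Generate a random DNA sequence of the given length over the alphabet {1, 2, 3, 4}.
std::vector<int> random_dna(std::size_t length, std::mt19937& rng) {
    std::uniform_int_distribution<int> base(1, 4);
    std::vector<int> sequence(length);
    for (int& symbol : sequence) {
        symbol = base(rng);
    }
    return sequence;
}

// Plaintext analogue of the legitimacy check: the rating is compared to every
// permitted value, so the work is linear in the size of the rating range.
// The loop deliberately does not exit early, mirroring a check that always
// performs one comparison per permitted value.
bool is_permitted_rating(int rating, int min_rating, int max_rating) {
    assert(min_rating <= max_rating);
    bool match = false;
    for (int value = min_rating; value <= max_rating; ++value) {
        match |= (rating == value);
    }
    return match;
}
```

For the rating range 1 to 5 used in the tests, `is_permitted_rating` performs five comparisons per updated rating; widening the range increases this count proportionally.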