Protocol Analysis - Secure Protocols for Privacy-preserving Data Outsourcing, Integration, and

5.4.1 Complexity Analysis

Proposition 6 Complexity. The average runtime complexity of our proposed protocol is bounded by $O(\log_f n \times d^2)$ operations, where $d$ is the number of records and $n$ is the number of columns in EncDS.

Proof. The generation of EncDS from Sub-Protocol 1.2 requires $O(n \times d)$ operations. The shuffling of EncDS also requires $O(n \times d)$ operations [SHKS12]. As a result, Protocol 5.1 takes $O(n \times d)$ operations to generate MixDS. The shuffling of $n$ items in Univ and the construction of TaxTree with fan-out $f$ require $O(n)$ operations. In Sub-Protocol 2.1, $2^n$ distinct partitions might be created in the worst-case scenario. However, since it is impossible in practice to have $|MixDS| = d = 2^n$ records to fill all the partitions, we argue that the average-case complexity is a more accurate measure of our protocol's performance. Given that the mean of the Laplace distribution is 0, the noise cancels out in the average case, and we assume that the records are assigned evenly among all partitions at the same level of PartTree (worst case). The number of levels in PartTree is $\log_2(d/f)$, and the total number of partitions in all levels is $\sum_{i=0}^{\log_2(d/f)} 2^i \times f = O(d)$, where $O(\log_f n)$ operations are applied on each partition. Since all records in MixDS are validated against each partition in PartTree for assignment, the required number of operations is $O(\log_f n \times d^2)$.
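The counting argument in the proof can be checked numerically. The following is a minimal sketch, assuming that level $i$ of PartTree holds $2^i \times f$ partitions, that $d/f$ is a power of two, and that each record-to-partition check costs one TaxTree traversal of height $\lceil \log_f n \rceil$. The function names are hypothetical and are not part of the protocol.

```python
def partition_counts(d: int, f: int) -> list[int]:
    """Partitions per PartTree level, assuming level i holds 2**i * f partitions
    for i = 0 .. log2(d / f), with d / f a power of two (assumption)."""
    last_level = (d // f).bit_length() - 1  # exact log2 for powers of two
    return [(2 ** i) * f for i in range(last_level + 1)]


def tax_tree_height(n: int, f: int) -> int:
    """Smallest h with f**h >= n, i.e. ceil(log_f n), computed with integers."""
    height, capacity = 0, 1
    while capacity < n:
        capacity *= f
        height += 1
    return height


def estimated_operations(d: int, n: int, f: int) -> int:
    """Every record is validated against every partition; each validation
    is charged one TaxTree traversal (a simplifying assumption)."""
    total_partitions = sum(partition_counts(d, f))
    return d * total_partitions * tax_tree_height(n, f)
```

For example, with d = 64, f = 4, and n = 16, the levels hold 4, 8, 16, 32, and 64 partitions, for a total of 124 = 2d - f, which is linear in d as the proof states.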

Discussion. The analysis shows that our approach is suitable for high-dimensional data since the complexity is logarithmic with regard to the number of dimensions. On the other hand, it is quadratic with respect to the number of records. This is due to the security protections of our protocol, where no information can be inferred by malicious adversaries either during data integration or during the partitioning process. Lowering record complexity while maintaining the same level of security is a non-trivial open problem.

5.4.2 Security Analysis

Proposition 7 Integrity. The overall protocol is sound under the malicious adversarial model.

Proof. All steps in our solution are publicly verifiable, which prevents a compromised data owner from deviating from the correct computation without detection. If a deviation is detected, honest data owners will not proceed, preventing the completion of the protocol (as the decryption operations throughout the protocol, including the last step, require all participants). Table 9 lists the publicly verifiable primitive of each security-sensitive step in each proposed protocol and sub-protocol. We inherit integrity against a dishonest majority from our building blocks (and we can provide robustness against a dishonest minority by adjusting the threshold of the decryption operation).

Table 9: The publicly verifiable (P. V.) primitives involved in each security-sensitive step of the proposed protocol.

Columns: P. V. Primitive, Construction 1, Protocol 1, Sub-Protocol 1.1, Sub-Protocol 1.2, Protocol 2, Sub-Protocol 2.1, Sub-Protocol 2.2

Mix Network: 1,4,5†
Mix and Match: 1-4 1.b
NIZKP: 1
Homomorphic Operation: 2 2,3 2
Public Encryption: 1.a‡
Distributed Decryption: 4
Cut-and-Choose: 1
Distributed Proxy Re-encryption: 2.a.i
Cleartext Operation∗: 2,3,6 3 1-4 2 1,2.a.ii, .b,3 5-7

† The shuffling in Steps 2 and 5 is performed at the same time to ensure that the same random permutation $\pi$ is used.
‡ All participants agree on one randomness value for encryption so that anyone can verify the ciphertexts by regenerating them.
∗ Cleartext operations involve steps that do not require a secret, such as sub-protocol calls and broadcasting an output.

We must also ensure that all inputs to the protocol are correctly formed. In the setup phase, where the data owners interact to construct the public key, the distributed key generation (DKG) protocol ensures that the output is uniformly distributed at random [GJKR07]. In the case of data encryption, each ciphertext must be from $G_q \times G_q$ such that the data owners are able to check the independence of the ciphertexts. When the operations of mixing (shuffling and re-randomization), distributed proxy re-encryption, and plaintext equality test are performed, each data owner inputs a random exponent from $\mathbb{Z}_q^*$ for blinding. As long as at least one exponent is uniformly distributed at random, the sum of all exponents is also random. Finally, during the generation of a uniformly random bitstring using the Coin Toss protocol, the same property holds: as long as there is at least one honest data owner, the result is uniformly random.
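The blinding argument rests on the fact that adding a fixed value modulo $q$ is a bijection. The sketch below illustrates this under a simplifying assumption: the honest exponent is enumerated over all of $\mathbb{Z}_q$ (rather than $\mathbb{Z}_q^*$), and the other exponents are arbitrary fixed values chosen by an adversary. The function names and the toy modulus are hypothetical.

```python
def combined_exponent(exponents: list[int], q: int) -> int:
    """Sum of all blinding exponents, reduced modulo the group order q."""
    return sum(exponents) % q


def output_distribution(fixed_exponents: list[int], q: int) -> list[int]:
    """Combined exponents obtained as one honest contribution r ranges over Z_q
    while every other contribution stays fixed (possibly adversarial)."""
    return sorted(combined_exponent(fixed_exponents + [r], q) for r in range(q))
```

Because every residue appears exactly once in the enumeration, a uniformly random honest exponent yields a uniformly random sum no matter how the remaining exponents were chosen.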

Proposition 8 Privacy-preserving. The overall protocol is privacy-preserving.

Proof. To prove that our protocol is privacy-preserving, we show that the data is protected throughout the protocol execution.

Input Data. Each data owner $P_i$ encrypts his data, proves knowledge of it, and then inputs it to the protocol. The proof is zero-knowledge, where $P_i$ proves that he knows the underlying plaintexts of the encrypted data without revealing any information about the plaintexts.
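One common way to realize such a proof for ElGamal is a Schnorr-style proof of knowledge of the encryption randomness $r$ in $c_1 = g^r$, made non-interactive with the Fiat-Shamir heuristic. The sketch below is an illustration only and is not claimed to be the construction used in the protocol. The group parameters are toy values that are far too small for real use, and all function names are hypothetical.

```python
import hashlib

# Toy parameters (assumption): P = 2Q + 1 and G generates the order-Q subgroup.
P, Q, G = 23, 11, 4


def challenge(*values: int) -> int:
    """Fiat-Shamir challenge: hash the public transcript into Z_Q."""
    data = ",".join(str(v) for v in values).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def prove_randomness(c1: int, r: int, k: int) -> tuple[int, int]:
    """Prove knowledge of r with c1 = G**r mod P, using a fresh nonce k."""
    t = pow(G, k, P)            # commitment
    c = challenge(G, c1, t)     # challenge derived from the transcript
    s = (k + c * r) % Q         # response
    return t, s


def verify_randomness(c1: int, proof: tuple[int, int]) -> bool:
    """Accept if and only if G**s == t * c1**c (mod P)."""
    t, s = proof
    c = challenge(G, c1, t)
    return pow(G, s, P) == (t * pow(c1, c, P)) % P
```

A prover who knows $r$ can also recover the plaintext from the second ciphertext component, so the proof shows plaintext knowledge while the transcript $(t, s)$ reveals nothing beyond the validity of the statement.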

Encrypted Data. While encrypted, the data is protected under the CPA-security of the encryption scheme (e.g., DDH for ElGamal) and the proof is zero-knowledge. The adversary cannot decrypt items arbitrarily, as the decryption key is $(n, n)$-shared between all data owners, requiring the adversary to corrupt every data owner to be successful (in which case, all the inputs are already known). Moreover, applying verifiable mixing on the columns and rows of the encrypted data removes any correspondence between ciphertexts and the original items/records.
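The role of the $(n, n)$-shared key and of re-randomization can be illustrated with a toy ElGamal instance. This is a minimal sketch, assuming additive key shares and a deliberately tiny group. It omits the verifiability proofs and the shuffling that the protocol relies on, and all names are hypothetical.

```python
# Toy parameters (assumption): P = 2Q + 1 and G generates the order-Q subgroup.
P, Q, G = 23, 11, 4


def joint_public_key(secret_shares: list[int]) -> int:
    """h = G**(x_1 + ... + x_n) mod P, built from each owner's G**x_i."""
    h = 1
    for x in secret_shares:
        h = (h * pow(G, x, P)) % P
    return h


def encrypt(h: int, m: int, r: int) -> tuple[int, int]:
    """ElGamal encryption of a subgroup element m with randomness r."""
    return pow(G, r, P), (m * pow(h, r, P)) % P


def rerandomize(h: int, ct: tuple[int, int], s: int) -> tuple[int, int]:
    """Multiply by an encryption of 1, as done for each item in a mix step."""
    c1, c2 = ct
    return (c1 * pow(G, s, P)) % P, (c2 * pow(h, s, P)) % P


def partial_decrypt(ct: tuple[int, int], x: int) -> int:
    """One owner's decryption share c1**x_i mod P."""
    return pow(ct[0], x, P)


def combine(ct: tuple[int, int], partials: list[int]) -> int:
    """Divide c2 by the product of the decryption shares."""
    blind = 1
    for share in partials:
        blind = (blind * share) % P
    return (ct[1] * pow(blind, -1, P)) % P
```

With shares [3, 7, 5] and message 9, combining all three decryption shares recovers 9, while combining only two of them does not. A re-randomized ciphertext looks different but decrypts to the same message.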

Decrypted Data. The underlying data remains encrypted throughout the protocol except in two areas: within Mix and Match (during plaintext equality tests) and within proxy re-encryption. However, both subprotocols are verifiable and already provide protection against a malicious adversary.

5.4.3 Correctness and Utility Analysis

Proposition 9 Correctness. Given $p \geq 2$ set-valued datasets with record and item overlaps, the proposed protocol generates ε-differentially private set-valued data.

Proof. We first show that our protocol can handle record and item overlaps, and then show that the released data is ε-differentially private.

Data Overlap. If data about the same individual exists in more than one dataset (record overlap), then Step 2.a of Sub-Protocol 1.2 is applied to generate one integrated record for that individual. Function n-OR is used to set the total number of occurrences of an item to one if the item exists in more than one record for the same individual (item overlap).
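The overlap handling can be pictured on plaintext data. The sketch below is a cleartext analogue only: in the protocol the same logic is evaluated over ciphertexts, and the record layout (an identifier paired with a binary item vector) as well as the function names are assumptions made for illustration.

```python
def n_or(bits) -> int:
    """n-ary OR: 1 if the item occurs in at least one record, else 0."""
    return int(any(bits))


def integrate_records(records: list[tuple[str, list[int]]]) -> dict[str, list[int]]:
    """Merge all records that share an identifier into one integrated record,
    capping every item count at one via n-OR."""
    grouped: dict[str, list[list[int]]] = {}
    for individual, items in records:
        grouped.setdefault(individual, []).append(items)
    return {
        individual: [n_or(column) for column in zip(*vectors)]
        for individual, vectors in grouped.items()
    }
```

For instance, two records for the same individual with item vectors [1, 0, 1, 0] and [1, 1, 0, 0] are integrated into the single record [1, 1, 1, 0].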

ε-Differentially Private Data Generation. Sub-Protocol 2.1 performs the same sequence of partitioning operations as the algorithm in [CMF+11], except that our protocol is in a distributed setting. Since [CMF+11] generates ε-differentially private set-valued data, we prove the correctness of Sub-Protocol 2.1 by only proving the correctness of the different steps:

• Record-Partition Assignment. Verifying that every item in DifNodes exists in R, and R has already been assigned to Parent(Part), is equivalent to verifying that every item in HCut exists in R. Moreover, verifying that every item in Part.CCut does not exist in R ensures that the same record is not assigned to more than one sibling partition.

• Noise Generation. In Sub-Protocol 2.2, even though LN noise is differentially private, the partial noise $X_k$ from $P_k$ is not; hence the Cut-and-Choose protocol is used to allow for encrypted partial noises while ensuring that they are random variables following a gamma distribution. Moreover, since the total noise LN noise of a leaf partition is equal to LapNoise(ε/2), the output is guaranteed to be ε/2-differentially private.
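The noise generation step relies on the fact that a Laplace variable can be split into gamma-distributed parts. The sketch below assumes the standard decomposition, in which a Laplace variable with scale $\lambda$ equals the sum over $p$ parties of $G_k - G'_k$, with $G_k$ and $G'_k$ independent gamma variables of shape $1/p$ and scale $\lambda$. It works in the clear, without encryption or Cut-and-Choose, and the function names and the `scale` parameter are hypothetical.

```python
import random


def partial_noise(p: int, scale: float, rng: random.Random) -> float:
    """One party's share X_k: the difference of two Gamma(1/p, scale) draws."""
    return rng.gammavariate(1.0 / p, scale) - rng.gammavariate(1.0 / p, scale)


def total_noise(p: int, scale: float, rng: random.Random) -> float:
    """Sum of all p partial noises; distributed as Laplace(0, scale)."""
    return sum(partial_noise(p, scale, rng) for _ in range(p))
```

A single share $X_k$ has variance $2\lambda^2/p$ and is not Laplace-distributed on its own, which matches the observation that the partial noise is not differentially private while the total noise is. The total has mean 0 and variance $2\lambda^2$.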

In document Secure Protocols for Privacy-preserving Data Outsourcing, Integration, and Auditing (Page 115-119)