Invertible Polynomial Representation for Private Set Operations

(1)

for Private Set Operations

Jung Hee Cheon, Hyunsook Hong, and Hyung Tae Lee

CHRI & Department of Mathematical Sciences, Seoul National University 1 Gwanak-ro, Gwanak-gu, Seoul 151-747, Korea

{jhcheon,hongsuk07,htsm1138}@snu.ac.kr

Abstract. In many private set operations, a set is represented by a polynomial over a ringZσ for a composite integerσ, where Zσ is the message space of some additive homomorphic encryption. While it is useful for implementing set operations with polynomial additions and multiplications, a polynomial representation has a limitation due to the hardness of polynomial factorization over

Zσ. That is, it is hard to recover a corresponding set from a resulting polynomial overZσ if σis not a prime.

In this paper, we propose a new representation of a set by a polynomial overZσ, in whichσ is a composite integer withknown factorizationbut a corresponding set can be efficiently recovered from a polynomial except negligible probability in the security parameter. Note thatZσ[x] is not a

unique factorization domain, so a polynomial may be written as a product of linear factors in several ways. To exclude irrelevant linear factors, we introduce a special encoding function which supports early abort strategy. As a result, our representation can be efficiently inverted by computing all the linear factors of a polynomial inZσ[x] whose roots locate in the image of the encoding function.

When we consider group decryption as in most private set operation protocols, inverting polynomial representations should be done without a single party possessing the secret of the utilized additive homomorphic encryption. This is very hard for Paillier’s encryption whose message space isZNwith unknown factorization ofN. Instead, we detour this problem by using Naccache-Stern encryption with message spaceZσ whereσis a smooth integer with public factorization.

As an application of our representation, we obtain a constant round privacy-preserving set union protocol. Our construction improves the complexity than the previous without an honest majority. It can be also used for a constant round multi-set union protocol and a private set intersection protocol even when decryptors do not possess a superset of the resulting set.

Keywords:Polynomial representation, Polynomial factorization, Root finding, Privacy-preserving set union

1 Introduction

Privacy-preserving set operations (PPSO) are to compute set operations of participants’ dataset without revealing any information other than the result. There have been many proposals to construct PPSO protocols with various techniques such as general MPC [13, 1], polynomial representations [9, 17, 10, 24, 14], pseudorandom functions [15], and blind RSA signatures [7, 6]. While the last two techniques are hard to be generalized into multi-party protocols, polynomial representations combining with additive homomorphic encryption (AHE) schemes enable us to have multi-party PPSO protocols for various operations including set intersection [17, 10, 24], (over-)threshold set union [17], element reduction [17] and so on. Among these constructions, set intersection protocols run in constant rounds, but others run in linear of the number of participants.

Let us focus on privacy-preserving set union protocols. There are two obstacles to construct constant round privacy-preserving multi-party set union protocols based on polynomial repre-sentations with AHE schemes. First, in the polynomial reprerepre-sentations set union corresponds to polynomial multiplication, which is not supported by an AHE scheme in constant rounds. Second, to recover the union set from the resulting polynomial, we need a root finding algorithm of a polynomial over Zσ, where Zσ is the message space of the AHE scheme.

(2)

In their protocol, each participant takes part in the protocol with a rational function whose poles consist of the elements of his set and at the end of the protocol he obtains a rational function whose poles correspond to the set union. Then each participant recovers the denominator of the rational function using the extended Euclidean algorithm and finds the roots of the denominator. Since each rational function is summed up to the resulting function after encrypted under an AHE scheme, the first obstacle is easily overcome.

However, a root finding is still problematic on the message space Zσ of the AHE schemes. Since the message space has unknown order [22] or is not a unique factorization domain (UFD) [21, 23, 3] in the currentefficientAHE schemes, there is no proper polynomial factorization or root finding algorithm working on the message space. To avoid this obstacle, the authors in [25] utilized a secret sharing scheme. However, it requires computational and communicational costs heavier than the previous and requires an honest majority for security since their protocol ex-ploits a secret sharing scheme to support privacy-preserving multiplications in constant rounds.

Our Contribution Let σ =Q`_i¯₌₁qi for distinct primes qi, which is larger than the size of the universe of set elements. We propose a new representation of a set by a polynomial over Zσ, in which a corresponding set can be efficiently recovered from a polynomial except negligible probability when the factorization ofσ is given.

For a given polynomial f(x) =Qd_i₌₁(x−si)∈ Zσ[x], if the factorization of σ is given, one can obtain all roots of f in Zqi for each i by exploiting a polynomial factorization algorithm

over a finite field Zqi [31]. By reassembling the roots of f inZσ using the Chinese Remainder

Theorem (CRT), we can obtain all the candidates. However, the number of candidates amounts tod¯`_{, which is exponential in the size of the universe.}

We introduce a special encoding function ιto exclude irrelevant candidates efficiently. For a polynomialf =Qd_i₌₁(x−ι(si))∈Zσ[x], our encoding function aborts most irrelevant candidates withoutd`¯_{CRT computations, by giving a certain relation among roots of}_f _in

Zqj[x] and roots

off inZqj+1[x]. As a result, our encoding function enables us to efficiently recover all the roots of f with negligible failure probability if they are in the image ofι.

Table 1.Comparison with Previous Set-Union Protocols

HBC Rounds Communication Cost Computational Cost # of Honest Party [17] O(n) O(n3kτN) O(n4k2τNρN) ≥1 [10] O(n) O(n2kτN) O(n2k2τNρN) ≥1 [25] O(1) O(n4k2τp0) O(n5k2ρ_p0) ≥n/2

Ours O(1) O(n3kτN) O(n3k2τNρN) ≥1 Malicious Rounds Communication Cost Computational Cost # of Honest Party

[10] O(n) O((n2k2+n3k)τN) O(n2k2τNρN) ≥1 [25] O(1) O(n4k2τp) O(n5k2τpρp) ≥n/2 Ours O(1) O(n3k2τN) O(n3k2τNρN) ≥1

n: the number of participants,k: the maximum size of sets

τN, τp0, τp: the size of modulusN for Paillier encryption scheme or NS encryption scheme,

the sizep0 of representing domain, the orderp of a cyclic group for Pedersen commitment scheme, respectively

ρN, ρp0, ρp: modular multiplication cost of modulusN for Paillier encryption scheme or NS

encryption scheme,p0 for the size of representing domain,pfor the order of a cyclic group for Pedersen commitment scheme, respectively

As an application of our new representation, combining with Naccache-Stern (NS) AHE scheme which is the factorization of σ is public, we obtain an efficient constant round privacy-preserving set union protocol without an honest majority. In Table 11_{, we compare our set union}

protocols with the previous main results [10, 17, 25]. 1

(3)

Remark that we can easily extend our set union protocol to a multi-set union protocol by encoding the same elements differently. We describe details in Section 4.3. The resulting multi-set union protocol is little bit slower than the previous result [14], but the public key size of the utilized encryption in our protocol is O(1), while that of the previous is O(d) for the sized of the multi-set union.

We also consider transforming previous privacy-preserving set intersection protocol in [17] into a protocol even when decryptors do not possess a superset of the resulting set except the universe.

Related Work There have been a few researches to construct a privacy-preserving multi-party set and multi-set union protocol. Kissner and Song [17] provided multi-party set and multi-set union protocols in the honest-but-curious case. Frikken [10] and Sang and Shen [24] presented more efficient multi-party set union protocols and multi-set union protocols in the malicious case, respectively. However, these protocols exploit a mix-net protocol [11] instead of a root finding algorithm, which runs in linear rounds in the number of corrupted players, and hence it cannot run in constant rounds. Blanton and Aguiar [2] presented efficient privacy-preserving set and multi-set union protocols based on a secret sharing technique and these can be parallelized, but still run in linear rounds.

Recently, Seo et al. [25] proposed a constant round multi-party set union protocol by rep-resenting elements in a set as poles of a rational function. However, their constructions hire a secret sharing technique for supporting privacy-preserving multiplications in constant rounds, thus requires an honest majority assumption for security and computational and communica-tional complexity heavier than the previous.

In case of privacy-preserving multi-set union protocols, Hong et al. [14] proposed a protocol based on ElGamal encryption schemes defined over an extension field Fqκ where κ is larger

than dfor the cardinality dof the resulting multi-set. However, it suffers from the public key size of the utilized encryption since the extension degreeκof the extension field has to be larger thandand hence the public key size is O(d).

Outline of the Paper In Section 2 we look into some components of our privacy-preserving set union protocol, polynomial factorization algorithm, polynomial representation, and AHE schemes. We provide our new polynomial representation that enables us to uniquely factorize a polynomial satisfying some criteria in Section 3. Our constant round privacy-preserving set union protocols are presented in Section 4. Some supplying materials including analysis of our representation and the set union protocol for the malicious case, are given in Appendix.

2 Preliminaries

In this section, we look into polynomial representations of a set for PPSO protocols and in-troduce efficient AHE schemes utilized in PPSO protocols to support polynomial operations between encrypted polynomials. Then, we briefly look into root finding algorithms over finite fields and message spaces of AHE schemes for applying to recover a set from a polynomial in the polynomial representation.

2.1 Basic Definitions and Notations

Throughout the paper, let U be the universe, nbe the number of participants in the protocol, andkbe the maximum size of participants’ datasetsSi’s. Also,ddenotes the size of (multi-)set union among participants’ datasets in the protocol.

LetR[x] be a set of polynomials defined over a ringRandR(x) be a set of rational functions defined over R, i.e., R[x] = {f(x)|f(x) = P_ideg₌₀ff[i]xi _and _f_[_i_] _∈ _R _{for all} _i_} _and _R₍_x_{) =}

{fg((xx))|f(x), g(x)∈R[x], g(x)6= 0}. For a polynomial f ∈R[x], we denote the coefficient of x

(4)

a polynomialf byf[i],i.e.,f(x) =Pdeg_i₌₀ff[i]xi_∈_R_[_x_{]. For a polynomial}_f₍_x_{) =}Pdegf

i=0 f[i]xi ∈

Zσ[x] and a factor q of σ,f modq denotes a polynomialP_ideg₌₀f(f[i] modq)xi ∈Zq[x].

We also define a negligible function as follows: a function g:N→Risnegligible if for every positive polynomialµ(λ), there exists an integerN such thatg(λ)<1/µ(λ) for all λ > N.

2.2 Polynomial Representation of a Set

LetR be a commutative ring with unity andS be a subset ofR. We may represent a setS by a polynomial or a rational function overR.

Polynomial Representation In some previous works [9, 17, 10, 14, 25], a setScan be represented by a polynomial fS(x)∈R[x] whose roots are the elements ofS. That is,

fS(x) :=

Y

si∈S

(x₋si).

This representation gives the following relation:

fS(x) +fS0(x) = gcd(f_S(x), f_S0(x))·u(x)

for some polynomial u(x) _∈ R[x] and hence the roots of a polynomial fS(x) +fS0(x) are the elements ofS_∩S0 _{with overwhelming probability. Also, the roots of}_f

S(x)·fS0(x) are the elements of S∪S0 as multi-sets.

Rational function Representation Recently, Seo et al. [25] introduced a novel representation of a set S⊂R by a rational functionFS overRwhose poles consist of the elements ofS. That is,

FS(x) :=

1

Q

si∈S(x−si)

= 1

fS(x)

.

This representation provides the following relation:

FS(x) +FS0(x) = fS(x) +fS 0(x)

fS(x)·fS0(x) =

gcd(fS(x), fS0(x))·u(x)

fS(x)·fS0(x) =

u(x) lcm(fS(x), fS0(x)) for some polynomialu(x)_∈R[x] which is relatively prime to lcm(fS(x), fS0(x)) with overwhelm-ing probability. Hence the poles of FS(x) +FS0(x) are exactly the roots of lcm(f_S(x), f_S0(x)), which are the elements ofS_∪S0 as sets, not multi-sets, if u(x) and lcm(fS(x), fS0(x)) have no common roots. This rational function is represented again by an infinite formal power series, so called aReversed Laurent Series (RLS), in [25].

2.3 Additive Homomorphic Encryption

Let us consider a commutative ringRwith unity and aR-moduleGwherer·g:=gr _for_r_∈_R and g _∈ G. Let Encpk : R → G be a public key encryption under the public key pk. We can define a public key encryption for a polynomialf =Pdeg_i₌₀ff[i]xi _∈_R_[_x_{] as follows:}

Epk(f) :=

deg_Xf

i=0

Encpk(f[i])xi.

Assume Encpk has an additive homomorphic property such that

Encpk(a+b) =Encpk(a)Encpk(b), Encpk(ab) =Encpk(a)b

for a, b _∈ R. Then given two polynomials f = Pdeg_i₌₀ff[i]xi _and _g ₌ Pdegg

i=0 g[i]xi in R[x], we

(5)

– Polynomial addition: Given _Epk(f) and Epk(g), it is possible to compute Epk(f +g) by calculatingEncpk((f+g)[i]) =Encpk(f[i])Encpk(g[i]) for all 0≤i≤max{degf,degg}where

f +g=P_imax₌₀{degf,degg}(f+g)[i]xi_.

– Polynomial multiplication: Given Epk(f) and g, it is possible to compute Epk(f g) by calculating Encpk((f g)[`]) =

Q

i+j=`Encpk(f[i])g[j] for all 0≤`≤degf+ degg where f g=

Pdegf+degg

`=0 (f g)[`]x`.

There have been several efficient AHE schemes [21–23, 3]: Under the assumption that fac-toring N = p2_q _{is hard, Okamoto and Uchiyama [22] proposed a scheme with} _R ₌

Zp and

G=ZN, in which the orderp of the message spaceR is hidden. With the decisional composite residuosity assumption, Paillier [23] scheme and Camenisch and Shoup scheme [3] haveR=ZN and G = ZN2 for N = pq, in which the size of message spaces is a hard-to-factor composite integer N. Naccache and Stern [21] proposed a scheme with R = Zσ and G =ZN under the higher residuosity assumption, where N =pq is a hard-to-factor integer and σ is a product of small primes dividing φ(N) for Euler’s totient functionφ.

In the above schemes, it is hard to find the roots of a polynomial inR[x] without knowing a secret key. For the second case, in fact, Shamir [27] showed that to find a root of a polynomial

f(x) = Qd_i₌₁(x₋si) ∈ ZN[x] is equivalent to factor N. While, in the NS scheme, it may be possible to compute some roots of a polynomial inZσ[x] since the factorization ofσ is public. ButZσ[x] is not a UFD and hence the number of roots of a polynomial f ∈Zσ[x] can be larger than degf. In fact, if f(x) =Qd_i₌₁(x₋si) ∈R[x], then the number of candidates of roots of the polynomial f isd`¯_{where ¯}_` _{is the number of prime factors of}_σ_{. We will use the NS scheme} by presenting a method to efficiently recover all the roots of a polynomial f _∈Zσ[x] satisfying some criteria.

2.4 Root Finding Algorithms

When R is a finite field Fq, we have several efficient root finding algorithms in R[x]. As an instance, using a square-free decomposition and the Cantor-Zassenhaus algorithm [32, Sec-tion 14.4], a polynomial of degree d over a field Fq is factored in Oe(d2logq) field opera-tions. Recently, it has been improved using fast arithmetic into O(d1.5+o(1)_{) field operations}

by Umans [31]. However, as mentioned above, there is no efficient AHE scheme whose message space is a finite fieldFq with public q.

ConsiderR=Zσ, a message space of the NS encryption scheme forσ=Q

¯

`

j=1qjwith distinct public primes qj’s. Let f ∈Zσ[x] be a polynomial of degree d, which is a product of d linear factors (x₋si)’s. SinceZσ[x] is not a UFD, it is still very hard to recover the exact (x−si)’s from the polynomialf. However, since the factors ofσ are known, one can apply Umans’ polynomial factorization algorithm to find all roots off modqj withOe(d1.5) field operations inZqj for each j. Then one performs the CRT computations, which takes O(log2σ) bit operations for each candidate of roots. Since each polynomialf modqj has aboutddistinct roots, there exist about

d¯`_{candidates of roots of}_f_over

Zσ. Hence the total complexity becomesOe(¯`d1.5log2q+d

¯

`_log2_σ₎

bit operations, where q := maxj{qj}. However, one still can not determine the exact si’s since it has no criterion to distinguish the exactsi’s from candidates.

In this paper, we can recover all the exact roots off satisfying some criteria inO(¯`d1.5_log2_q₎

bit operations using early abort technique. The details are in Section 3.

3 Invertible Polynomial Representation

LetfS(x) =Qsi∈S(x−si)∈Zσ[x] be a polynomial for a setS⊆Zσand a compositeσ =

Q¯_`

(6)

the polynomial fS in almost all cases. In this section, we provide our new polynomial repre-sentation that enables us to efficiently recover the exact corresponding set from the polynomial represented by our suggestion.

Before describing our polynomial representation, we assume that the resulting set union of cardinality dis randomly chosen in U(d), the set of subsets of cardinality d of the universe U.

In general, since elements of a participant’s dataset may not be random in the universe_U, the resulting set union may also not be random in the set _U(_d₎. However, one can efficiently make that the resulting set union satisfies this assumption using a pseudorandom permutation. Also, in practice, one can utilize a block cipher such as the DES encryption instead of a pseudorandom permutation [12, Section 3.7]. For example, a participant encrypts his element s0 _{so that} _s ₌

DESK3(DESK2(DESK1(s0))) where K1, K2 and K3 are public and randomly chosen in the key

space of the DES encryption. Then a participant obtains the original message s0 by computing

s0₌_DES−1

K1(DES

−1

K2(DES

−1

K3(s))) after recovering a rootsfrom the resulting polynomial.

Parameter Setting Let us explain parameters for our polynomial representation and PPSO protocols. First, set the bit size of the modulus N of the NS encryption scheme by considering a security parameter λ. For the given universe _U and the maximum size of the resulting set uniond(here,d=nk for the numbernof participants and the maximum sizekof participants’ datasets), letd0 = max{d,dlogNe}and setτ = 1₃(logd+ 2 logd0). This setting comes from the

computational complexity analysis of our set union protocol and the value τ will influence the bit size of prime factors of σ and the size of the message space of the NS encryption scheme. See Section 4.2 for details.

Set the parameter `andα so that`is the smallest positive integer such that_{U ⊆ {}0,1_}3τ α` for some rational number 0 < α <1 satisfying 3ατ and 3(1−α)τ are integers. Note that the proper size of α is 1₃ for optimal complexity of our protocol. The details will be presented in Appendix A. Then, set the proper size ¯` larger than ` and let `0 _{= ¯}_`₋_`_{. The analysis of the}

proper size of ¯`will be discussed at the end of Section 3.1. Choose ¯`(3τ+ 1)-bit distinct primes

qj’s and set σ=Q

¯

`

j=1qj. Note that the size of the message space of the NS encryption scheme

is less than N

4 for its security [21]. Hence, the parameters have to be satisfied the condition σ < N

4 and so ¯` <

blogNc−2

3τ . Also, we assume that ¯`is smaller thandfor optimal complexity of our proposed protocol. In summary, the parameter ¯`is smaller than min_{d,blog₃N_τc−2_}.

Now, we are ready to describe our new polynomial representation. Focus on the fact that the factorization ofσis public in the NS encryption scheme. Using this fact, given a polynomial

f =Qd_i₌₁(x₋si) ∈Zσ[x] for a set S ={s1, . . . , sd}, one can obtain all roots of f modqj for eachjby applying Umans’ polynomial factorization algorithm over a finite field Zqj. To recover S, one can perform CRT computation for obtaining less than d`¯_{candidates of roots of} _f _over Zσ. In general, however, the number of roots of f overZσ is larger than degf and there is no criteria to determine the exact set S. To remove irrelevant roots which are not in S, we give some relations among all roots of polynomialsf modqj’s by providing an encoding function.

Before describing our solution, we look into an easy way to give a relation among all roots of polynomial f modqj for allj. However, it is not efficient to recover a set from a polynomial.

Encoding with a tag To give relations among all roots of polynomialsf modqi’s, we may consider to insert the same value depending on the element, called atag, into an element part inZqj’s.

For example, let h:{0,1}∗ _{→ {}₀_,₁_}3(1−α)τ _{be an uniform hash function for a positive rational} number 0 < α < 1 such that 3ατ and 3(1₋α)τ are to be integers. For a dataset S _⊆ Zσ, parse si ∈ S into ¯` blocks si,1, . . . , si,`¯ of (3ατ)-bit so that si = si,1|| · · · ||si,`¯. Consider a function ι0 : U ⊆ {0,1}3ατ¯` _→

(7)

ι0₍_s

i) ≡ si,j||h(si) modqj for 1 ≤ j ≤ `¯.2 Then we represent a set S as a polynomial fS(x) =

Q

si∈S(x−ι

0₍_s

i))∈Zσ[x].

Then one can reduce the number of candidates by checking a tag h(si) when one gets all roots over Zqj’s. However, when a collision occurs, say h(si) =h(sj) with si 6= sj, one has to

check the hash value of 2`¯_{elements which are possible combinations of (}_s

i,1, sj,1),· · ·,(si,¯`, sj,`¯). The probability3 _{that at least one collision occur among} _d_{hash values} _h₍_s

i)’s is lower bounded

by d2

21+3τ(1−α) which is not negligible even for small α. Moreover, the expected computation becomesΩ(2`¯_).

3.1 Our Polynomial Representation

Now, we present our polynomial representation for supporting to recover a set from a polynomial overZσ represented by our suggestion. Takeα = 1₃,i.e.,U ⊆ {0,1}τ ` for optimization. Ifα6= 1₃, the expected computation is in polynomial time only when the size of the universe is restricted. Details about the proper size ofα is in Appendix A.

· · ·

. ..

(modq1) (mod q2)

(modq_`¯)

..

.

..

.

..

.

Fig. 1.Our Encoding Functionι

Encoding by Repetition Let h :{0,1}∗ _{→ {}₀_,₁_}2τ _and _h

j :{0,1}∗ → {0,1}τ be uniform hash functions for 1_≤j_≤`0_{. Parse a message}_s

i ∈ U ⊆ {0,1}τ ` into`blocks si,1, . . . , si,` ofτ-bit so that si =si,1|| · · · ||si,`. Let si,`+j =hj(si) for 1≤j ≤`0 and parseh(si) into two blockssi,`¯+1

and s_i,_`¯₊₂ of τ-bit. Set ¯` = `+`0. We define our encoding function ι :U ⊆ {0,1}τ ` → Zσ, in whichι(si) is the unique element inZσ satisfying ι(si)≡si,j||si,j+1||si,j+2modqj for 1≤j≤`¯. Then a set S is represented as a polynomial fS(x) =Qsi∈S(x−ι(si))∈Zσ[x].

Decoding Phase Denote by s(_ji) := ι(si) modqj for each message si = si,1|| · · · ||si,`. For 1 ≤

j _≤`¯₋1, we define (s(_ji), s_j(i₊₁0))_∈Zqj×Zqj+1 to be a linkable pair if the last (2τ)-bit ofs

(i)

j is equal to the first (2τ)-bit of s(_ji₊₁0),i.e.,si,j+1||si,j+2=si0_,j+1||s_i0_,j₊₂. Inductively, we also define (s(₁i1),· · · , s(ij+1)

j+1 )∈Zq1 × · · · ×Zqj+1 to be a linkable pair if (s

(i1) 1 ,· · · , s

(ij)

j ) and (s

(ij)

j , s

(ij+1) j+1 )

are linkable pairs.

Let ι(si) and ι(si0) be images of elements s_i and s_i0 of the functionι withs_i 6=s_i0. We can easily check the following properties:

–

s(₁i),· · · , s(_ji₊₁) is always a linkable pair.

2_{By notation abuse, throughout this paper if necessary, we regard a bit string as the corresponding integer via}

some converting function.

3_{Pr[At least one collision occur among}_d_{hash values}_h₍_s_i)’s.] _≤ ₁₋₁₋ 1 23(1−α)τ

· · ·1− d−1 23(1−α)τ

≈ 1− exp−Qd−1

i=1

i

23(1−α)τ

≈ d2

(8)

s

(i1)

1

=

s

(i2)

2

=

s

(i3)

3

=

s

_i₁_,_1||

s

_i₁_,_2||

s

_i₁_,₃

s

i3,3||

s

i3,4||

s

i3,5

)

⇣

s

(i1)

1

, s

(i2)

2

, s

(i3)

3

⌘

is a linkable pair.

s

i2,2||

s

i2,3||

s

i2,4

Fig. 2.Linkable Pair

– When si and si0 are uniformly chosen strings from {0,1}τ `,

Pr[(s(_ji), s_j(i0)) is a linkable pair] = Prsi,j+1||si,j+2 =si0_,j+1||s_i0_,j₊₂= 1

22τ (1)

for a fixed 1_≤j_≤`¯.

At decoding phase, when a polynomialf(x) =Qd_i₌₁(x−ι(si))∈Zσ[x] is given, we perform two phases to find the correct droots of the polynomial f(x). In the first stage, one computes all the roots _{s(1)_j ,_{· · ·}, s(_jd)_} over Zqj[x] for each j. For eachj sequentially from 1 to ¯`−1, we

find all the linkable pairs among _{s(1)_j ,_{· · ·}, s(_jd)_}and _{s(1)_j₊₁,_{· · ·} , s(_jd_+1}) by checking whether the last (2τ)-bit ofs(_ji) and the first (2τ)-bit ofs(_ji₊₁0) are the same. It can be done byd2_comparisons

orO(dlogd) computations using sorting and determining algorithms.

After ¯`−1 steps, we obtaind0 number of linkable pairs of ¯`-tuple, which are candidates of roots of the polynomial f and elements of the set. It includes the delements corresponding to

ι(s1), . . . , ι(sd). If d0 is much larger than d, it can be a burden. However, we can show that the expected value ofd0 is at most 3din Theorem 1. See Section 3.2.

After obtainingd0 linkable pairs of ¯`-tuple, in the second phase, we check whether each pair belongs to the image of ιwith the following equalities:

si,`+j =hj(si) for all 1≤j≤`0, (2)

s_i,_`¯+1||s_i,_`¯₊₂ =h(si). (3)

The linkable pairs of ¯`-tuple, corresponding toι(si) for some iclearly satisfies the above equa-tions. However, for a random ¯`-tuple in Zq1 × · · · ×Zq¯_`, the probability that it satisfies the relation (2) is about 1

2τ `0 and the probability that it satisfies the relation (3) is about 1

22τ under

the assumption that h and hj’s are uniform hash functions. Hence, the expected number of wrong ¯`-tuples passing both phases is less than d× 1

2τ(2+`0). It is less than 2

−λ _{for a security} parameterλif we take the parameter`0 to satisfy

`0 > 3(λ+ logd)

logd+ 2 logd0 −

2. (4)

For example, whenλ= 80 andd≈d0 ≈210, then`0 is about 8. Therefore, one can recover a set

from the given polynomial represented by our suggestion without negligible failure probability in the security parameter.

Computational Complexity Let us count the computational cost of our representation. The encoding phase consists of two steps: (1) the CRT computation per each element to obtain a value of the encoding function ι and (2) the polynomial expansion. The first step requires

O(dlog2σ) bit operations for d elements and the second step requires O(d2_{) multiplications.}

Hence, the complexity for the encoding phase isO(d2_{) multiplications.}

The decoding phase may be divided into three steps: (1) finding roots of a polynomial f

(9)

and (3). These steps require O(¯`d1.5_{) multiplications,} _O_(¯_`d_log_d_{) bit operations, and} _O₍_`0_d₎

hash computations, respectively. Hence, the complexity for the decoding phase is dominated by

O(¯`d1.5_{) multiplications.}

3.2 The Expected Number of Linkable Pairs

In this subsection, we analyze the expected number of linkable pairs of ¯`-tuple when we recover a set from a polynomial of degree d, represented by our suggestion.

A set of κ elements s(1)_j ,· · ·, s_j(κ) ∈ Zqj is called a κ-collision if their last 2τ-bits are the

same. Since (s(_ji₋)₁, s_j(i)) is trivially a linkable pair for 1 _≤ i _≤ κ, κ-collision causes at least κ

linkable pairs. Assume thatS =_{s1, . . . , sd}is a uniformly and randomly chosen set in the set of subsets of cardinalitydof the set{0,1}τ ` _and_h _and_h

j’s utilized in the encoding functionιare uniform hash functions. Then, we easily obtain the following observations, which are evidences of the expected number of linkable pairs of ¯`-tuple is not large.

1. The probability that at least one 2-collision occurs in Zqj is less than 1

2 by the birthday

paradox.

2. The probability that at least one κ-collision occurs for κ ≥3 in Zqj is at most 1 4d ≈

1 2(2+τ) of the probability that at least one (κ₋1)-collision occurs from [30, Theorem 2].

3. The κ-collision in Zqj yields κ

2 _{more candidates of roots of the polynomial} _f_{, not 2}κ can-didates. More concretely, assume that κ-collision {s(1)_j ,· · ·, s_j(κ)} occurs. Then s(1)_j can be combined with κ candidates{s(1)_j₊₁,· · ·, s(_jκ₊₁) }. Hence κ2 _{linkable pairs are generated.}

The expected number of 2-collision in Zqj for all j is roughly ¯

`

2 and the expected number

of κ-collision in Zqj for κ ≥ 3 is negligible. Theorem 1 gives a rigorous analysis of the upper

bound of the expected number of linkable pairs of ¯`-tuple.

Theorem 1. Assume thatS=_{s1, . . . , sd}is a uniformly and randomly chosen set in the set of

subsets of cardinality dof the set_{0,1_}τ `_{. Define an encoding function}_ι_:_{₀_,₁_}τ `_→

Zσ so that

ι(si) is the unique element in Zσ satisfying ι(si) ≡ si,j||si,j+1||si,j+2modqj for all 1 ≤ j ≤ `¯

whensi =si,1||. . .||si,` and si,j’s areτ-bit. Assumeh and hj’s utilized in the encoding function

ι are uniform hash functions. Then the expected number of linkable pairs of `¯-tuple is at most

3d for a polynomialfS =Qsi∈S(x−si).

Proof. Let Ej be the expected number of linkable pairs ofj-tuple in Zq1× · · · ×Zqj for j≥2.

For 1_≤ j _≤j0 _≤_`_¯_{, let} _S

j0₋_j₊₁(i_j, . . . , i_j0) be the event that (s(ij) j , . . . , s

(i_j0)

j0 ) is a linkable pair. Then,

E2=

X

i1,i2∈{1,...,d}

1_·Pr[S2(i1, i2)]

= X

i1,i2∈{1,...,d}

Pr[S2(i1, i2)∧(i1 =i2)] +

X

i1,i2∈{1,...,d}

Pr[S2(i1, i2)∧(i1 6=i2)]

= X

i1∈{1,...,d}

Pr[S2(i1, i1)] +

X

i16=i2∈{1,...,d}

Pr[S2(i1, i2)]

=d+d(d₋1) 1 22τ =d

1 +d−1 22τ

since Pr[S2(i1, i1)] = 1 fori1 ∈ {1, . . . , d} and Pr[S2(i1, i2)] = ₂12τ for distinct i1, i2 ∈ {1, . . . , d}

from the equation (1).

Now, we consider the relation betweenEj andEj+1. When (s(1i1), . . . , s (ij)

j ) is a linkable pair, consider the case that (s(₁i1), . . . , s_j(ij), s(ij+1)

j+1 ) is a linkable pair. One can classify this case into

(10)

1. ij+1=ij,

2. (ij+1 6=ij) ∧ (ij+1=ij−1),

3. (ij+1 6=ij) ∧ (ij+16=ij−1).

At the first case, ifij+1 =ijand (s(1i1), . . . , s (ij)

j ) is a linkable pair, then (s

(i1) 1 , . . . , s

(ij)

j , s

(ij+1) j+1 )

is always a linkable pair. Hence,

E(1)_j₊₁:= X i1,...,ij+1∈{1,...,d}

Pr [Sj+1(i1, . . . , ij, ij+1)∧(ij+1=ij)]

= X

i1,...,ij∈{1,...,d}

Pr [Sj(i1, . . . , ij)] =Ej.

At the second case, ifij+1 =ij−1 =6 ij and (s(1i1), . . . , s (ij)

j ) is a linkable pair, then the relation

sij−1,j+1 =sij,j+1 =sij+1,j+1 is satisfied from the encoding rule ofι. Hence,

E(2)_j₊₁:= X i1,...,ij+1∈{1,...,d}

Pr [Sj+1(i1, . . . , ij, ij+1)∧(ij+1=ij−16=ij)]

= X

i1,...,ij+1∈{1,...,d}

ij+1=ij−16=ij

Pr [Sj(i1, . . . , ij)∧S2(ij, ij+1)]

= X

i₁_,...,ij₊₁∈{1,...,d}

ij+1=ij−16=ij

PrSj(i1, . . . , ij)∧(sij,j+1||sij,j+2=sij+1,j+1||sij+1,j+2)

= X

i₁_,...,ij₊₁∈{1,...,d}

ij+1=ij−16=_ij

PrSj(i1, . . . , ij)∧(sij,j+2 =sij+1,j+2)

= X

i₁_,...,ij₊₁∈{1,...,d}

ij+1=ij−16=ij

Pr [Sj(i1, . . . , ij)] Pr

sij,j+2 =sij+1,j+2

= 1

2τ

X

i₁_,...,ij₊₁∈{1,...,d}

ij+1=ij−16=_ij

Pr [Sj(i1, . . . , ij)]

≤ 1

2τ

X

i1,...,ij∈{1,...,d}

Pr [Sj(i1, . . . , ij)] = 1 2τEj.

At the last case, we can obtain the following result:

E(3)_j₊₁:= X i1,...,ij+1∈{1,...,d}

Pr [Sj+1(i1, . . . , ij, ij+1)∧((ij+1 6=ij) ∧ (ij+16=ij−1))]

= X

i1,...,ij+1∈{1,...,d}

ij+1∈{/ ij−1,ij}

Pr [Sj(i1, . . . , ij)∧S2(ij, ij+1)]

= X

i1,...,ij+1∈{1,...,d}

ij+1∈{/ _ij₋₁_,ij}

PrSj(i1, . . . , ij)∧(sij,j+1||sij,j+2 =sij+1,j+1||sij+1,j+2)

= X

i₁_,...,ij₊₁∈{1,...,d}

ij+1∈{/ ij−1,ij}

Pr [Sj(i1, . . . , ij)] Pr

sij,j+1||sij,j+2=sij+1,j+1||sij+1,j+2

≤ d−1

22τ

X

i1,...,ij∈{1,...,d}

Pr [Sj(i1, . . . , ij)] =

(11)

From the above results, we obtain the recurrence formula of Ej as follows:

Ej+1=E(1)j+1+E (2)

j+1+E (3)

j+1

≤

1 + 1 2τ +

d₋1 22τ

Ej

forj_≥2 and hence

E_`¯≤d

1 + 1 2τ +

d₋1 22τ

_`¯−1

sinceE2 =d 1 +d₂−2τ1

≤d 1 + 1 2τ +d₂−2τ1

. Now, we show that ¯`≤ 22τ

2τ₊_d. From the parameter setting, it is satisfied that ¯`≤min{d,

blogNc−2 3τ }. When d0 ≥8d, it holds

min

d,blogNc −2

3τ

≤d_≤ d 1/3 0 d2/3

2 .

Consider the case that d0 <8d. Then, it also holds

min

d,blogNc −2

3τ

≤ blogN₃_τc −2 ≤ ₃d_τ0 ≤ d 1/3 0 d2/3

2 sinceτ ≥3. Hence

¯

`_≤min

d,blogNc −2

3τ

≤ d 1/3 0 d2/3

2 ≤

d2 0d

2/3

2d0 ≤

22τ 2τ₊_d since 2d0>2τ+d. Therefore we obtain the following result:

E_`¯≤d

1 + 1 2τ +

d₋1 22τ

_`¯₋₁

< ed <3d,

where e_≈2.718 is the base of the natural logarithm. In other words, the upper bound of the

expected number of linkable pairs of ¯`-tuple is 3d.

4 Applications to Set Operations

In this section, we present our set union protocols based on our polynomial representation described in Section 3. Our construction exploits the NS AHE scheme to encrypt a rational function whose denominator corresponds to a participant’s set. For this, we generalize a reversed Laurent Series presented in [25] to work on Zσ with a compositeσ, which is the domain of the NS scheme.

We also explain how to modify our polynomial representation for applying to construct multi-set union protocol and a set intersection protocol in which decryptors are different from set contributors.

4.1 Transforming Our Representation into a Rational Function using Reversed Laurent Series

To construct constant round privacy-preserving set union protocols, we adopt rational function representations presented in [25]. In the rational function representation, each participant Pi represents his own set Si of elements as a rational function _f1

Si where fSi :=

Q

sj∈S(x−sj). It

gives the relation

r1 fS1

+ r2

fS2

+_{· · ·}+ rn

fSn

= u

(12)

for random polynomials ri’s and some polynomial u. Then each participant tries to recover

f(x) = lcm(fS1, . . . , fSn) from a polynomialF(x) =

u

lcm(fS₁,...,fSn).

To represent a set as a rational function, the authors in [25] exploit a reversed Laurent series (RLS). We briefly introduce a RLS. (Refer to [25] for details.)

For a positive integer q, a reversed Laurent series (RLS) over Zq is a singly infinite, formal sum of the formf(x) =Pm_i₌_−∞f[i]xi ₍_f_[_m_]₆_{= 0) with an integer}_m _and _f_[_i_]_∈

Zq for all i. We define the degree off by mand it is denoted by degf. For a RLSf(x), givend1≤d2 ≤degf,

we denotef(x)[d1,d2]=

Pd2

i=d1f[i]xi. For a rational functionf /gwithf, g∈Zq[x] andg6= 0, we definethe RLS representation of a rational function f /g by a reversed Laurent series off /g.

When q is a prime, the RLS representation has the following properties:

– The RLS representation for a given rational function is unique.

– Let f, g be polynomials in Zq[x] with degf < degg and g 6= 0. Then there exists an algorithm [25] to compute k(> degg) high-order terms of the RLS representation of f /g. We describe this algorithm in Algorithm 1. RationalToRLS(f, g, k) denotes the output of RationalToRLS algorithm which takes polynomialsf, g and integer k.

Algorithm 1 RationalToRLS(f, g, k)

Input: f(x), g(x)∈Zq[x] with degf <deggand an integerk >degg

Output: khigher-order terms of the RLS representation of a rational functionf /g

1: F(x)←f(x)·xk

2: Compute Q(x), R(x) such that F(x) = g(x)Q(x) +R(x) and degR < degg using a polynomial division algorithm

3: returnQ(x)·x−k

– When 2k high-order terms of the RLS representation of a rational function f /g such that

f, g_∈Zq[x],g6= 0, and degf <degg≤kare given, there exists an efficient algorithm [29, Section 17.5.1] to recover two polynomials v(x), u(x) inZq[x] such that _uv = f_g inZq(x) and gcd(v, u) = 1.

In our protocol, we will represent each participant’s set Si as our polynomial representation

fSi :=

Q

sj∈Si(x−ι(sj)) ∈ Zσ[x] with our encoding function ι. Then we convert a rational

function of 1/fSito its RLS overZσ. SinceZσ is not a Euclidean domain, one may doubt whether

the RationalToRLS algorithm works on Zσ[x]. However, in our protocol, since the conversion requires polynomial divisions only by monic polynomials, the RationalToRLS algorithm works well on Zσ[x].

After the end of interactions among participants in our protocol, each participant obtains the 2nk higher-order terms of the RLS representation of a rational function _Uu(₍x_x)₎ whereU(x) = lcm(fS1(x), . . . , fSn(x)). There is no algorithm to recover u

0₍_x_{) and} _U0₍_x_{) in}

Zσ[x] such that u(x)

U(x) =

u0₍_x₎

U0₍_x₎. However, from our polynomial representation, it only requires U0(x) modqj for each j and we can obtain U0₍_x_{) mod}_q

j from the RLS representation modulo qj by running polynomial recovering algorithm on Zqj[x]’s.

The following lemma guarantees that, in a polynomial ring Zσ[x], a modular operation by a prime divisor q of σ and RationalToRLS algorithm are commutative. This will be utilized to prove the correctness of our protocol.

Lemma 1. Let f, g be polynomials in Zσ[x] with degf < degg and g 6= 0. Suppose that the

(13)

Proof. Let RationalToRLS(f, g, k) = Q(x)x−k _where _xk_f₍_x_{) =} _Q₍_x₎_g₍_x_{) +}_R₍_x_{) in}

Zσ[x] with

R = 0 or degR < degg. For each polynomial p(x) in Zσ[x], denote p(x) modq by pq(x). Then xk_f

q(x) = Qq(x)gq(x) +Rq(x) in Zq[x], where Rq = 0 or degRq ≤ degR < degg = deggq. Since the division algorithm uniquely outputs the quotient and the remainder in Zq[x], RationalToRLS(f modq, gmodq, k) =Qq(x)x−k≡Q(x)x−kmodq. The following lemma gives an information on the distribution ofuj(x) :=u(x) modqj in our protocol which is inevitable to prove the security of our set union protocol. It guarantees that the distributions of uj(x) andu(x) are uniformly distributed among polynomials in the set of polynomials having degree at most deg(lcm(fS1, . . . , fSn))−1 inZqj[x] andZσ[x], respectively.

The proof of Lemma 2 is given in [25].

Lemma 2 ([25, Lemma 1]). Let fS1(x), . . . , fSn(x) ∈ Zq[x] be polynomials of degree k ≥

1 for a prime q. Suppose r1(x), . . ., rn(x) are polynomials in Zq[x], chosen uniformly and

independently in the set of polynomials of degree at most k₋1. Let u(x) be a polynomial such that

u(x)

lcm(fS1(x), . . . , fSn(x))

= n

X

i=1 ri(x)

fSi(x)

. (5)

Then u(x) is uniformly distributed among polynomials in the set of polynomials∈Zq[x]having

degree at most deg(lcm(fS1(x), . . . , fSn(x)))−1.

Finally, to recover the exact U(x) = lcm(fS1, . . . , fSn) from the rational function

u(x)

U(x),

the relation gcd(u(x), U(x)) = 1 is to be satisfied. In our set union protocol, since uj(x) :=

u(x) modqj is uniformly distributed in Zqj[x] and the expected number of roots of a random

polynomial is one [19], we may expect that our RLS representation fails to output all the elements in the set union with probability _qd

j ≈ 2

−2τ_{. Furthermore, it can be detected if this} happens for a certainj, whose probability is about 1₋(1₋ d

qj) ¯

` _≈ `d¯ qj ≤2

−τ_{. It is not negligible,} but small enough.

4.2 Set Union for Honest-But-Curious Case

Threshold Naccache-Stern Encryption For a group decryption, it requires a semantically secure, threshold NS AHE scheme in our protocol. We provide a threshold version of the NS encryption scheme in Appendix B. Our construction is based on the technique of Fouque et al. [8], which transforms the original Paillier homomorphic encryption scheme into a threshold version working from Shoup’s technique [28].

Parameter Setting Let U be the universe, n be the number of participants, and k be the maximum size of participants’ datasets. Let dbe the possible maximum size of the set union,

i.e., d = nk. Take the bit size of N by considering the security of the threshold NS AHE scheme, which is the modulus of the threshold NS AHE scheme. Put d0 = max{d,dlogNe}

and τ = 1₃(logd+ 2 logd0). Set ` so that U ⊆ {0,1}τ `, a proper size of `0 so that `0 satisfies

the relation (4) and let ¯` = `+`0. Note that ¯` is to be smaller than minnd,b_{3 log log}logNc−_N2o since

τ _≥log logN. Generate the parameters of the threshold NS encryption scheme, including the size of message spaceσ, which is a product of ¯`(3τ + 1)-bit distinct primesqi’s.

Our Set Union Protocol for Honest-But-Curious Case Our set union protocol against honest-but-curious (HBC) adversaries is described in Figure 3. In our set union protocol, each par-ticipant computes the 2nk higher-order terms of the RLS representation of FSi =

1

f_Si =

1 Q

si,j∈_Si(x−ι(si,j)) ∈ Zσ[x] for our encoding function ι and sends its encryption to all

(14)

Input: There are n ≥ 2 HBC participantsPi with a private input setSi ⊆ U of car-dinality k. Set d=nk. The participants share the secret key sk, to which pkis the corresponding public key to the threshold NS AHE scheme. Letι:{0,1}∗_→

Zσ be the encoding function provided in Section 3.

Each participantPi,i= 1, . . . , n:

1. (a) constructs the polynomial fSi(x) =

Q

si,j∈Si(x− ι(si,j)) ∈ Zσ[x], runs

RationalToRLS(1, fSi,(2n+ 1)k−1) to obtain

1

f_Si(x)

[−(2n+1)k+1,−k]

, and

computesFSi(x) =

1

f_Si(x)

[−(2n+1)k+1,−k]·x

(2n+1)k−1

.

(b) computes ˜FSi, the encrypted polynomial ofFSi, and sends ˜FSi to all other

participants.

2. (a) chooses random polynomialsri,j(x) ∈Zσ[x] of degree at mostk for all 1≤

j≤n.

(b) computes the encryption, ˜φi, of the polynomial φi(x) =Pn

j=1FSj ·ri,j and

sends it to all participants.

3. (a) calculates the encryption of the polynomialF(x) =Pn i=1φi(x).

(b) performs a group decryption with all other players to obtain the 2nk higher-order terms ofF(x).

4. (a) recovers a polynomial pair of uj(x) and Uj(x) in Zqj[x] for all 1 ≤ j ≤ ¯`

such thatuj(x)

Uj(x)

[−2nk,−1]= (F(x) modqj)[k−1,(2n+1)k−2]·x

−(2n+1)k+1

and

gcd(uj(x), Uj(x)) = 1 in Zqj[x], using the 2nk higher-order terms of F(x)

obtained in Step 3 (b).

(b) extracts all roots ofUj(x) inZqj[x] for alljusing a factorization algorithm.

(c) determines the set union using the encoding rule ofι.

Fig. 3.PPSU-HBC protocol in the HBC case

polynomial ri,j using additive homomorphic property, which is a randomly chosen polyno-mial by the participant _Pi and adds all the resulting polynomials to obtain the encryption of

φi(x) =Pnj=1FSj·ri,j. Then, he sends the encryption of φi(x) to all others. After interactions

among participants, each participant can obtain the 2nk high-order term of the RLS representa-tion ofF(x) =Pn_i₌₁Pn_j₌₁ 1

f_Sj ·ri,j

∈Zσ[x]. Then each participant obtains the 2nkhigh-order terms of the RLS representation of F in Zσ[x] with group decryption and recovers polynomi-alsuj(x) andUj(x) such that

_u_j(_x₎

Uj(x)

[−2nk,−1]= (F(x) modqj)[k−1,(2n+1)k−2]·x

−(2n+1)k+1 _and

gcd(uj(x), Uj(x)) = 1 inZqj[x] from these values. Thereafter, each participant extracts all roots

of Uj(x) over Zqj for eachj and recovers all elements based on criteria of our representation.

Analysis Now, we consider the correctness and privacy of our proposed protocol described in Figure 3. The following theorems guarantee the correctness and privacy of our construction in Figure 3.

Theorem 2. In the protocol described in Figure 3, every participant learns the set union of private inputs participating players, with high probability.

Proof. After Step 3 (b), all participants obtain the 2nkhigher-order terms ofF(x) =Pn_i₌₁Pn_j₌₁ 1

f_Sj ·ri,j

∈

Zσ[x] and hence they obtain the 2nk higher-order terms of the RLS representation ofF(x) mod

qj. From these values, using polynomial recovering algorithm, they reconstruct polynomials

uj(x) and Uj(x) such that _Uuj(_j(x_x)₎ ≡ Fj(x) modqj and gcd(uj(x), Uj(x)) = 1. From the equa-tion (4) and Lemma 1, Uj = (lcm(fS1, fS2, . . . , fSn) modqj) with high probability. Since our

polynomial representation can give the exact corresponded set with overwhelming probability,

(15)

Theorem 3. Assume that the utilized additive homomorphic encryption scheme is semantically secure. Then, in our set union protocol for the HBC case described in Figure 3, any adversary Aof colluding fewer thannHBC participants learns no more information than would be gained by using the same private inputs in the ideal model with a trusted third party.

Proof. Since the utilized additive homomorphic encryption scheme is semantically secure, each participant learns onlyF(x) =Pn_j₌₁(Pn_i₌₁ri,j)FSj inZσ[x]. All players contribute to generate

the polynomial Pn_i₌₁ri,j and the polynomial Pni=1ri,j is uniformly distributed and unknown. Moreover, the resulting polynomials uj are uniformly distributed by Lemma 2. Hence, no in-formation can be recovered from the polynomial F, Uj’s and uj’s, other than those given by

revealing the union set.

Performance Analysis It is clear that our protocol runs in O(1) rounds. Let us count the computational and communicational costs for each participant.

Step 1 (a): requires ˜O(k) multiplications in Zσ for a polynomial expansion of degree k and

O(kd) multiplications to run the RationalToRLSalgorithm and compute FSi.

Step 1 (b): requires O(d) exponentiations for 2dencryptions and O(nd) communication costs. Step 2 (b): requires O(d2_{) exponentiations for computing the encryption ˜}_φ

i:=Pn_j₌₁F˜Sj·ri,j

using additive homomorphic property andO(nd) communication costs. Step 3 (a): requires O(nd) multiplications for computingPn_i₌₁φ˜i.

Step 3 (b): requiresO(d) exponentiations for decryption share computation for 2dciphertexts and O(¯`√dqi) multiplications for solvingdDLPs for ¯`groups of orderqj’s.4 The communi-cation cost is O(nd).

Step 4 (a): requires O(d2_{) multiplications in}

Zqj to recover Uj(x) using extended Euclidean

algorithm for each j.

Step 4 (b): requires O(d1.5+o(1)_{) multiplications in}

Zqj for each j to factor a polynomial of

degree d.

Step 4 (c): requires O(¯`dlogdlogqj) bit operations for sorting andO(d) hash computations. Then the computational complexity is dominated by one of termsO(d2_{) exponentiations in}

Step 2 (b) and O(¯`√dqi) multiplications in Step 3 (b). Since one modular exponentiation for a modulusN requires O(logN) multiplications and ¯` <minnd,b_{3 log log}logNc−_N2o, the computational complexity for each participant is dominated by O(d2_{) =} _O₍_n2_k2_{) exponentiations in}

ZN and the total complexity is O(n3_k2_{) exponentiations in}

ZN. The total communication cost for our protocol is O(n2_d_{) =}_O₍_n3_k_{) (log}_N_{)-bit elements.}

For the malicious case, we can obtain the set union protocol using techniques in [17] and [25]. Refer to Appendix C.1 for the details in the malicious case.

4.3 Extend to Multi-set Union Protocol

We can easily extend our set union protocol to a multi-set union protocol by modifying our encoding function. Assume that each participant _Pi has a multi-set Si ⊆ U for the known universe _{U ⊆ {}0,1_}τ `_{. Define a function} _η _: _{U → U}00 _{⊆ {}₀_,₁_}τ(`+`00₎

by η(s) = s_||r where r

is a randomly chosen element in {0,1}τ `00_{. Then each participant takes part in our set union} protocol with a set _{η(s1), . . . , η(sk)} as his set instead of {s1, . . . , sk}. For the same messages

s1ands2, ifη(s1) is different fromη(s2), one can obtainη(s1) andη(s2) as a part of a set union,

so the frequency ofs1 in the union can be revealed. Hence, if all values ofη are distinct, we can

learn the multi-set union.

4_{Note that one has to solve ¯}_`_{DLPs over a group of order}_q

i for one decryption in the NS encryption scheme. In Step 3 (b), one has to solve 2d = 2nk DLPs over a group of order qi for each qi. It requires O(

√

dqi) multiplications to solve d DLPs over a group of order qi [18] and hence total complexity of this step is

(16)

Consider the probability that there exist at least two same values among d values of func-tion η. This probability is

1₋

1₋ 1 2τ `00

· · ·

1₋d−1 2τ `00

≈ d 2

2τ `00

and it is less than 2−λ _if _`00 _> λ+ 2 logd−1

logd . For example, when λ = 80 and d ≈ d0 ≈ 2 10_,

then`00 is about 10.

Both computational and communicational complexities of our multi-set union protocol are the same with those of our set union protocol as big-O notation. It is heavier than the previous best result [14], which requires O(n2_k_{) exponentiations in}

Fq and O(n2klogq) bits where q is similar to the size of the universe. However, the public key size of our protocol isO(1) elements, while that of the previous result in [14] is O(d) elements for the size d of the multi-set union since their construction utilized ElGamal encryption schemes defined over an extension fieldFpd

of extension degreed.

4.4 Set Intersection Protocol

In most private set intersection protocols based on polynomial representations and AHE schemes, it is hard to factor the resulting polynomial corresponded to the intersection of datasets. Hence, it is assumed that the decryptors who want to obtain the set intersection possess a set having the resulting set as a subset and evaluate at elements of his set to check whether each element is a root of the resulting polynomial. However, our polynomial representation provides a private set intersection protocol in which the decryptors efficiently recover the resulting set without a superset of the resulting set, except the universe, since it enables us to factor the resulting polynomial.

5 Conclusion

In this paper, we provided a new representation of a set by a polynomial over Zσ, which can be efficiently inverted by finding all the linear factors of a polynomial whose root locates in the image of our encoding function, when the factorization of σ is public. Then we presented an efficient constant-round set union protocols, transforming our representation into a rational function and then combining it with threshold NS AHE scheme. We also extend our set union protocol to the multi-set union protocol by modifying rational function representation and consider the set intersection protocol in the case that decryptors do not possess a superset of the resulting set except the universe.

We showed that our encoding function is quite efficient on average-case, but it still requires exponential time in the degree of a polynomial to recover a set from the polynomial represented by our encoding function at worst-case although the probability of the worst-case is sufficiently small. Hence it would be interesting to construct an encoding function that enables us to recover a set in polynomial time even at worst-case.

References

1. M. Ben-Or, S. Goldwasser, and A. Wigderson. Completeness theorems for non-cryptographic fault-tolerant distributed computation (extended abstract). In J. Simon, editor,ACM Symposium on Theory of Comput-ing (STOC) 1988, pages 1–10. ACM, 1988.

(17)

3. J. Camenisch and V. Shoup. Practical verifiable encryption and decryption of discrete logarithms. In D. Boneh, editor,Advances in Cryptology - CRYPTO 2003, volume 2729 ofLNCS, pages 126–144. Springer, 2003.

4. J. Camenisch and M. Stadler. Proof systems for general statements about discrete logarithms. Technical Report, No. 260, March 1997, 1997.

5. R. Cramer, I. Damg˚ard, and J. B. Nielsen. Multiparty computation from threshold homomorphic encryption. In B. Pfitzmann, editor,EUROCRYPT 2001, volume 2045 ofLNCS, pages 280–299. Springer, 2001. 6. E. D. Cristofaro, J. Kim, and G. Tsudik. Linear-complexity private set intersection protocols secure in

malicious model. In M. Abe, editor,ASIACRYPT 2010, volume 6477 of LNCS, pages 213–231. Springer, 2010.

7. E. D. Cristofaro and G. Tsudik. Practical private set intersection protocols with linear complexity. In R. Sion, editor,Financial Cryptography (FC) 2010, volume 6052 ofLNCS, pages 143–159. Springer, 2010.

8. P.-A. Fouque, G. Poupard, and J. Stern. Sharing decryption in the context of voting or lotteries. In Y. Frankel, editor,Financial Cryptography (FC) 2000, volume 1962 ofLNCS, pages 90–104. Springer, 2001.

9. M. J. Freedman, K. Nissim, and B. Pinkas. Efficient private matching and set intersection. In C. Cachin and J. Camenisch, editors,EUROCRYPT 2004, volume 3027 ofLNCS, pages 1–19. Springer, 2004.

10. K. B. Frikken. Privacy-preserving set union. In J. Katz and M. Yung, editors,Applied Cryptography and Network Security (ACNS) 2007, volume 4521 ofLNCS, pages 237–252. Springer, 2007.

11. J. Furukawa and K. Sako. An efficient scheme for proving a shuffle. In J. Kilian, editor,CRYPTO 2001, volume 2139 ofLNCS, pages 368–387. Springer, 2001.

12. O. Goldreich. Foundations of Cryptography: Volume I Basic Tools. Cambridge University Press, 2001. 13. O. Goldreich, S. Micali, and A. Wigderson. How to play any mental game or a completeness theorem for

protocols with honest majority. In A. V. Aho, editor, ACM Symposium on Theory of Computing (STOC) 1987, pages 218–229. ACM, 1987.

14. J. Hong, J. W. Kim, J. Kim, K. Park, and J. H. Cheon. Constant-round privacy preserving multiset union. IACR Cryptology ePrint Archive, 2011. Available at http://eprint.iacr.org/2011/138.

15. S. Jarecki and X. Liu. Efficient oblivious pseudorandom function with applications to adaptive ot and secure computation of set intersection. In O. Reingold, editor,Theory of Cryptography (TCC) 2009, volume 5444 ofLNCS, pages 577–594. Springer, 2009.

16. J. Katz and R. Ostrovsky. Round-optimal secure two-party computation. In M. K. Franklin, editor,CRYPTO 2004, volume 3152 ofLNCS, pages 335–354. Springer, 2004.

17. L. Kissner and D. Song. Privacy-preserving set operations. In V. Shoup, editor, Advances in Cryptology-CRYPTO 2005, volume 3621 ofLNCS, pages 241–257, 2005.

18. F. Kuhn and R. Struik. Random walks revisited: Extensions of pollard’s rho algorithm for computing multiple discrete logarithms. In S. Vaudenay and A. M. Youssef, editors,Selected Areas in Cryptography (SAC) 2001, volume 2259 ofLNCS, pages 212–229. Springer, 2001.

19. V. K. Leont’ev. Roots of random polynomials over a finite field. Matematicheskie Zametki, 80(2):313–316, 2006.

20. P. D. MacKenzie and K. Yang. On simulation-sound trapdoor commitments. In C. Cachin and J. Camenisch, editors,EUROCRYPT, volume 3027 ofLNCS, pages 382–400. Springer, 2004.

21. D. Naccache and J. Stern. A new public key cryptosystem based on higher residues. In L. Gong and M. K. Reiter, editors,ACM Conference on Computer and Communications Security (ACM CCS) 1998, pages 59–66. ACM, 1998.

22. T. Okamoto and S. Uchiyama. A new public-key cryptosystem as secure as factoring. In K. Nyberg, editor, Advances in Cryptology - EUROCRYPT 1998, volume 1403 ofLNCS, pages 308–318. Springer, 1998. 23. P. Paillier. Public-key cryptosystems based on composite degree residuosity classes. In J. Stern, editor,

Advances in Cryptology - EUROCRYPT 1999, volume 1592 ofLNCS, pages 223–238. Springer, 1999. 24. Y. Sang and H. Shen. Efficient and secure protocols for privacy-preserving set operations.ACM Transactions

on Information and System Security, 13(1), 2009.

25. J. H. Seo, J. H. Cheon, and J. Katz. Constant-round multi-party private set union using reversed Laurent series. In M. Fischlin, editor,Public Key Cryptography (PKC) 2012, LNCS. Springer, 2012. To appear. 26. A. Shamir. How to share a secret. Communications of the ACM, 22(11):612–613, 1979.

27. A. Shamir. On the generation of multivariate polynomials which are hard to factor. In S. R. Kosaraju, D. S. Johnson, and A. Aggarwal, editors,ACM Symposium on Theory of Computing (STOC) 1993, pages 796–804. ACM, 1993.

28. V. Shoup. Practical threshold signatures. In B. Preneel, editor,EUROCRYPT 2000, volume 1807 ofLNCS, pages 207–220. Springer, 2000.

29. V. Shoup. A Computational Introduction to Number Theory and Algebra. Cambridge University Press, 2nd edition, 2009.

30. K. Suzuki, D. Tonien, K. Kurosawa, and K. Toyota. Birthday paradox for multi-collisions. In M. S. Rhee and B. Lee, editors,Information Security and Cryptography (ICISC) 2006, volume 4296 ofLNCS, pages 29–40. Springer, 2006.

31. C. Umans. Fast polynomial factorization and modular composition in small characteristic. In C. Dwork, editor,ACM Symposium on Theory of Computing (STOC) 2008, pages 481–490. ACM, 2008.

(18)

A The Proper Size of α

In this section, we explain why the proper size of α is 1₃. Let α = b_a for relatively prime a, b

with a < b and assume that ` and `0 are divided by b. Let h : {0,1}∗ _{→ {}₀_,₁_}3(1−α)τ _and

hi : {0,1}∗ → {0,1}3ατ be uniform hash functions for 1 ≤ i ≤ `0. 5 Assume that 3ατ and 3(1₋α)τ are integers. Parse a messagesi∈ U ⊆ {0,1}3ατ ` into`blockssi,1,· · · , si,`of 3ατ-bit so that si =si,1|| · · · ||si,`. Let si,`+j =hj(si) for 1≤j≤`0 and parse h(sj) into (a−b) blocks

s_i,¯_`₊₁,· · · , s_i,_`¯₊_a₋_b of 3ατ-bit. Set ¯`=`+`0.

We define an encoding function ια : U ⊆ {0,1}3ατ ` → Zσ, in which ια(si) is the unique element inZσsatisfyingια(si) =si,(j−1)b+1|| · · · ||si,(j−1)b+amodqj for a compositeσ=Q

¯

` j=1qj.

Then a set S is represented as a polynomial f(x) =Q_s

i∈S(x−ια(si))∈Zσ[x].

For each messagesi=si,1|| · · · ||si,`, denoteια(si) modqj bys(ji). We generalize the definition of a linkable pair. We define (s(_ji), s(_ji₊₁0))∈Zqj×Zqj+1to be alinkable pair if the last 3(1−α)τ-bit of s(_ji) is equal to the first 3(1−α)τ-bit of s(_ji₊₁0). Inductively, we also define (s(₁i1),· · ·, s(ij+1)

j+1 )∈

Zq1× · · · ×Zqj+1 to be a linkable pair if (s

(i1) 1 ,· · ·, s

(ij)

j ) and (s

(ij)

j , s

(ij+1)

j+1 ) are linkable pairs.

Let ια(si) and ια(si0) be images of elememts s_i and s_i0 of the encoding function ι_α with

si6=si0. Assume thats_i ands_i0 are uniformly chosen strings from {0,1}3ατ `. Then,

Pr [(s(_ji), s(_ji0)) is a linkable pair] (6)

= Prsi,jb+1|| · · · ||si,(j−1)b+a=si0_,jb+1|| · · · ||s_i0_,₍_j₋₁₎_b₊_a (7)

= 1

23(1−α)τ (8)

for a fixed 1≤j≤`¯.

At the decoding phase, when a polynomial f(x) =Qd_i₌₁(x₋ια(si)) is given, one computes all the roots _{s(1)_j ,_{· · ·}, s_j(d)_} over Zqj. Then for each j sequentially from 1 to ¯`−1, we find all

linkable pairs between {s(1)_j ,· · ·, s(_jd)} and {s(1)_j₊₁,· · · , s(_jd+1}) and check the hash values to find the correct d si’s.

Theorem 4. Assume that S = _{s1, . . . , sd} is a uniformly and randomly chosen set in the

set of subsets of cardinality d of the set _{0,1}3ατ ` _for ₀ _{< α <} ₁_{. Define an encoding}

func-tion ια : {0,1}3ατ ` → Zσ so that ια(si) is the unique element in Zσ satisfying ια(si) ≡

s_i,₍_j−1)b+1|| · · · ||si,(j−1)b+amodqj for all 1 ≤ j ≤ `¯ when si = si,1|| · · · ||si,`, si,j’s are 3ατ

-bit and α = _ab for relatively prime a, b. Assume that ` and `0 are divided by b. Assume h and hj’s utilized in the encoding function ι are uniform hash functions.

The expected number of linkable pairs of `¯-tuple is at most

d

1 + 3

2min{3ατ,(2−3α)τ} _`¯₋₁

for a polynomial f(x) =Qd_i₌₁(x−ια(si)).

Proof. Let Ej be the expected number of linkable pair of j-tuple inZq1 × · · · ×Zqj for j ≥2.

For 1 ≤j ≤j0 ≤ `¯, let Sj0₋_j₊₁(i_j, . . . , i_j0) be the event that (s(ij) j , . . . , s

(i0

j)

j0 ) is a linkable pair.

5

(19)

Then,

E2=

X

i1,i2∈{1,...,d}

1_·Pr[S2(i1, i2)]

= X

i1,i2∈{1,...,d}

Pr[S2(i1, i2)∧(i1 =i2)] +

X

i1,i2∈{1,...,d}

Pr[S2(i1, i2)∧(i1 6=i2)]

= X

i1∈{1,...,d}

Pr[S2(i1, i1)] +

X

i16=i2∈{1,...,d}

Pr[S2(i1, i2)]

=d+d(d₋1) 1 23ατ =d

1 +d−1 23ατ

since Pr[S2(i1, i1)] = 1 fori1 ∈ {1, . . . , d}and Pr[S2(i1, i2)] = ₂31ατ for distinct i1, i2 ∈ {1, . . . , d}

from the equation (6).

Now, we consider the relation betweenSj andSj+1. When (s₁(i1), . . . , s(_jij)) is a linkable pair,

consider the case that (s(₁i1), . . . , s_j(ij), s(ij+1)

j+1 ) is a linkable pair. One can classify this case into

the following cases:

1. ij+1=ij,

2. ij+1=ij−k∈ {/ ij−k+1, . . . , ij}fork= 0, . . . ,ba−_b1c −1, 3. ij+1=ij−k∈ {/ ij−k+1, . . . , ij}fork=ba−_b1c, . . . , j−1.

At the first case, ifij+1=ij and {s(1i1),· · ·, s (ij)

j }is a linkable pair, then{s

(i1) 1 ,· · ·, s

(ij)

j , s

(ij+1) j+1 }

is always a linkable pair. Hence,

E(1)_j₊₁:= X i1,...,ij+1∈{1,...,d}

Pr [Sj+1(i1, . . . , ij, ij+1)∧(ij+1=ij)]

= X

i1,...,ij∈{1,...,d}

Pr [Sj(i1, . . . , ij)] =Ej.

At the second case, if 0_≤k_{≤ b}a−1

b c −1,

E(2)_j₊₁:= X i1,...,ij+1∈{1,...,d}

Pr[Sj+1(i1, . . . , ij+1)∧(ij+1=ij−k∈ {/ ij−k+1, . . . , ij})]

= X

i₁_,...,ij₊₁∈{1,...,d}

ij+1=ij−k /∈{ij−k+1,...,ij}

PrSj(i1, . . . , ij)∧S0j(ij, ij+1)

≤

ba−1

b c−1

X

k=0

X

i1,...,ij∈{1,...,d}

1

23kατ Pr[Sj(i1, . . . , ij)]≤ 2 23ατEj

since the lastτ(1₋(k+1)_ab+1)-bit ofs(ij−k)

j−k and the firstτ(1−

(k+1)b+1

a )-bit ofs

(ij+1)

j+1 are already

same from the encoding rule ofιαwhenS0j(ij, ij+1) is the event thatsij,(j−k−1)b+a+1|| · · · ||sij,(j−1)b+a= sij+1,(j−k−1)b+a+1|| · · · ||sij+1,(j−1)b+a.

For ba−1

b c ≤k < j, ifij+1=ij−k∈ {/ ij−k+1, . . . , ij}, then

(20)

Hence,

E(3)_j₊₁:= X i1,...,ij+1∈{1,...,d}

Pr[Sj+1(i1, . . . , ij+1)∧(ij+1=ij−k∈ {/ ij−k+1, . . . , ij})]

= X

i₁_,...,ij₊₁∈{1,...,d}

ij+1=ij−k /∈{ij−k+1,...,ij}

Pr[Sj(i1, . . . , ij)]·Pr[S2(ij, ij+1)]

= j−1

X

k=da−1

b e

X

i1,...,ij∈{1,...,d}

1

23(1−α)τ Pr[Sj(i1, . . . , ij)]≤

d

23(1−α)τEj.

sincePj−1 k=da−1

b e 1 23(1−α)τ ≤

¯

`

23(1−α)τ ≤

d

23(1−α)τ.

From the above results, one can obtain the recurrence formula of Ej as follows:

Ej+1 =E(1)j+1+E (2)

j+1+E (3)

j+1

≤

1 + 2 23ατ +

d

23(1−α)τ

Ej

forj≥2. Therefore,

E_`¯≤

1 + 2

23ατ +

d

23(1−α)τ

_X

i1,...,i¯`−1∈{1,...,d}

Pr[S_`¯(i1, . . . , i_`¯₋₁)]

≤

1 + 2

23ατ +

d

23(1−α)τ

_`¯−2 _X

i1,i2∈{1,...,d}

Pr[S2(i1, i2)]

≤d

1 + 3

2min{3ατ,(2−3α)τ} _`¯₋₁

.

If ¯`≥2min{3ατ ,(2−3α)τ}_{, then}_d_{1 +} 3 2min{3ατ ,(2−3α)τ}

_`¯₋₁

is exponentially increased. Hence it requires the limitation of ¯`so that ¯` <2min{3ατ ,(2−3α)τ}_{. But, if}_α₆₌ 1

3, then either 3αor (2−3α)

is less than 1 and hence 2min{3ατ ,(2−3α)τ} _<_minn_d,blogNc−2 3τ

o

for increasing of dand logN. It causes the limitation of the size of universe since _{|U| ≤} 23ατ ` _{in the case} _α ₆₌ 1

3. From these

reasons, we set α so that the universe_U is a subset of _{0,1_}τ `_,_i.e._,_α₌ 1 3.

B Threshold Naccache-Stern Cryptosystem

In this section, we provide a threshold version of NS AHE scheme. Our threshold version is based on Fouque et al.’s work [8], which transforms the original Paillier homomorphic encryption scheme into a threshold version working from Shoup’s technique [28]. The security proof of our construction is similar with that of a threshold version of Paillier encryption scheme under the assumption that the original NS AHE scheme is semantically secure. We omit the detail of the security proof.

Threshold Naccache-Stern Encryption Scheme Our threshold version of the NS AHE scheme consists of the following four algorithms:

– KeyGen(λ, n, t): this algorithm is executed by a dealer. It takes as inputs a security parameter