Differentially Private Posterior Sampling

Sampling from the posterior distribution of a Bayesian model with bounded log- likelihood has free differential privacy to some extent [124]. Specifically, for prob- abilistic NBM, releasing a sample of thesimilarityS,

S∼p(S|R>0,_αS,_αR)∝_exp( M ∑ i=1 N ∑ u=1 (_rui− SiR − u |Si|I− u )2+_λ M ∑ i=1 ||Si||2) (3.9)

achieves4B-differential privacy at user level, if each user’s log-likelihood is bounded to B, i.e. max

u∈R>0 ∑

i∈Ru(ˆrui−rui)

2 ≤_{B. Wang et al. [}₁₂₄_{] showed that we can achieve}

ε-differential privacy by simply rescaling the log-posterior distribution with₄ε_B, i.e. ε

4B·logp(S|R

>0_,_αS_,_αR₎_.

Posterior sampling is computationally costly. For the sake of efficiency, we adopt a recent introduced Monte Carlo method, Stochastic Gradient Langevin Dynam- ics (SGLD) [126], as our MCMC sampler. To successfully use SGLD, we need to derive an unbiased estimator ofsimilaritygradient from a mini-batch which is a non-trivial task.

Next, we first overview the basic principles of SGLD (Section 3.4.1), then we derive an unbiased estimator of the truesimilaritygradient (Section 3.4.2), and finally present our privacy-preserving algorithm (Section 3.4.3).

3.4.1 Stochastic Gradient Langevin Dynamics

SGLD is an annealing of SGD and Langevin dynamics [102] which generates samples from a posterior distribution. Intuitively, it adds an amount of Gaussian noise calibrated by the step sizes (learning rate) used in the SGD process, and the step sizes are allowed to go to zero. When it is far away from the basin of convergence, the update is much larger than noise and it acts as a normal SGD process. The update decreases when the sampling approaches to the convergence basin such that the noise dominated, and it behaves like a Brownian motion. SGLD updates the candidate states according to the following rule.

Δθt = ηt 2(Δ logp(θt) + L L L ∑ i=1 Δ logp(xti|θt)) +zt; zt ∼ N(0,η_t) (3.10) whereη_tis a sequence of step sizes.p(x|θ)denotes conditional probability distribution, andθis a parameter vector with a prior distributionp(_θ)_._L_{is the size of a}

mini-batch randomly sampled from datasetXL. To ensure convergence to a local optimum, the following requirements of step sizeη_thave to be satisfied:

∞ ∑ t=1 η_t =∞ ∞ ∑ t=1 η2_t <∞

Decreasing step sizeη_treduces the discretization error such that the rejection rate approaches zero, thus we do not need accept-reject test. Following the previous works, e.g. [73,126], we set step sizeη_t = η₁t−ξ_{, commonly,} _ξ ∈ _[₀_.₃_,₁_]_{. In} order to speed up the burn-in phase of SGLD, we multiply the step sizeη_t by a temperature parameterϱ(0< ϱ <1) where√ϱ·η_t ≫ η_t[23].

3.4.2 Unbiased Estimator of The Gradient

The log-posterior distribution ofsimilarityShas been defined in Equation (3.5). The true gradient of thesimilaritySoverR>0_{can be computed as}

G(_R>0) = ∑

(u,i)∈R>0

gui(_S;_R>0) +_λS _(3.11)

wheregui(S;R>0_{) =} _eui∂ˆrui

∂Si. To use SGLD and make it converge to true poste-

rior distribution, we need an unbiased estimator of the true gradient which can be computed from a mini-batchΦ ⊂ R>0_{. Assume that the size of}_Φ_and_R>0_are_L

andLrespectively. The stochastic approximation of the gradient is

G(Φ) =L¯g(S,Φ) +λS◦I[i,j∈Φ] (3.12) where¯g(S,Φ) = 1

L ∑

(u,i)∈Φgui(S,Φ). I ⊂ BM×Mis symmetric binary matrix,

andI[i,j ∈ Φ] = 1if any item-pair(i,j)exists inΦ, otherwise 0. ◦presents element-wise product (i.e. Hadamard product). The expectation ofG(Φ)over all possible mini-batches is,

EΦ[G(Φ)] =EΦ[L¯g(S,Φ)] +λEΦ[S◦I[i,j∈Φ]]

= ∑

(u,i)∈R>0

EΦ[G(Φ)]is not an unbiased estimator of the true gradientG(R>0)due to the prior

termEΦ[S ◦I[i,j ∈ Φ]]. LetH = EΦ[I[i,j ∈ Φ]], we can remove this bias by

multiplying the prior term withH−1thus to obtain an unbiased estimator. Follow previous approach [4], we assume the mini-batches are sampled with replacement, thenHis, Hij =1− |Ii||Ij| L2 (1− |Ij| L)L−1(1− |Ii| L )L−1 (3.14) where|Ii|(resp. |Ij|) denotes the number of ratings of itemi(resp. j) in the com- plete datasetR>0_{. Then the SGLD update rule is the following:}

S(t+1) ←S(t)− ηt

2(L¯g(S (t)_,

Φ) +_λS(t)◦H−1) +_zt _(3.15)

3.4.3 Differential Privacy via Posterior Sampling

To construct a differentially private NBM, we exploit a recent observation that sampling from scaled posterior distribution of a Bayesian model with bounded log-likelihood can achieveε-differential privacy [124]. We summarize the differentially private sampling process (via SGLD) in Algorithm 3.

Algorithm 3Differentially Private Posterior Sampling (via SGLD)

Require: Temperature parameterϱ, privacy parameterε, regular parameterλ, ini- tial learning rateη₁. LetKlarger than burn-in phase.

1: fort=1:Kdo

2: •Randomly sample a mini-batchΦ⊂R>0. 3: ¯g(_S(t)_, Φ) = 1 L ∑ (u,i)∈Φeui ∂ˆrui ∂S(_it) ▷gradient ofS(mini-batch) 4: zt ∼ N(₀, ϱ·_η_t) ▷ √ϱ·_η_t ≫_η_t 5: S(t+1)←S(t)−₄ε_B · ηt 2(L¯g(S (t)_, Φ) +_λS(t)_◦_H−1_{) +} zt 6: η_t₊₁= η1 tγ 7: returnS(t+1)

Now, a natural question is how to determine the log-likelihood boundB? ( max

u∈R>0 ∑

i∈Ru(ˆrui −rui)

2 ≤ _{B, and see Equation (3.9)). Obviously,}_B_{depends on}

randomly remove some ratings thus to ensure that each user at most hasτratings. In our context, the rating scale is [1,5], letτ =200, we haveB = (5−1)2×₂₀₀

(In reality, most users have less than 200 ratings [73]).

Theorem 4. Algorithm 3 provides(ε,(1+eε₎_δ₎_{-differential privacy guarantee to any} user if the distribution P′_X where the approximate samples from is δ-far away from the true posterior distribution PX, formally||P′X −PX||1≤δ. And δ →0if the MCMC

sampling asymptotically converges.

Proof. Essentially, differential privacy via posterior sampling [124] is an exponen- tial mechanism [77] which protectsε-differential privacy when releasing a sample θwith probability proportional toexp(− ε

2ΔFp(X |θ)), wherep(X |θ)serves as the

utility function. Ifp(X |θ)_{is bounded to}_{B, we have the sensitivity}_ΔF ≤ ₂_B.

Thus, release a sample by Algorithm 3 preservesε-differential privacy. It compromises the privacy guarantee to(ε,(1+eε₎_δ₎_{if the distribution (where the sam-} ple from) isδ-far away from the true posterior distribution, proved by Wang et al. [124].

Note that whenε=4B, the differentially private sampling process is identical to the non-private sampling. This is also the meaning ofsome extent of free privacy. It starts to lose accuracy whenε<4B. One concern of this sampling approach is the distanceδbetween the distribution where the samples from and the true posterior distribution, which compromises the differential privacy guarantee. Fortunately, an emerging line of works, such as [105,121], proved that SGLD can converge in finite iterations. As such we can have arbitrarily smallδwith a (large) number of iterations.

In document Privacy-preserving Recommender Systems Facilitated By The Machine Learning Approach (Page 50-54)