Comparison to Prior Work - Private Matrix Completion

3.4 Private Matrix Completion

3.4.3 Comparison to Prior Work

As discussed earlier, our results are the first to provide non-trivial error bounds for DP matrix completion. For comparing different results, we consider the following setting of the hidden matrixY∗ ∈_Rm×n_{and the set of released entries}_Ω_{: i)}

|Ω| ≈ m√n, ii) each row ofY∗ has anL2 norm of

√

n, and iii) each row ofPΩ(Y∗)

has L2-norm at most n1/4, i.e., ≈

√

n random entries are revealed for each row. Furthermore, we assume the spectral norm of Y∗ is at most O(√mn), and Y∗ is

rank-one. These conditions are satisfied by a matrixY∗ =u·vT_{, where}√_n_random

entries are observedper user, andui, vj ∈[−1,1]∀i, j.

Algorithm Bound onm Bound on|Ω|

Nuclear norm min. (non-private) ω(n) ω(m√n)

(Shalev-Shwartz et al.(2011))

Noisy SVD + kNN – –

(McSherry & Mironov(2009))

Noisy SGLD (Liu et al.(2015)) – –

Private FW (Jain et al.(2018)) ω(n5/4) ω(m√n) Table 3.1: Sample complexity bounds for matrix completion. m =

no. of users,n=no. of items. The bounds hide privacy parameters andlog(1/δ), and polylog factors inm,n.

Algorithm Error

Randomized response (Blum et al.(2005)) O(√m+n)

(Chan et al.(2011);Dwork et al.(2014b))

Gaussian measurement (Hardt & Roth(2012)) O √m+pµn_m

Noisy power method (Hardt & Roth(2013)) O(√µ)

Exponential mechanism (Kapralov & Talwar(2013)) O(m+n)

Private FW (Jain et al.(2018)) O m3/10n1/10

Private SVD (Jain et al.(2018)) O

µ n_m2 + m_n

Table 3.2: Error bounds (kY −Y∗kF) for low-rank approximation.

µ ∈ [0, m]is the incoherence parameter (Definition 29). The bounds hide privacy parametersand log(1/δ), and polylog factors inm an n. Rank of the output matrix Ypriv is O m2/5/n1/5

for Private FW, whereas it isO(1)for the others.

In Table 3.1, we provide a comparison based on the sample complexity, i.e., the number of users mand the number observed samples|Ω| needed to attain a gen- eralization error of o(1). We compare our results with the best non-private algorithm for matrix completion based on nuclear norm minimization (Shalev-Shwartz et al.(2011)), and the prior work on DP matrix completion (McSherry & Mironov

(2009); Liu et al.(2015)). We see that for the same |Ω|, the sample complexity on m increases fromω(n)toω(n5/4₎_{for our FW-based algorithm. While}_{McSherry &} Mironov(2009);Liu et al.(2015) work under the notion of Joint DP as well, they do not provide any formal accuracy guarantees.

Interlude: Low-rank approximation. We also compare our results with the prior work on a related problem of DP low-rank approximation. Given a matrix Y∗ ∈

Rm×n, the goal is to compute a DP low-rank approximationYpriv, s.t. Ypriv is close

to Y∗ either in the spectral or Frobenius norm. Notice that this is similar to matrix completion if the set of revealed entriesΩis the complete matrix. Hence, our methods can be applied directly. To be consistent with the existing literature, we assume thatY∗ is rank-one matrix, and each row ofY∗ hasL2-norm at most one.

Table 3.2 compares the various results. While all the prior works provide trivial error bounds (in both Frobenius and spectral norm, askY∗k2 =kY∗k_F ≤

√

m), our methods provide non-trivial bounds. The key difference is that we ensure Joint DP (Definition 18), while existing methods ensure the stricter standard DP (Defini- tion 2.1.3), with the exponential mechanism (Kapralov & Talwar(2013)) ensuring

CHAPTER 4

Private Convex Optimization

In this chapter, we will look at a technique for practical differentially private convex optimization. We will also look at an extensive empirical evaluation, which includes many high-dimensional publicly available benchmark datasets, corrobo- rating that this technique performs well in practice.

4.1 ADDITIONAL PRELIMINARIES

Given anm-element datasetD ={d1, d2, . . . , dm}, s.t.di ∼ Dfori∈ [m], the objec-

tive is to get a modelθˆfrom the following unconstrained optimization problem:

ˆ θ∈arg min θ∈Rn L(θ;D), whereL(θ;D) = _m1 m P i=1

`(θ;di)is the empirical risk,n > 0, and`(θ;di)is defined as

a loss function fordi that is convex in the first parameterθ ∈Rn. This formulation

falls under the framework of ERM, which is useful in various settings, including the widely applicable problem of classification in machine learning via linear regression, logistic regression, or support vector machines. The notationkxkis used to represent theL2-norm of a vectorx. Next, we define certain basic properties of

functions that will be helpful in further sections.

Definition 4.1.1. A functionf :_Rn →_R:

• is aconvex functionif for allθ1, θ2 ∈Rn, we have

• is aξ-strongly convex functionif for allθ1, θ2 ∈Rn, we have f(θ1)≥f(θ2) +h∇f(θ2), θ1−θ2i+ ξ 2kθ1−θ2k 2 or equivalently,h∇f(θ1)− ∇f(θ2),(θ1−θ2)i ≥ξkθ1−θ2k2

• hasLq-Lipschitz constant∆if for allθ1, θ2 ∈Rn, we have

|f(θ1)−f(θ2)| ≤∆· kθ1−θ2kq

• isβ-smoothif for allθ1, θ2 ∈Rn, we have

k∇f(θ1)− ∇f(θ2)k ≤β· kθ1−θ2k

Lastly, we define Generalized Linear Models (GLMs).

Definition 4.1.2(Generalized Linear Model). For a model spaceθ∈_Rn_{, where}_{n >}₀_,

the sample space U in aGeneralized Linear Model (GLM) is defined as the cartesian product of a n-dimensional feature spaceX ⊆ _Rn_{and a label space} _Y_{, i.e.,} _U ₌ _{X × Y}_.

Thus, each data sampledi ∈U can be decomposed into a feature vectorxi ∈ X, and a label

yi ∈ Y. Moreover, the loss function`(θ;di)for a GLM is a function ofxTi θandyi.

In this chapter, we will use the notion of neighboring under modification (Def- inition 2.1.1) for the guarantee of DP (Definition 2.1.3).

In document Advances in privacy-preserving machine learning (Page 52-56)