http://www.scirp.org/journal/jamp ISSN Online: 2327-4379
ISSN Print: 2327-4352
Error Analysis and Variable Selection for
Differential Private Learning Algorithm
Weilin Nie
1, Cheng Wang
2Huizhou University, Huizhou, China
Abstract
In this paper, we construct a modified least squares regression algorithm which can provide privacy protection. A new concentration inequality is ap-plied and the expected error bound is derived by error decomposition. Fur-thermore, via the error analysis, we find a method to choose an appropriate parameter to balance the error and privacy.
Keywords
Differential Privacy, Least Squares Regularization, Concentration Inequality, Error Decomposition
1. Introduction
Privacy protection attracts much attention in many branches of computer sci- ence. To deal with this, Dwork et al. proposed differential privacy in [1]. Soon [2]
builds an exponential mechanism which is a useful approach to construct a dif-ferential private algorithm. The concept is introduced into learning theory in [3]. There, the authors consider output perturbation and object perturbation for ERM algorithms. Analysis of privacy and generalization for those algorithms al-so has been conducted. P. Jain and his collaborators have done a lot of work on differential private learning afterwards [4] [5] and etc. Recently, in [6], the au-thors find that the empirical average of the output from a differential private al-gorithm can converge to its expectation. And [7] provides another analysis of this convergence, which motivates our work.
In this paper, we consider the following statistical learning model (see [8] [9]
for more details): The input space X is a compact metric space, and the output
space is Y⊂ as a regression problem. Throughout the paper, we assume the
output Y is uniformly bounded, i.e., y ≤M for some M >0 almost surely.
On the sample space Z:=X×Y, we try to find a function f :X →Y via some
How to cite this paper: Nie, W.L. and Wang, C. (2017) Error Analysis and Variable Selection for Differential Private Learning Algorithm. Journal of Applied Mathematics and Physics, 5, 900-911.
https://doi.org/10.4236/jamp.2017.54079
Received: February 14, 2017 Accepted: April 27, 2017 Published: April 30, 2017
Copyright © 2017 by authors and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY 4.0).
http://creativecommons.org/licenses/by/4.0/
algorithms , reflecting the relationship between the input and output.
Algo-rithm relies on the random chosen sample
{ }
i m1{
(
i,
i)
}
m1i i
z
=x y
=
=
=
z
, whilethe sample is drawn according to a distribution function ρ on Z. Furthermore,
we assume there is a marginal distribution ρX on X and conditional
distri-bution ρ
( )
y x on Y given some x.Now we expect the algorithm can provide some privacy protection. We as-sume satisfies the
(
,γ)
differential private condition [1]. Denoting theHamming distance between two sample sets
{
z z1, 2}
is(
1, 2)
#{
1, , : 1,i 2,i}
,d z z = i= m z ≠z
i.e., there is only one element is different. Then
(
,γ)
-differential private isde-fined as follows:
Definition 1 A random algorithm : m
A Z → is
(
,γ)
-differential privateif for every two data sets z z1, 2 satisfying d
(
z z1, 2)
=1, and every sets ∈ we have( )
{
1}
{
( )
2}
Pr A z ∈ ≤ ⋅e Pr A z ∈ +γ.
Here is a function space from X to Y, which is called the hypothesis
space. In the sequel, we focus on the
(
, 0)
-differential privacy with some0< < 1, which is always called -differential privacy for simplicity. How to
choose an appropriate is a fundamental problem in differential private
algo-rithms [10], and we will provide a method during our error estimation in the following sections.
2. Concentration Inequality
In this section, we study the error between average and expectation for an algo-rithm providing -differential privacy. Our first result can be stated as
fol-low:
Theorem 1 If an algorithm provides -differential privacy, and outputs
a positive function
g
z,:
X Y
× →
with bounded expectation
z,g
z,≤
G
for some G>0, where the expectation is taken over the sample via thealgo-rithm output. Then
( )
( )
, , ,
1
1
d 2 ,
m
z z i Z z
i
g z g z G
m = ρ
− ≤
∑
∫
and
( )
( )
, , ,
1
1
d 2 .
m
z Z z z i
i
g z g z G
m
ρ
=
− ≤
∫
∑
Denote sample sets wj=
{
z z1, 2,,zj−1,z z′j, j+1,,zm}
for j∈{
1, 2,,m}
. We observe that( )
(
( )
)
( )
{
}
, , ,
1 1
, 0 1
1 1
1
d Pr
i
m m
z z i z z i
i i
m
z z z i
i
g z g z
m m
g z t t
m
= =
+∞ ′ =
=
= ≥
∑
∑
∑
∫
( )
{
}
( )
(
)
(
( )
)
( )
( )
( )
, 0 1 , , , 1 1 , , , , 1 1 , , 1e Pr d
1 1
e e
1 1
e d e d
e d .
i i
i i i i i i
i i
m
z z w i
i
m m
w z w i w z w i
i i
m m
w Z w z Z z
i i
z Z z
g z t t
m
g z g z
m m
g z g z
m m g z ρ ρ ρ +∞ ′ = = = = = ≤ ≥ = = = = =
∑
∫
∑
∑
∑
∫
∑
∫
∫
Then( )
( )
(
)
(
( )
)
, , , 1 , , 1 de 1 d 2 .
m
z z i Z z
i
z Z z
g z g z
m
g z G
ρ ρ = − ≤ − ≤
∑
∫
∫
On the other hand,
( )
( )
( )
( ) ( )
( )
(
)
{
( )
}
( )
{
}
( )
(
)
, , , 1 , , 1 1, 0 ,
1 1 , 0 1 , , 1 1 d d 1 1 d d 1 1 d Pr 1
e Pr d
1 1
e e
i i i i
i i i i i
i
m
z Z z z Z z
i
m m
w Z w w Z w i i
i i
m m
w z w i z z w i
i i
m
z z z i
i m
z z i z
i
g z g z
m
g z g z z
m m
g z g z t t
m m
g z t t
m g z m m ρ ρ ρ ρ = = = +∞ ′ = = +∞ ′ = = = = = = = ≥ ≤ ≥ = =
∑
∫
∫
∑
∫
∑
∫
∑
∑
∫
∑
∫
∑
gz,
( )
zi .This leads to
( )
( )
(
)
( )
, , , 1 , , 1 1 d 1e 1 2 .
m
z Z z z i
i m
z z j
i
g z g z
m
g z G
m
ρ
= = − = − ≤∑
∫
∑
These verify our results.
Remark 1 Similar results are proposed in[6]and[7]. However, there the au-thors limits the function to take value in
[ ]
0,1 or{ }
0,1 , our result hereex-tends theirs to the function taking value in +. This makes our following error analysis implementable.
3. Differential Private Learning Algorithm
In this section we consider the differential private least squares regularization algorithm. For a Mercer kernel K defined on X×X , the function space
( )
{
}
: span
, ,
K
=
K x
⋅
x
∈
X
is the induced reproducing kernel Hilbert space(RKHS). Denote Kx
( )
y =K x y(
,)
for any x y, ∈X , and( )
,
sup
x y XK x y
,
κ
=
∈ . It is well known that f( )
x = f K, x K as thereproducing property. In the sequel, we always assume y ≤M for some
con-stant M >0. The least squares regularization algorithm, which has been
( )
(
)
2 2,
1
1
arg min .
K
m
z f i i K
i
f f x y f
m
λ ∈ λ
=
=
∑
− + (1)
Denote π as a projection operator as we did in [13] [14]:
( )
(
)
( )
( )
( )
( )
,
, .
,
M f x M
f x f x M f x M
M f x M
π
>
= − ≤ ≤
− < −
Then we add a noise term b in the original algorithm (1) like the output
perturbation algorithm in [3]:
( )
(
( )
)
, ,
z z
f x =π f λ x +b (2)
where the density of b is independent with z which will be clarified in the
following analysis. Moreover, we take the following notation for simplicity:
( )
(
( )
)
2( )
(
( )
)
21
1
d , .
m
z i i
Z
i
f f x y f f x y
m
ρ
=
=
∫
− =∑
−
Definition 2 We denote ∆fz as the maximum infinite norm of difference when changing one sample point in z, i.e., if d
(
z z, ′ =)
1,,
sup .
z z z
z z
f f f′ ∞
′
∆ = −
Then we have the following result:
Lemma 1 Assume ∆π
(
fz,λ( )
x)
is bounded, and b has density functionproportion to
( )
, expz b
f λ
π
−
∆
, then algorithm (2) provides -differential
privacy.
The proof is just as Theorem 4 in [15]. For all possible function r, and z z, ′ differ in one element, then
{
}
{
( )
}
( )
( )
,, ,
,
Pr z Pr z exp z ,
b
z
r f
f r b r f
f
λ λ
λ
π π
π ∞
−
= = = − ∝ −
∆
and
{
}
{
( )
}
( )
( )
,, ,
,
Pr z Pr z exp z .
b
z
r f
f r b r f
f
λ λ
λ
π π
π
′ ∞
′ ′
′
−
= = = − ∝ −
∆
So
{
}
{
(
)
}
( , ) (( , ), ){
}
, , ,
Pr Pr e e Pr .
z z
z
f f
f
z z z
f r f r f r
λ λ
λ
π π
π
π
′
− ∞
∆
′ ′
= ≤ = × ≤ =
Then the lemma is proved by a union bound. Now we will bound the term
∆
f
z,λ.Lemma 2 For the function
f
z,λ obtained from algorithm (1), assume,
z K
f λ ≤R for any z∈Zm for some R≥M, and 0< ≤λ 1, we have
(
)
2
,
2 1
.
z
R f
m
λ
κ κ λ
+
Assume
f
z,λ andf
z′,λ are two results derived via algorithm (1) given anysample set z z, ′ satisfying d
(
z z, ′ =)
1. Without loss of generality, we set(
z z1, 2, ,zm−1,zm′)
′ =
z . Since the two functions are both the optimizer of
algo-rithm (1), take derivative for f we have
( )
(
,)
,1
2
2 0
i
m
z i i x z
i
f x y K f
m = λ λ λ
− + =
∑
and
( )
(
)
(
( )
)
1
, , ,
1
2 2
2 0.
i m
m
z i i x z m m x z
i
f x y K f x y K f
m λ m λ λ λ
−
′ ′ ′
=
′ ′
− + − + =
∑
These lead to
( )
( )
(
)
(
)
( )
(
)
(
( )
)
, , , ,
1
, ,
1
1
.
i
m m
m
z i z i x z z
i
z m m x z m m x
f x f x K f f
m
f x y K f x y K
m
λ λ λ λ
λ λ
λ
′ ′
=
′ ′
− + −
′ ′
= − − −
∑
Take inner product with
f
z,λ−
f
z′,λ by both sides we have( )
( )
(
)
( )
(
)
(
( )
( )
)
( )
(
)
(
( )
( )
)
2 2
, , , ,
1
, , ,
, , ,
1
1
.
m
z i z i z z K
i
z m m z m z m
z m m z m z m
f x f x f f
m
f x y f x f x
m
f x y f x f x
λ λ λ λ
λ λ λ
λ λ λ
λ
′ ′
=
′ ′
′
− + −
′ ′ ′ ′
= − −
− − −
∑
This means
( )
( )
(
)
2
, , , , , ,
, , , ,
1
1
2 .
z z K z m m z m m z z
z z z z K
f f f x y f x y f f
m
f f M f f
m
λ λ λ λ λ λ
λ λ λ λ
λ
κ
′ ′ ′ ∞
′ ∞ ∞ ′
′ ′
− ≤ − + − ⋅ −
≤ + + −
The last inequality is from the fact that
( )
sup sup , x K x K K K.
x X x X
f ∞ f x f K K f
κ
f∈ ∈
= = ≤ ⋅ ≤
Since fz,λ K ≤R, then fz′,λ K ≤R as well. Therefore,
(
)
(
)
, ,
2 1
1
2 2
z z K
R
f f R M
m m
λ λ
κ κ
κ κ
λ λ
′
+
− ≤ + ≤
for any 0< ≤λ 1. So
(
)
2
, ,
2 1
z z
R f f
m
λ λ
κ κ λ ′ ∞
+
− ≤
for any z z, ′, and our lemma holds. It can be easily verified by discussion that
( ) ( )
f
z,λf
z,λf
z,λf
z,λπ
−
π
′ ∞≤
−
′ ∞for any z z, ′, so we have the choice of noise b and the result for algorithm (2).
Proposition 1 Assume fz,λ K≤R for any
m Z
∈
(
)
2
1 exp
2 1
m b R
λ
α κ κ
−
+
, where 2
(
)
4R 1
m
κ κ α
λ + =
, then the algorithm (2) pro-
vides -differential privacy.
The proof is by combining the two lemmas and the inequality above. And by simply calculation we can get the expression of α.
4. Error Analysis for Differential Private Learning Algorithm
In this section, we will study the expectation of the error between
( )
fz, −( )
fρ , where d
( )
Y
fρ =
∫
y ρ y x is the regression function whichmi-nimizes
( )
f . Firstly we shall introduce the error decomposition:( )
( )
( )
( )
( )
( )
( )
(
( )
)
( )
(
)
( )
( )
( )
( )
(
( )
)
( )
( )
( )
( )
( )
(
( )
)
( )
( )
( )
2
, , ,
, , , ,
2
, ,
, , , ,
2
, ,
, , , ,
2
1 2 ,
z z z K
z z z z z z z
z z z K
z z z z z z z
z z z K
z z z z z z z
z K
f f f f f
f f f f
f f f
f f f f
f f f
f f f f
f f f
D
ρ ρ λ
λ
λ λ ρ
λ
λ λ ρ
λ
λ λ ρ
λ π
π λ
π
λ
π λ
λ
− ≤ − +
≤ − + −
+ + −
≤ − + −
+ + −
≤ − + −
+ + −
≤ + + +
(3)
where fλ is a function in K to be determined and
( )
( )
1 = fz, −z fz, ,
( )
(
( )
)
2
=
zf
z,−
zπ
f
z,λ,
( )
( )
,z fλ fλ
= −
( ) ( )
( )
2.
K
D
λ
=
f
λ−
f
ρ+
λ
f
λHere 1 and 2 involve the function
f
z, from random algorithm (2) sowe call them random errors. and D
( )
λ are similar as classical ones in thepast literature in learning theory and we still call them sample error and ap-proximation error. In the following, we will study these errors respectively.
4.1. Error Bounds for Random Errors
Proposition 2 For function
f
z, obtained from algorithm (2) with density of b as described in Proposition 1, we have(
)
2 2 42
, 1 2 2 2
2 1
8 .
z
R
M m
κ κ
λ
+
≤ +
Note that
( )
(
)
2(
( )
)
21 , ,
1
1
d ,
m
z z i i
Z
i
f x y f x y
m
ρ =
=
∫
− −∑
−
( )
(
)
(
( )
)
(
)
(
(
( )
)
)
( )
(
)
(
)
(
(
( )
)
)
(
)
(
)
2 2
, , ,
1
2
, 1
2 2
, ,
2 2 4
2 2 2 2
1
d |
1
e 1 d
2
8 1
2 4 ,
m
z Z z z i i
i m
z z i i
i
z b z i i z i i
f x y f x y
m
f x b y
m
b b f x y f x y
R
M m
λ
λ λ
ρ
π ρ
π π
κ κ λ
=
=
− − −
≤ − + −
= + − + −
+
≤ +
∑
∫
∑
which verifies the proposition.
For the term 2, we have the same analysis.
Proposition 3 For function
f
z, obtained from algorithm (2) with density ofb as described in Proposition 1, we have
(
)
2 2 4 , 2 2 2 28 1
.
z
R m
κ κ
λ
+ ≤
Since
( )
(
( )
)
( )
(
)
(
(
( )
)
)
( )
(
)
(
)
( )
(
)
(
)
2 , ,
2 2
, ,
1
, 1
2
, 1
1
1
2 2
1
2 ,
z z z z
m
z i i z i i
i m
z i i
i
m
z i i
i
f f
f x y f x y
m
b b f x y
m
b b f x y
m
λ
λ
λ
λ
π
π
π
π
=
=
=
= −
= − − −
= + −
= + ⋅ −
∑
∑
∑
we have
(
)
2 2 4 2, 2 2 2 2
8 1
.
z z b
R b
m
κ κ
λ
+
= ≤
And the proposition is proved.
4.2. Error Estimates for Sample Error and Approximation Error
Error estimates for sample error and approximation error have been extensively studied since [8]. Here we provide the proof for completeness. It is known that
fλ in the error decomposition (3) can be arbitrarily chosen in K in [12] [13]
[14] and etc. Here we simply choose it to be the classical one
( )
2arg min .
K K
f
fλ = ∈ f +λ f
From [16] [17] we have the expression of fλ is
(
)
1,
K K
fλ = L +λ − L fρ
where LK is the operator defined on
2
X
Lρ as
( )
( ) ( )
, d .K X X
L f t =
∫
f x K x t ρ[8] told us that LK has a eigenvalue sequence
{ }
µ
i i≥1 satisfies0 0
i i
µ > µ → when i→ ∞, and LK ≤
κ
2. Now we recall the HoeffdingLemma 3 Let ξ be a random variable on a probability space Z satisfying
( )
z Bξ −ξ ≤ for some B>0 for almost all z∈Z, then
( )
221
1
Pr 2 exp .
2 m i i m z m B ε
ξ ξ ε
=
− ≥ ≤ −
∑
Then we have the following analysis.
Proposition 4 For fλ and
f
ρ defined as above, assume( )
2X r Kf
ρ∈
L
L
ρ , we have( )
2 min 2 ,1{ }(
4 2 4 4)
2,
8 2π
2 .
r r r r
z K
M
D L f
m ρ ρ
λ λ κ − κ − −
+ ≤ + + +
Firstly we bound the sample error.
( )
( )
( )
(
)
2(
( )
)
21 1 d . z m i i Z i f f
f x y f x y
m
λ λ
λ
ρ
λ=
= −
=
∫
− −∑
−
Let
ξ
( )
z
= −
(
f
λ( )
x
−
y
)
2, sincef
ρ( )
x
=
∫
Yy
d
ρ
( )
y x
≤
M
, and(
)
(
)
1 1 , K K K Kf L I L f
L I L f M
λ ρ ρ λ λ − ∞ ∞ − ∞ = + ≤ + ⋅ ≤
we have 2
8M
ξ
−ξ
≤ . So from Hoeffding inequality there holds( )
(
)
2(
( )
)
2 1 2 4 1 d Pr2 exp .
128
m
i i
Z
z i
f x y f x y
m
m
M
λ ρ λ ε
ε = − − − ≥ ≤ −
∑
∫
Then we have
{
}
, 0 2 2 4 0 d Pr 8 2π2 exp d .
128 z z z t t mt M t M m +∞ +∞ ≤ = ≥ = − ≤
∫
∫
For the approximation error, note that
( )
( )
2fλ − fρ = fλ− fρ ρ
[9]
which is independent with z and b, we have
( )
( )
(
)
(
(
)
)
(
)
(
)
( ) { }(
( ))
2 , 2 1 2 1 2 2 2 1 2 2 2 2 4 1 2 2 min 2 ,2 4 1, 1
, 1
1 .
z
K K K
r r
K K K
r r
K K K
r r K
r r
K
r r r
K
f f f f
L I L L I f
L I L L f
L I L L f
L f r
L f r
L f
λ ρ λ ρ ρ
On the other hand, in [8], the authors pointed out that 12
K K
f L f
ρ −
= for
any f∈K. So
(
)
(
)
(
)
{ }(
)
2 2 1 , 2 1 1 2 2 1 2 1 2 2 2 2 4 2 2 min 2 ,1 4 21 , 2 1 , 2 1 .
z K K K K
K K
r r
K K K
r r K
r r
K
r r r
K
f L I L f
L I L f
L I L L f
L f r
L f r
L f λ ρ ρ ρ ρ ρ ρ ρ ρ ρ ρ ρ
λ λ λ
λ λ λ λ λ λ κ λ κ − − + − − − − − − − = + = + ≤ + ⋅ ≤ ≤
⋅ >
≤ +
Combining the 3 bounds above, we can verify the proposition.
4.3. Convergence Result with Fixed
In our analysis for
z,
1 above, we indeed have the following result(
)
2(
( )
)
2 4
, 1 2 2 ,
16 1
2 .
z z z z
R f m λ
κ κ
ε π
λ
+ ≤ + Therefore, the error decomposition can be
( )
(
)
( )
(
)
( )
( )
(
)
(
)
(
)
(
(
( )
)
( )
)
(
( )
)
(
)
(
( )
( )
)
(
( )
)
(
)
(
( )
( )
)
(
( )
)
(
)
, ,, 1 2
2 2
2 4 2 4
,
2 2 2 2 2
2 2 4
2
, ,
2 2 2 2 2 4
2 2 2 2
2 2 4
2 2 2 1 2
2
16 1 8 1
2 24 1 2 24 1 2 24 1 z z z
z z z z
z z z z K z
z z K z
f f
D f
R R
f f D
m m
R
f f f D
m R
f f f D
m R m ρ ρ λ ρ
λ λ ρ
λ λ ρ
λ
κ κ κ κ
π λ
λ λ
κ κ
λ λ
λ
κ κ λ λ
λ κ κ λ − + = + + + − + + ≤ + + − + + + ≤ + + − + + + ≤ + + − + + + ≤
(
)
(
( )
)
(
)
2(
)
{ }(
)
2 4 2
2 min 1,2 4 2 4 4
3 2 2
1 2
24 1 3 2π 1 2
2 .
z
r r r r
K D
M M
L f
m m ρ ρ
λ
κ κ λ κ κ
λ − − −
+ + + + + ≤ + + + +
Then by choosing 1 2 min 4,3 2{ r}
m
λ= +
for balance we have the following result.
Theorem 2 Let
f
z, derived from algorithm (2),f
z,λ, fλ defined in the above subsections, and assume( )
X2r K
fρ∈L Lρ , take
{ }
2 min 4,3 2
1 r
m
λ= +
,
( )
(
)
( )
(
)
min 1, 4 2 3 2, ,
1
1 2 ,
r r
z fz f C
m
ρ
+
− + ≤
where constant
(
)
2(
)
(
)
2 4
2
2 4 2 4 4
2
24 1
8 2π 1 2 r r 2 Kr
M
C=
κ κ
+ + M + +κ
− +κ
− + L f− ρ ρ .
4.4. Selection of and
Total Error Bound
From the analysis for random error, sample error and approximation error above, we can obtain the whole error bound as follow.
Theorem 3 Let
f
z, derived from algorithm (2),f
z,λ, fλ defined in the above subsections, and assume( )
X2r K
fρ∈L Lρ , take
{ }
2 min 4,3 2
1
,
r
m
λ= +
and
( )
{ }
min 1 3,4 3 6
1 r r
m
+
=
we have
( )
( )
(
)
min 1 4, 3 3 6, ,
1
,
r r
z fz f C
m
ρ
+
− ≤
where constant
(
)
(
)
(
)
2
2 2 4
2 4 2 4 4
8 1 2π 24 1
2 .
r r r
K
C M M
L fρ ρ
κ κ
κ
−κ
− −= + + +
+ + +
It can be seen from error decomposition (3) that
( )
( )
(
)
(
( )
( )
)
( )
(
)
(
)
(
)
{ }
(
)
(
)
{ }
(
)
2
, , , , ,
, 1 2
2 2
2 4 2 4
2
2 2 2 2 2 2
2
2 min 2 ,1 4 2 4 4
2
2 4 2
2
2 2 2
2 min 2 ,1 4 2 4 4
2 1 8 1
8
8 2π
2
24 1 8 2π
8
2 .
z z z z z K
z
r r r r
K
r r r r
K
f f f f f
D
R R
M
m m
M
L f m
R M
M
m m
L f
ρ ρ λ
ρ ρ
ρ ρ
λ λ
κ κ κ κ
λ λ
λ κ κ
κ κ λ
λ κ κ
− − −
− − −
− ≤ − +
≤ + + +
+ +
≤ + +
+ + + +
+
≤ + +
+ + +
Since 2
( )
2( )
2, , ,
0
z K z z z K z
f
λf
λf
λM
λ
≤
+
λ
≤
≤
, we have z, KM f λ
λ ≤ , i.e.,
we can choose M
R
λ
= . Now take 1 2 min 4,3 2{ r}
m
( )
{ }
min 1 3,4 3 6
1 r r
m
+
=
for balance, and the result is proved.
5. Conclusions
Theorem 2, where is taken as a constant, reveals that the generalization error
( )
(
π
f
z,)
converges not to the one of regression function ( )
fρ , but a little different one(
1 2+ )
( )
fρ in expectation.It can be seen from the definition of differential privacy that algorithms will provide more privacy when tends to 0. However, Theorem 3 shows that
cannot be too small, since the expected error will be very large accordingly. Hence our choice can be regarded as a balance between privacy protection and the expected error. In [19], the authors announce that also needs tend to 0 in
some rates to keep generalization which matches our result.
Compared with previous learning theory results [12] [20] [21] [22] and etc., our learning rate is not so good since a perturbation term is introduced. Howev-er, in our result Theorem 1, we did not need a capacity condition as what we did in classical error analysis, i.e., conditions on covering numbers, VC or Vγ di-mensions. Instead the -differential private condition is adopted. So it may be
capable and interesting for us to apply such condition to other learning algo-rithms.
Acknowledgements
This work is supported by NSFC (Nos. 11326096, 11401247), NSF of Guangdong Province in China (No. 2015A030313674), National Social Science Fund in Chi-na (No. 15BTJ024), Planning Fund Project of Humanities and Social Science Research in Chinese Ministry of Education (No. 14YJAZH040), Foundation for Distinguished Young Talents in Higher Education of Guangdong, China (No. 2016KQNCX162) and the Major Incubation Research Project of Huizhou Uni-versity (No. hzux1201619).
References
[1] Dwork, C., McSherry, F., Nissim, K. and Smith, A. (2006) Calibrating Noise to Sen-sitivity in Private Data Analysis. In: Halevi, S. and Rabin, T., Eds., Theory of Cryp-tography, Springer, Berlin, 265-284.
[2] McSherry, F. and Talwar, K. (2007) Mechanism Design via Differential Privacy.
Proceedings of the 48th Annual Symposium on Foundations of Computer Science, Providence, 21-23 October 2007, 94-103.https://doi.org/10.1109/focs.2007.66
[3] Chaudhuri, K., Monteleoni, C. and Sarwate, A.D. (2011) Differentially Private
Em-pirical Risk Minimization. Journal of Machine Learning Research, 12, 1069-1109. [4] Jain, P. and Thakurta, A.G. (2013) Differentially Private Learning with Kernels.
JMLR: Workshop and Conference Proceedings, 28, 118-126.
[5] Jain, P. and Thakurta, A.G. (2014) Dimension Independent Risk Bounds for
Diffe-rentially Private Learning. Proceedings of the 31st International Conference on Ma- chine Learning, Beijing, 21-26 June 2014, 476-484.
Preserving Statistical Validity in Adaptive Data Analysis. ACM Symposium on the Theory of Computing, Portland, 14-17 June 2015, 117-126.
https://doi.org/10.1145/2746539.2746580
[7] Bassily, R., Nissim, K., Smith, A., Steinke, T., Stemmer, U. and Ullman, J. (2015) Algorithmic Stability for Adaptive Data Analysis.
[8] Cucker, F. and Smale, S. (2002) On the Mathematical Foundations of Learning. Bu- lletin of the AMS, 39, 1-49.https://doi.org/10.1090/S0273-0979-01-00923-5
[9] Cucker, F. and Zhou, D.X. (2007) Learning Theory: An Approximation Theory
Viewpoint. Cambridge University Press, Cambridge.
https://doi.org/10.1017/CBO9780511618796
[10] Dwork, C. (2008) Differential Privacy: A Survey of Results. International Confe-rence on Theory and Applications of Models of Computation, Xi’an, 25-29 April 2008, 1-19.
[11] Steinwart, I., Hush, D. and Scovel, C. (2009) Optimal Rates for Regularized Least Squares Regression. In: Dasgupta, S. and Klivans, A., Eds., Proceedings of the 22nd Annual Conference on Learning Theory, Montreal, 18-21 June 2009, 79-93. [12] Wu, Q., Ying, Y. and Zhou, D.X. (2006) Learning Rates of Least-Square Regularized
Regression. Foundations of Computational Mathematics, 6, 171-192.
https://doi.org/10.1007/s10208-004-0155-9
[13] Nie, W.L. and Wang, C. (2015) Constructive Analysis for Coefficient Regularization Regression Algorithms. Journal of Mathematical Analysis and Applications, 431, 1153-1171.
[14] Wang, C. and Nie, W.L. (2014) Constructive Analysis for Least Squares Regression with Generalized K-Norm Regularization. Abstract and Applied Analysis, 2014, Ar-ticle ID: 458459.https://doi.org/10.1155/2014/458459
[15] Dwork, C. (2006) Differential Privacy. Springer, Berlin, 1-12.
[16] Smale, S. and Zhou, D.X. (2003) Estimating the Approximation Error in Learning
Theory. Analysis and Applications, 1, 17-41.
https://doi.org/10.1142/S0219530503000089
[17] Smale, S. and Zhou, D.X. (2007) Learning Theory Estimates via Integral Operators and Their Applications. Constructive Approximation, 26, 153-172.
https://doi.org/10.1007/s00365-006-0659-y
[18] Hoeffding, W. (1963) Probability Inequalities for Sums of Bounded Random
Va-riables. Journal of the American Statistical Association, 58, 13-30.
https://doi.org/10.1080/01621459.1963.10500830
[19] Wang, Y.-X., Lei, J. and Fienberg, S.E. (2015) Learning with Differential Privacy: Stability, Learn Ability and the Sufficiency and Necessity of ERM Principle.
[20] Wang, C. and Zhou, D.X. (2011) Optimal Learning Rates for Least Squares
Regula-rized Regression with Unbounded Sampling. Journal of Complexity, 27, 55-67.
[21] Hu, T., Fan, J., Wu, Q. and Zhou, D.X. (2015) Regularization Schemes for
Mini-mum Error Entropy Principle. Analysis and Applications, 13, 437-455.
https://doi.org/10.1142/S0219530514500110
[22] Christmann, A. and Zhou, D.X. (2016) Learning Rates for the Risk of Kernel-Based Quantile Regression Estimators in Additive Models. Analysis and Applications, 14,
Submit or recommend next manuscript to SCIRP and we will provide best service for you:
Accepting pre-submission inquiries through Email, Facebook, LinkedIn, Twitter, etc. A wide selection of journals (inclusive of 9 subjects, more than 200 journals)
Providing 24-hour high-quality service User-friendly online submission system Fair and swift peer-review system
Efficient typesetting and proofreading procedure
Display of the result of downloads and visits, as well as the number of cited articles Maximum dissemination of your research work