Algorithms - Structural Inference under Differential Privacy

4.3 Structural Inference under Differential Privacy

4.3.2 Algorithms

We now formally describe our solution (referred to as HRG in the sequel). Our so-

lution is composed of three steps: (1) differentially privately sample a good dendro- gramT_{sam p l e}from the entire dendrogram space (Algorithm 4.1); (2) given the sampled dendrogram T_{sam p l e}, compute the probabilities { pr} associated with Tsam p l e (Algo-

rithm 4.2); (3) generate the sanitized graph according to the identified HRG (Algo- rithm 4.3). We divide the total privacy parameterε into 2 portions, ε₁ andε₂, each being used in one of the first two steps. Note that the third step does not require any privacy parameter.

Differentially Private Dendrogram Fitting. Since, for an input graph G with n vertices, each of its dendrogramsT is associated with a log-likelihood logL (T ), which

measures its goodness of representing G, a straightforward attempt to achieve dif-

ferential privacy is to employ the exponential mechanism. Let the utility function

be u(T ) = logL (T ). The exponential mechanism samples T with probability pro-

portional to exp(2ε1∆u·u(T ))

T 0∈T

exp(2ε1∆u·u(T0)), where T is the entire output space (i.e., the set of all

possible dendrograms ofG). Unfortunately, this simple idea is computationally in-

feasible because it requires to enumerate a total of|T | = (2n − 3)!! ≈p2(2n)n−1_e−n

possible dendrograms. In our solution, we overcome the issue by designing an MCMC process, which simulates the exponential mechanism by a sequence oflocal transitions

r

s

t

u

s

t

u

s

u

t

Figure 4-2: Three configurations ofr ’s subtrees[CMN08]

inT . Our differentially private dendrogram fitting algorithm is summarized in Algo- rithm 4.1.

Algorithm 4.1 is based on the Metropolis algorithm[BGJ+11]. It starts by choos- ing an arbitrary dendrogramT₀∈ T as the initial state of the Markov chain (Line 1). It then iteratively performs the following procedure (Lines 2-6): randomly propose

a neighboring dendrogram T0 _{of the dendrogram} _T

i−1 in the previous iteration and

update the current state in the following way:

T_i=  



T0 _{with probability}_α

T_i−1 with probability 1− α

where the acceptance ratioα = min(1, exp(2ε1∆u·log L (T0))

exp(2ε1∆u·log L (Ti−1))) and ∆u is the global sensitivity

of the utility functionu. We show how to calculate∆u in Section 4.4.2.

To draw a neighbor T0 _of _T

i−1 uniformly at random, we first randomly choose

an internal node r in T_i−1 (other than the root) and then permute the three subtrees associated with r to generate two alternative configurations of r ’s subtrees, as illus- trated in Figure 4-2. One of these two configurations is chosen to be the neighboring candidateT0_{. Let the state space of this Markov chain be}_{T . It is easy to verify that the}

transitions based on this permutation scheme are bothreversible and ergodic (i.e., any pair of dendrograms can be connected by a finite sequence of such transitions). Hence, such an MCMC procedure has a unique stationary distribution after it converges to equilibrium. We run the above Markov chain until equilibrium is reached, which in- dicates that the desired distribution has already been reached. Therefore, the sampled dendrogramT_{sam p l e}is indeed drawn from the stationary distribution (Line 8).

In practice, there are many approaches to diagnose MCMC convergence. Here we

Algorithm 4.2: CalculateNoisyProb(G, T_{sam p l e},r∗_,_ε 2)

Input : Input graph G, sampled dendrogram T_{sam p l e}, privacy parameterε₂, internal noder∗

Output: A vector of noisy probabilities{fp_r}, where r ∈ {r∗, all internal nodes below r∗_} 1 λ_b =_ε 1 2·(nLr ∗·nRr ∗); 2 λ_c =_ε 1 2·((nLr ∗+nRr ∗)(nLr ∗+nRr ∗−1)/2); 3 ifλ_b ≥ τ₁andλ_c ≥ τ₂then

4 e_c(r∗) ← number of edges in the subgraph induced by all leaf nodes of the

subtree rooted atr∗_;

5 ep= min{1,

e_c(r∗_)+Lap(1 ε2)

(nLr ∗+nRr ∗)(nLr ∗+nRr ∗−1)/2};

6 for each r in{r∗, all internal nodes below r∗} do 7 fp_r=ep; 8 end 9 else 10 ep_r∗ = min{1, e_{r ∗}+Lap(1 ε2) n_{Lr ∗}·nRr ∗ }; 11 r_L← r∗’s left child; 12 r_R← r∗’s right child; 13 CalculateNoisyProb(G, T_{sam p l e},r_L,ε₂); 14 CalculateNoisyProb(G, T_{sam p l e},r_R,ε₂); 15 end

the average log-likelihood to judge whether the Markov chain has converged to the stationary distribution. We will elaborate more details of MCMC convergence time in Section 4.5.2. Additional discussion about the convergence and its mixing time can

be found in[CMN07; CMN08].

Noisy Probability Calculation. In the second step, we calculate the noisy probabilities associated with T_{sam p l e}’s internal nodes. Recall that, for an internal node r , its associated probability p_r = er

n_Lr·nRr (see Section 4.2.1). It is easy to observe that

the probabilities of the internal nodes rooted in smaller subtrees (i.e., in lower levels of T_{sam p l e}) are generally more sensitive to Laplace noise injected. Indeed, according to our experiments, the direct application of the Laplace mechanism to these nodes’ probabilities results in poor utility. To relieve such negative effects, we propose a carefully designed thresholding strategy coupled with the Erd˝os-Rényi model, which is presented in Algorithm 4.2.

The general idea is that if a probability p_r cannot be “reliably” estimated by ap-

plying the Laplace mechanism to er

nLr·nRr, we employ the Erd˝os-Rényi model to ap-

proximate the probability. To measure the reliability of a noisy probability, we set up the sentinelλ_b. For an internal node r∗_in_T

sam p l e,λb is set to 1

ε2·(nLr ∗·nRr ∗) (Line 1),

which measures the noise scale of the potential noisy probabilityÝp_r∗. Ifλ_bis relatively

large with respect to a threshold valueτ₁ (that is, the probability cannot be reliably calculated by the Laplace mechanism), we model the subgraph induced by all leaf nodes of the subtree rooted atr∗_{as an Erd˝}_{os-Rényi random graph. With this model, the link}

probability ofany pair of vertices in this subgraph is ec(r∗)

(nLr ∗+nRr ∗)(nLr ∗+nRr ∗−1)/2, which is

later perturbed by the Laplace mechanism (Line 5). Otherwise, we can expect that

e_r

n_Lr·nRr still gives a good estimation after adding noise. Hence we directly generate the

noisy probability as min{1,er ∗+Lap(

1 ε2)

n_{Lr ∗}·nRr ∗ } (Line 10) and perform the similar procedure on

r∗_{’s children (Lines 11-14).}

In Algorithm 4.2, we calculate the noisy probabilities in a top-down manner over T_{sam p l e}. During this process, the approximated probabilities based on the Erd˝os-Rényi model also become less accurate due to added Laplace noise. Here, we would also like to guarantee the accuracy of the perturbed approximated probabilities. For this reason, we introduce another sentinelλ_c (Line 2), which is compared with a threshold valueτ₂ to indicate whether the noise scale of the approximated probabilities is

acceptable. In summary, we employ the Erd˝os-Rényi model when (1) the probability

cannot be accurately estimated by er ∗+Lap(

1 ε2)

nLr ∗·nRr ∗ (guarded byλb), and (2) injecting noise

to ec(r∗)

(nLr ∗+nRr ∗)(nLr ∗+nRr ∗−1)/2 would not seriously affect its accuracy (guarded byλc). This

explains our condition in Line 3. In this case, the probabilities of r∗ and all internal nodes below r∗ _{will be approximated by the Erd˝}_{os-Rényi model (Lines 6-8). In}

our experiments, we observe that settingτ₁ = 0.05 and τ₂ = 0.01 gives good results over different real-life datasets. Note that the choices of these thresholds are data- independent: the tuning ofτ₁andτ₂only relies onε₂.

Sanitized Graph Generation. With the sampled dendrogram T_{sam p l e} and the set of

noisy probabilities{fp_r}, we generate the sanitized graph as follows (Algorithm 4.3). For each pair of verticesi, j ∈ V , we find their lowest common ancestor r in Tsam p l e

Algorithm 4.3: Generate Sanitized Graph eG

Input : Input graph G, sampled dendrogram T_{sam p l e}, privacy parameterε₂ Output: Sanitized graph eG

1 r_{r oot}← root node of T_{sam p l e};

2 CalculateNoisyProb(G, T_{sam p l e},r_{r oot},ε₂); 3 for each pair of vertices i, j ∈ V do

4 Find the lowest common ancestorr of i and j in T_{sam p l e};

5 Place an edge in eG between i and j with independent probability_fp_r; 6 end

7 return sanitized graph eG;

4.4 Privacy Analysis

In this section, we formally analyze the privacy guarantee of our algorithmHRG.

In document Towards Practicing Privacy in Social Networks (Page 68-72)