Product constructions reduce a relational property of two programs to a non-relational property of a single program, so that more standard techniques can be brought to bear. We close this chapter by comparing our coupled product to other existing constructions.
Almost all product constructions were originally designed with non-probabilistic programs in mind, targeting relational properties like information flow and correctness of compiler transformations. These approaches include self composition (Barthe, D’Argenio, and Rezk, 2011b), the cross product (Zaks and Pnueli, 2008), type-directed product programs (Terauchi and Aiken, 2005), and more (Barthe, Crespo, and Kunz, 2011a, 2013a). A basic consideration is how to handle different control flow in the two programs. If the two programs have the same shape and always take the same branches, the product program can interleave instructions from the two programs. If the two programs are very different or if the control flows are not synchronized, an asynchronous construction can combine the two programs sequentially.
These approaches have different strengths and weaknesses. By placing corresponding instructions close to one another, synchronized constructions can better leverage similarity between programs and can often be verified with simpler invariants and more local reasoning. However, asynchronous products apply to a wider range of programs. The design of ×PRHL, and in particular the asynchronous rule [WHILE-GEN], allows product programs that are both synchronous and asynchronous.
Probabilistic programs introduce additional challenges for product constructions. Existing constructions can be blindly applied to randomized programs, but the results use two independent sources of randomness and are difficult to reason about: there is no coordination between the two programs on sampling instructions, whether the construction has a synchronous structure or not. A notable exception is the product construction by Barthe, Gaboardi, Gallego Arias, Hsu, Kunz, and Strub (2014b), which is specialized to proving differential privacy. Their construction eliminates the random sampling statements entirely, yielding a synchronized, non-probabilistic product. In fact, their product is based on a variant of probabilistic couplings called approximate liftings; we turn to these couplings in the rest of the thesis.
Chapter 4
Approximate couplings for privacy
The first half of this thesis connected proofs by coupling with the logic PRHL, using ideas from the former to enhance the latter. We now explore a similar connection in reverse, using concepts from program logics to develop a novel form of probabilistic coupling and a new proof technique. Our starting point is APRHL, an approximate version of PRHL proposed by Barthe et al. (2013c) for verifying differential privacy, a statistical notion of data privacy. This logic was originally based on an approximate version of probabilistic lifting. By interpreting approximate liftings as a generalization of probabilistic coupling and reverse-engineering an approximate version of proof by coupling from APRHL, we can give a powerful method to prove differential privacy.
After briefly reviewing differential privacy (Section 4.1), we propose a new definition of approximate lifting and explore its theoretical properties (Section 4.2); our approximate liftings are a natural, approximate version of probabilistic couplings. To build approximate couplings, we review a core version of APRHL (Section 4.3) and extract a proof technique inspired by the logic, called proof by approximate coupling (Section 4.4). We then extend APRHL with proof rules modeling new approximate couplings (Section 4.5) and a principle called pointwise equality for proving differential privacy (Section 4.6). As applications, we give new proofs of privacy for the Report-noisy-max and Sparse Vector mechanisms (Section 4.7). Our approximate coupling proofs are significantly cleaner than existing arguments, and can be formalized in APRHL, enabling the first formal privacy proofs for these mechanisms. Finally, we survey other verification techniques for differential privacy, and research on approximate liftings (Section 4.8).
4.1 Differential privacy preliminaries
Differential privacy, proposed by Dwork, McSherry, Nissim, and Smith (2006), is a strong, probabilistic notion of data privacy that has attracted intensive attention across computer science and beyond. Differential privacy is a relational property of probabilistic programs.
Definition 4.1.1. Let $\varepsilon, \delta$ be non-negative parameters. Consider a set $D$ with a binary adjacency relation $\mathrm{Adj}$; we sometimes call $D$ the set of databases. Let the range $R$ be a set of possible outputs. A function $M : D \to \mathrm{Distr}(R)$, often called a mechanism, is $(\varepsilon, \delta)$-differentially private if for all pairs of adjacent inputs $(d, d') \in \mathrm{Adj}$ and all subsets $S \subseteq R$ of outputs, we have
\[ M(d)(S) \le \exp(\varepsilon) \cdot M(d')(S) + \delta . \]
When $\delta = 0$, we say $M$ is $\varepsilon$-differentially private.
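For mechanisms with finitely many outputs, Definition 4.1.1 can be checked directly. The sketch below is illustrative only: the names `hockey_stick`, `is_dp`, and `randomized_response` are hypothetical and not part of the source, and distributions are assumed to be given as Python dicts from outputs to probabilities. It uses the fact that the worst-case subset $S$ consists of exactly those outputs $r$ with $M(d)(r) > \exp(\varepsilon) \cdot M(d')(r)$, so a single sum suffices instead of a loop over all subsets.

```python
import math

def hockey_stick(p, q, eps):
    """Smallest delta such that p(S) <= exp(eps) * q(S) + delta for all S.

    p and q are dicts mapping outputs to probabilities. The maximizing S
    collects the outputs r where p(r) exceeds exp(eps) * q(r).
    """
    outputs = set(p) | set(q)
    return sum(max(0.0, p.get(r, 0.0) - math.exp(eps) * q.get(r, 0.0))
               for r in outputs)

def is_dp(mechanism, adjacent_pairs, eps, delta, tol=1e-12):
    """Check the inequality of Definition 4.1.1 on an explicit list of
    adjacent pairs (d, d'), up to a small floating-point tolerance."""
    return all(hockey_stick(mechanism(d), mechanism(d2), eps) <= delta + tol
               for d, d2 in adjacent_pairs)

# Hypothetical example mechanism: randomized response on one private bit.
# The true bit is reported with probability p_truth, the flipped bit otherwise.
def randomized_response(bit, p_truth=0.75):
    return {bit: p_truth, 1 - bit: 1.0 - p_truth}

# Adjacency here is assumed to relate the two possible values of the bit.
ADJ = [(0, 1), (1, 0)]

if __name__ == "__main__":
    print(is_dp(randomized_response, ADJ, math.log(3.0), 0.0))
    print(hockey_stick(randomized_response(0), randomized_response(1), 0.0))
```

With `p_truth = 0.75` the probability ratio between adjacent inputs is at most 3, so the check is expected to succeed for $\varepsilon = \ln 3$ and $\delta = 0$, and to fail for smaller $\varepsilon$ unless $\delta$ is increased.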
The adjacency relation describes which pairs of databases should lead to approximately indistinguishable outputs; intuitively, these are pairs of databases that differ only in the data of a single person. For instance, if a database is a set of records belonging to different people, we can consider two databases to be adjacent if they are identical except for an additional individual’s record in one database. Then under differential privacy, a mechanism’s output must be nearly the same whether any single individual’s private data is part of the input or not. The degree of similarity, and hence the strength of the privacy guarantee, is governed by the parameters $\varepsilon$ and $\delta$: smaller values give stronger guarantees, while larger values give weaker guarantees.
While typical notions of adjacency are symmetric, much of the theory of differential privacy applies to arbitrary relations. However, there are a few notable results that crucially need a symmetric adjacency relation—we will highlight these cases as they arise.
Standard private mechanisms
The most basic example of a differentially private mechanism is the Laplace mechanism, which evaluates a numeric query on a database and adds random noise drawn from the Laplace distribution. For instance, the target query could compute the average age, or count the number of patients with a certain disease. While the Laplace distribution is a continuous distribution over the real numbers, we work with a discrete version to avoid measure-theoretic technicalities. For concreteness we take the samples to be integers; our results can be easily adapted to finer discretizations.
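Definition 4.1.2. Let $\varepsilon > 0$. The (discrete) Laplace distribution with parameter $\varepsilon$, written $\mathrm{Lap}_\varepsilon$, is the distribution over the integers where $v \in \mathbb{Z}$ has probability proportional to $\exp(-|v| \cdot \varepsilon)$:
\[ \mathrm{Lap}_\varepsilon(v) \triangleq \frac{\exp(-|v| \cdot \varepsilon)}{W} , \]
with $W \triangleq \sum_{z \in \mathbb{Z}} \exp(-|z| \cdot \varepsilon)$. We write $\mathrm{Lap}_\varepsilon(t)$ for the Laplace distribution with mean $t \in \mathbb{Z}$; sampling from this distribution is equivalent to sampling from $\mathrm{Lap}_\varepsilon$ and adding $t$.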
Let $q : D \to \mathbb{Z}$ be an integer-valued query. The Laplace mechanism with parameter $\varepsilon$ takes a database $d \in D$ as input and returns a sample from $\mathrm{Lap}_\varepsilon(q(d))$. This mechanism is also known as the $\varepsilon$-geometric mechanism (Ghosh, Roughgarden, and Sundararajan, 2012).
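As an illustration, the discrete Laplace distribution and the Laplace mechanism can be sketched in a few lines. Everything named here (`lap_normalizer`, `lap_pmf`, `sample_geometric`, `sample_lap`, `laplace_mechanism`) is a hypothetical helper, not taken from the source. The sketch relies on two standard facts: with $\alpha = \exp(-\varepsilon)$ the normalizer has the closed form $W = (1 + \alpha)/(1 - \alpha)$, and the difference of two independent geometric variables with success probability $1 - \alpha$ follows the discrete Laplace distribution. It uses floating-point arithmetic and an ordinary pseudorandom generator, so it is not suitable for real privacy-sensitive deployments.

```python
import math
import random

def lap_normalizer(eps):
    """W = sum over all integers z of exp(-|z| * eps), in closed form."""
    alpha = math.exp(-eps)
    return (1.0 + alpha) / (1.0 - alpha)

def lap_pmf(eps, v, mean=0):
    """Probability that a sample from Lap_eps(mean) equals the integer v."""
    return math.exp(-abs(v - mean) * eps) / lap_normalizer(eps)

def sample_geometric(alpha, rng):
    """Sample X >= 0 with P(X >= k) = alpha**k, by inverse transform."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return int(math.floor(math.log(u) / math.log(alpha)))

def sample_lap(eps, mean=0, rng=random):
    """Sample from Lap_eps(mean) as a difference of two geometrics."""
    alpha = math.exp(-eps)
    return mean + sample_geometric(alpha, rng) - sample_geometric(alpha, rng)

def laplace_mechanism(query, eps, d, rng=random):
    """Evaluate the integer-valued query on d and add discrete Laplace noise."""
    return sample_lap(eps, mean=query(d), rng=rng)

if __name__ == "__main__":
    # Hypothetical counting query: number of records in a list-shaped database.
    database = ["alice", "bob", "carol"]
    print(laplace_mechanism(len, 0.5, database, rng=random.Random(0)))
```

Passing an explicit `random.Random` instance makes runs reproducible, which is convenient for testing but is an assumption of this sketch rather than anything required by the definition.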
If the query takes similar values on adjacent databases, the Laplace mechanism is differentially private. The privacy parameters depend on the sensitivity of the query: the more the answers may differ on adjacent databases, the weaker the privacy guarantee.
Theorem 4.1.3 (Dwork et al. (2006)). A query $q : D \to \mathbb{Z}$ is $k$-sensitive if $|q(d) - q(d')| \le k$ for every pair of adjacent databases $(d, d') \in \mathrm{Adj}$. Releasing a $k$-sensitive query with the Laplace mechanism with parameter $\varepsilon$ is $(k \cdot \varepsilon, 0)$-differentially private.
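The bound in Theorem 4.1.3 can be seen from a pointwise comparison of probabilities. The following is a sketch of the standard argument, not a reproduction of the cited proof; it writes $d, d'$ for adjacent databases and $v$ for an arbitrary integer output, and uses the fact that the normalizer $W$ cancels in the ratio.

```latex
\begin{align*}
\frac{\mathrm{Lap}_\varepsilon(q(d))(v)}{\mathrm{Lap}_\varepsilon(q(d'))(v)}
  &= \frac{\exp(-|v - q(d)| \cdot \varepsilon) / W}
          {\exp(-|v - q(d')| \cdot \varepsilon) / W} \\
  &= \exp\bigl( (|v - q(d')| - |v - q(d)|) \cdot \varepsilon \bigr) \\
  &\le \exp\bigl( |q(d) - q(d')| \cdot \varepsilon \bigr)
     && \text{(triangle inequality)} \\
  &\le \exp(k \cdot \varepsilon)
     && \text{($q$ is $k$-sensitive)} .
\end{align*}
```

Summing the resulting inequality over all $v$ in a set $S$ of outputs gives the condition of Definition 4.1.1 with parameters $(k \cdot \varepsilon, 0)$.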
Composition theorems
Differential privacy is closed under several notions of composition, making it easy to build new private algorithms out of private components. The sequential, or standard composition theorem is the most basic example. When running two private computations in sequence (where the second computation may use the input database as well as the randomized output from the first computation), the privacy guarantee should weaken, since we run more analyses on the data. Indeed, the privacy parameters simply add up.
Theorem 4.1.4 (Dwork et al. (2006)). Let $M : D \to \mathrm{Distr}(R)$ be $(\varepsilon, \delta)$-differentially private and let $M' : R \times D \to \mathrm{Distr}(R)$ be such that $M'(r, -) : D \to \mathrm{Distr}(R)$ is $(\varepsilon', \delta')$-differentially private for every $r \in R$. Given a database $d \in D$, sampling $r$ from $M(d)$ and then returning a sample from $M'(r, d)$ is $(\varepsilon + \varepsilon', \delta + \delta')$-differentially private.
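Theorem 4.1.4 can be sanity-checked numerically on a small example. The sketch below is an illustration under stated assumptions, not a proof: the mechanisms `first` and `second` are hypothetical randomized-response mechanisms chosen for this example, and `hockey_stick` computes the smallest $\delta$ for which the inequality of Definition 4.1.1 holds between two finite distributions.

```python
import math

def hockey_stick(p, q, eps):
    """Smallest delta with p(S) <= exp(eps) * q(S) + delta for every S."""
    outputs = set(p) | set(q)
    return sum(max(0.0, p.get(r, 0.0) - math.exp(eps) * q.get(r, 0.0))
               for r in outputs)

def randomized_response(bit, p_truth):
    """Report the true bit with probability p_truth, else the flipped bit."""
    return {bit: p_truth, 1 - bit: 1.0 - p_truth}

def first(d):
    """Plays the role of M: ratio at most 3, so (ln 3, 0)-private."""
    return randomized_response(d, 0.75)

def second(r, d):
    """Plays the role of M': for each fixed r, ratio at most 7/3."""
    return randomized_response(d, 0.6 if r == 0 else 0.7)

def compose(first, second, d):
    """Distribution of: sample r from first(d), return a sample of second(r, d)."""
    out = {}
    for r, pr in first(d).items():
        for r2, pr2 in second(r, d).items():
            out[r2] = out.get(r2, 0.0) + pr * pr2
    return out

EPS1 = math.log(3.0)        # privacy parameter assumed for `first`
EPS2 = math.log(7.0 / 3.0)  # privacy parameter assumed for `second`

if __name__ == "__main__":
    for d, d2 in [(0, 1), (1, 0)]:
        gap = hockey_stick(compose(first, second, d),
                           compose(first, second, d2), EPS1 + EPS2)
        print(d, d2, gap)  # expected to be zero up to rounding
```

In this particular example the composed mechanism satisfies a much tighter bound than $\varepsilon + \varepsilon'$; the theorem only gives an upper bound on the privacy parameters.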
This useful theorem has two immediate consequences. First, if $M'$ depends only on its first argument $r$ and ignores its database argument $d$, then $M'(r, -)$ is $(0, 0)$-differentially private. So, transforming the output of a differentially private algorithm does not degrade privacy; this property is also called closure under post-processing.
Second, by repeatedly applying the composition theorem, the composition of $n$ separate $(\varepsilon, \delta)$-differentially private mechanisms is $(n\varepsilon, n\delta)$-differentially private. In certain parameter ranges, an alternative, advanced composition theorem can bound the privacy level with a smaller $\varepsilon$ at the cost of a slightly larger $\delta$. This result crucially assumes a symmetric adjacency relation.
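Theorem 4.1.5 (Dwork, Rothblum, and Vadhan (2010)). Fix a symmetric adjacency relation on $D$. Let $f_i : R \times D \to \mathrm{Distr}(R)$ be a sequence of $n$ functions such that for every $r \in R$, the functions $f_i(r, -) : D \to \mathrm{Distr}(R)$ are $(\varepsilon, \delta)$-differentially private. Then for every $\omega \in (0, 1)$, the mechanism that executes $f_1, \dots, f_n$ in sequence and returns the final output is $(\varepsilon^*, \delta^*)$-differentially private for
\[ \varepsilon^* = \varepsilon \sqrt{2 n \ln(1/\omega)} + n \varepsilon (e^{\varepsilon} - 1) \quad \text{and} \quad \delta^* = n \delta + \omega . \]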
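In particular, if we have $\varepsilon' \in (0, 1)$, $\omega \in (0, 1/2)$, and
\[ \varepsilon = \frac{\varepsilon'}{2 \sqrt{2 n \ln(1/\omega)}} , \]
a short calculation shows that the composition is $(\varepsilon', \delta^*)$-differentially private.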
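The parameters in Theorem 4.1.5 are easy to evaluate numerically, which helps when comparing against the bound from repeated sequential composition. The sketch below transcribes the formulas for $\varepsilon^*$ and $\delta^*$ and the per-step choice of $\varepsilon$ from the text; the function names and the example values of `n`, `omega`, and `eps_target` are hypothetical and chosen only for illustration.

```python
import math

def basic_composition(eps, delta, n):
    """n-fold sequential composition: parameters add up."""
    return n * eps, n * delta

def advanced_composition(eps, delta, n, omega):
    """(eps*, delta*) from Theorem 4.1.5, for omega in (0, 1)."""
    eps_star = (eps * math.sqrt(2.0 * n * math.log(1.0 / omega))
                + n * eps * (math.exp(eps) - 1.0))
    delta_star = n * delta + omega
    return eps_star, delta_star

def per_step_epsilon(eps_target, n, omega):
    """Per-mechanism epsilon from the text: eps' / (2 * sqrt(2 n ln(1/omega)))."""
    return eps_target / (2.0 * math.sqrt(2.0 * n * math.log(1.0 / omega)))

if __name__ == "__main__":
    n, omega, eps_target = 100, 1e-6, 0.5  # illustrative values only
    eps = per_step_epsilon(eps_target, n, omega)
    print("per-step epsilon:", eps)
    print("basic:           ", basic_composition(eps, 0.0, n))
    print("advanced:        ", advanced_composition(eps, 0.0, n, omega))
```

For $\varepsilon' \in (0, 1)$ and $\omega \in (0, 1/2)$, the first term of $\varepsilon^*$ equals $\varepsilon'/2$ under this choice of $\varepsilon$, and the second term is at most $\varepsilon'/2$, so the computed $\varepsilon^*$ should never exceed the target $\varepsilon'$.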
We omit other standard composition theorems (e.g., parallel composition) as we will not need them; readers can consult the textbook by Dwork and Roth (2014) for more information.
Remark 4.1.6. The sequential composition theorem allows reasoning about differential privacy in terms of privacy costs. We can imagine tracking an algorithm’s privacy parameters, initially $(0, 0)$. Every time the algorithm applies an $(\varepsilon, \delta)$-private mechanism, we increment the current parameters by $(\varepsilon, \delta)$; the final parameters give the privacy level for the whole algorithm. In this way, $(\varepsilon, \delta)$ represents the cost of using a private subroutine.
While this observation seems to be a restatement of the composition theorems, merely a convenient accounting method, the subtlety lies in how the costs are computed. The key point is that outputs from previous private mechanisms are assumed to be equal when computing the cost of subsequent operations. Changing the perspective a bit, we can pay cost $(\varepsilon, \delta)$ to assume two outputs in related runs of an $(\varepsilon, \delta)$-private mechanism are equal. We can begin to see the rough contours of a proof by coupling; we will soon make this idea more precise.
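The cost-tracking view of Remark 4.1.6 can be made concrete with a small accumulator. This is a minimal sketch of the accounting idea only; the class `PrivacyAccountant` and its methods are hypothetical names introduced here, and the sketch does not capture the coupling interpretation developed later in the chapter.

```python
class PrivacyAccountant:
    """Tracks the running privacy cost (eps, delta) of an algorithm,
    following sequential composition: costs of private steps add up."""

    def __init__(self):
        # The parameters start at (0, 0) before any mechanism is applied.
        self.eps = 0.0
        self.delta = 0.0

    def charge(self, eps, delta=0.0):
        """Record one use of an (eps, delta)-private mechanism."""
        if eps < 0 or delta < 0:
            raise ValueError("privacy parameters must be non-negative")
        self.eps += eps
        self.delta += delta

    def cost(self):
        """Current privacy level of everything charged so far."""
        return self.eps, self.delta

if __name__ == "__main__":
    acct = PrivacyAccountant()
    acct.charge(0.5)         # e.g. one Laplace mechanism with cost (0.5, 0)
    acct.charge(0.25, 1e-6)  # e.g. another mechanism with cost (0.25, 1e-6)
    print(acct.cost())
```

The accumulator charges each step independently of the outputs of earlier steps, which mirrors the point of the remark: the cost of a step is computed as if earlier private outputs were equal in the two related runs.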