Consensus and Sharing
7.1 Global Variable Consensus Optimization
7.1.1 Global Variable Consensus with Regularization
In a simple variation on the global variable consensus problem, an objective term $g$, often representing a simple constraint or regularization, is handled by the central collector:
$$
\begin{array}{ll}
\text{minimize} & \sum_{i=1}^N f_i(x_i) + g(z) \\
\text{subject to} & x_i - z = 0, \quad i = 1, \ldots, N.
\end{array}
$$
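The resulting ADMM algorithm is
$$
\begin{aligned}
x_i^{k+1} &:= \operatorname*{argmin}_{x_i} \left( f_i(x_i) + y_i^{kT}(x_i - z^k) + (\rho/2)\|x_i - z^k\|_2^2 \right) && \text{(7.3)} \\
z^{k+1} &:= \operatorname*{argmin}_{z} \left( g(z) + \sum_{i=1}^N \left( -y_i^{kT} z + (\rho/2)\|x_i^{k+1} - z\|_2^2 \right) \right) && \text{(7.4)} \\
y_i^{k+1} &:= y_i^k + \rho(x_i^{k+1} - z^{k+1}). && \text{(7.5)}
\end{aligned}
$$
By collecting the linear and quadratic terms, we can express the $z$-update as an averaging step, as in consensus ADMM, followed by a proximal step involving $g$: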
$$
z^{k+1} := \operatorname*{argmin}_{z} \left( g(z) + (N\rho/2)\|z - \bar{x}^{k+1} - (1/\rho)\bar{y}^k\|_2^2 \right).
$$
In the case with nonzero $g$, we do not in general have $\bar{y}^k = 0$, so we cannot drop the $y_i$ terms from the $z$-update as in consensus ADMM.
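To make the collection of terms explicit, write $\bar{x}^{k+1} = (1/N)\sum_{i=1}^N x_i^{k+1}$ and $\bar{y}^k = (1/N)\sum_{i=1}^N y_i^k$ for the averages (notation we assume here, matching consensus ADMM). Completing the square in the sum appearing in (7.4) gives, up to terms that do not depend on $z$,

$$
\begin{aligned}
\sum_{i=1}^N \left( -y_i^{kT} z + (\rho/2)\|x_i^{k+1} - z\|_2^2 \right)
&= (N\rho/2)\|z\|_2^2 - N\rho\, z^T\!\left( \bar{x}^{k+1} + (1/\rho)\bar{y}^k \right) + \text{const} \\
&= (N\rho/2)\left\| z - \bar{x}^{k+1} - (1/\rho)\bar{y}^k \right\|_2^2 + \text{const},
\end{aligned}
$$

so the $z$-update amounts to evaluating the proximal operator of $g/(N\rho)$ at the point $\bar{x}^{k+1} + (1/\rho)\bar{y}^k$.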
As an example, for $g(z) = \lambda\|z\|_1$, with $\lambda > 0$, the second step of the $z$-update is a soft threshold operation:
$$
z^{k+1} := S_{\lambda/(N\rho)}\left( \bar{x}^{k+1} + (1/\rho)\bar{y}^k \right).
$$
As another simple example, suppose $g$ is the indicator function of $\mathbf{R}^n_+$, which means that the $g$ term enforces nonnegativity of the variable. In this case, the update is
$$
z^{k+1} := \left( \bar{x}^{k+1} + (1/\rho)\bar{y}^k \right)_+.
$$
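The two examples can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names and the sample vectors are our own, and the proximal point is taken to be the averaged quantity obtained by completing the square in the $z$-update, namely $\bar{x}^{k+1} + (1/\rho)\bar{y}^k$.

```python
import numpy as np

def soft_threshold(v, kappa):
    """Elementwise soft thresholding: sign(v) * max(|v| - kappa, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def z_update_l1(x_bar, y_bar, lam, rho, N):
    """z-update for g(z) = lam * ||z||_1: average, then soft threshold."""
    return soft_threshold(x_bar + y_bar / rho, lam / (N * rho))

def z_update_nonneg(x_bar, y_bar, rho):
    """z-update for g = indicator of the nonnegative orthant: average, then clip."""
    return np.maximum(x_bar + y_bar / rho, 0.0)

# Hypothetical averages of the local variables and dual variables.
x_bar = np.array([0.9, -0.05, -1.2])
y_bar = np.array([0.1, 0.0, -0.2])

z_l1 = z_update_l1(x_bar, y_bar, lam=1.0, rho=1.0, N=5)  # threshold is 1/(5*1) = 0.2
z_nn = z_update_nonneg(x_bar, y_bar, rho=1.0)
```

Entries of the averaged point whose magnitude falls below $\lambda/(N\rho)$ are set to zero by the first update, while the second simply zeroes the negative entries.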
The scaled form of ADMM for this problem also has an appealing form, which we record here for convenience:
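$$
\begin{aligned}
x_i^{k+1} &:= \operatorname*{argmin}_{x_i} \left( f_i(x_i) + (\rho/2)\|x_i - z^k + u_i^k\|_2^2 \right) && \text{(7.6)} \\
z^{k+1} &:= \operatorname*{argmin}_{z} \left( g(z) + (N\rho/2)\|z - \bar{x}^{k+1} - \bar{u}^k\|_2^2 \right) && \text{(7.7)} \\
u_i^{k+1} &:= u_i^k + x_i^{k+1} - z^{k+1}. && \text{(7.8)}
\end{aligned}
$$
In many cases, this version is simpler and easier to work with than the unscaled form.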
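As a concrete illustration of the scaled iterations (7.6) to (7.8), the following sketch applies them to a small lasso problem. The choice $f_i(x_i) = (1/2)\|A_i x_i - b_i\|_2^2$ with $g(z) = \lambda\|z\|_1$ is our assumption for the example, as are the function name `consensus_lasso`, the synthetic data, and the values of $\rho$ and $\lambda$. With this $f_i$, the $x_i$-update reduces to solving the linear system $(A_i^T A_i + \rho I)x_i = A_i^T b_i + \rho(z^k - u_i^k)$.

```python
import numpy as np

def soft_threshold(v, kappa):
    """Elementwise soft thresholding: sign(v) * max(|v| - kappa, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def consensus_lasso(A_blocks, b_blocks, lam, rho=1.0, iters=300):
    """Scaled-form consensus ADMM with f_i = 0.5*||A_i x - b_i||^2, g = lam*||z||_1."""
    N = len(A_blocks)
    n = A_blocks[0].shape[1]
    x = np.zeros((N, n))
    u = np.zeros((N, n))
    z = np.zeros(n)
    # Quantities reused in every x_i-update.
    lhs = [A.T @ A + rho * np.eye(n) for A in A_blocks]
    Atb = [A.T @ b for A, b in zip(A_blocks, b_blocks)]
    for _ in range(iters):
        for i in range(N):
            # (7.6): could run in parallel, one block per processor.
            x[i] = np.linalg.solve(lhs[i], Atb[i] + rho * (z - u[i]))
        # (7.7): average, then proximal step (soft thresholding).
        z = soft_threshold(x.mean(axis=0) + u.mean(axis=0), lam / (N * rho))
        # (7.8): scaled dual update.
        u += x - z
    return x, z, u

# Synthetic data (hypothetical): 3 blocks of 20 examples, 5 features.
rng = np.random.default_rng(0)
x_true = np.array([1.0, 0.0, -2.0, 0.0, 0.5])
A_blocks = [rng.standard_normal((20, 5)) for _ in range(3)]
b_blocks = [A @ x_true + 0.01 * rng.standard_normal(20) for A in A_blocks]
lam = 1.0

x, z, u = consensus_lasso(A_blocks, b_blocks, lam, rho=20.0, iters=500)
```

At convergence the rows of `x` agree with `z`, and `z` satisfies the optimality conditions of the lasso problem posed on the stacked data.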
7.2 General Form Consensus Optimization
We now consider a more general form of the consensus minimization problem, in which we have local variables $x_i \in \mathbf{R}^{n_i}$, $i = 1, \ldots, N$, with the objective $f_1(x_1) + \cdots + f_N(x_N)$ separable in the $x_i$. Each of these local variables consists of a selection of the components of the global variable $z \in \mathbf{R}^n$; that is, each component of each local variable corresponds to some global variable component $z_g$. The mapping from local variable indices into global variable index can be written as $g = G(i, j)$, which means that local variable component $(x_i)_j$ corresponds to global variable component $z_g$.
Achieving consensus between the local variables and the global variable means that
$$
(x_i)_j = z_{G(i,j)}, \quad i = 1, \ldots, N, \quad j = 1, \ldots, n_i.
$$
If $G(i, j) = j$ for all $i$, then each local variable is just a copy of the global variable, and consensus reduces to global variable consensus, $x_i = z$. General consensus is of interest in cases where $n_i \ll n$, so each local vector only contains a small number of the global variables.
In the context of model fitting, the following is one way that general form consensus naturally arises. The global variable $z$ is the full feature vector (i.e., vector of model parameters or independent variables in the data), and different subsets of the data are spread out among $N$ processors. Then $x_i$ can be viewed as the subvector of $z$ corresponding to (nonzero) features that appear in the $i$th block of data. In other words, each processor handles only its block of data and only the subset of model coefficients that are relevant for that block of data. If in each block of data all regressors appear with nonzero values, then this reduces to global consensus.
For example, if each training example is a document, then the features may include words or combinations of words in the document; it will often be the case that some words are only used in a small subset of the documents, in which case each processor can just deal with the words that appear in its local corpus. In general, datasets that are high-dimensional but sparse will benefit from this approach.
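One way to read this construction in code is to derive the index map $G$ directly from the sparsity pattern of each data block. The sketch below is only illustrative: the helper `active_features`, the small matrix `A`, and the split into two blocks are hypothetical, and indices are 0-based as is usual in NumPy.

```python
import numpy as np

def active_features(A_block):
    """Global indices of the features (columns) with a nonzero entry in this block."""
    return np.flatnonzero(np.any(A_block != 0, axis=0))

# Hypothetical data: 4 examples, 5 features, mostly zeros.
A = np.array([
    [1.0, 0.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 3.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 4.0],
])
blocks = [A[:2], A[2:]]  # two processors, two examples each

# G[i][j] is the global index of local component (x_i)_j.
G = [active_features(B) for B in blocks]

# Each processor keeps only the columns of its block that it actually uses.
local_A = [B[:, g] for B, g in zip(blocks, G)]
```

Here each processor works with two coefficients instead of five. Feature 2 never appears in the data, so no local variable refers to it and it would have to be handled separately (for example, dropped from the model).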
Fig. 7.1. General form consensus optimization. Local objective terms are on the left; global variable components are on the right. Each edge in the bipartite graph is a consistency constraint, linking a local variable and a global variable component.
For ease of notation, let $\tilde{z}_i \in \mathbf{R}^{n_i}$ be defined by $(\tilde{z}_i)_j = z_{G(i,j)}$. Intuitively, $\tilde{z}_i$ is the global variable's idea of what the local variable $x_i$ should be; the consensus constraint can then be written very simply as $x_i - \tilde{z}_i = 0$, $i = 1, \ldots, N$.
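The general form consensus problem is
$$
\begin{array}{ll}
\text{minimize} & \sum_{i=1}^N f_i(x_i) \\
\text{subject to} & x_i - \tilde{z}_i = 0, \quad i = 1, \ldots, N,
\end{array}
\qquad \text{(7.9)}
$$
with variables $x_1, \ldots, x_N$ and $z$ ($\tilde{z}_i$ are linear functions of $z$).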
A simple example is shown in Figure 7.1. In this example, we have $N = 3$ subsystems, global variable dimension $n = 4$, and local variable dimensions $n_1 = 4$, $n_2 = 2$, and $n_3 = 3$. The objective terms and global variables form a bipartite graph, with each edge representing a consensus constraint between a local variable component and a global variable.
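To make the indexing concrete, the sketch below encodes an index map with the same dimensions as this example ($N = 3$, $n = 4$, $n_1 = 4$, $n_2 = 2$, $n_3 = 3$). The particular edges are an assumption chosen for illustration and are not read off the figure; indices are 0-based.

```python
import numpy as np

n = 4
# G[i][j] = global index g corresponding to local component (x_i)_j.
# These edges are hypothetical; only the dimensions match the example.
G = [
    np.array([0, 1, 2, 3]),  # x_1 uses all four global components
    np.array([0, 2]),        # x_2 uses two of them
    np.array([1, 2, 3]),     # x_3 uses three of them
]

z = np.array([10.0, 20.0, 30.0, 40.0])

# z_tilde[i] is the global variable's idea of what x_i should be.
z_tilde = [z[g] for g in G]

# Number of local entries attached to each global component.
k = np.bincount(np.concatenate(G), minlength=n)
```

Each entry of `k` is the degree of the corresponding global node in the bipartite graph, and the total number of edges equals $n_1 + n_2 + n_3 = 9$.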
The augmented Lagrangian for (7.9) is
$$
L_\rho(x, z, y) = \sum_{i=1}^N \left( f_i(x_i) + y_i^T(x_i - \tilde{z}_i) + (\rho/2)\|x_i - \tilde{z}_i\|_2^2 \right),
$$
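with dual variable $y_i \in \mathbf{R}^{n_i}$. Then ADMM consists of the iterations
$$
\begin{aligned}
x_i^{k+1} &:= \operatorname*{argmin}_{x_i} \left( f_i(x_i) + y_i^{kT} x_i + (\rho/2)\|x_i - \tilde{z}_i^k\|_2^2 \right) \\
z^{k+1} &:= \operatorname*{argmin}_{z} \sum_{i=1}^N \left( -y_i^{kT} \tilde{z}_i + (\rho/2)\|x_i^{k+1} - \tilde{z}_i\|_2^2 \right) \\
y_i^{k+1} &:= y_i^k + \rho(x_i^{k+1} - \tilde{z}_i^{k+1}),
\end{aligned}
$$
where the $x_i$- and $y_i$-updates can be carried out independently in parallel for each $i$.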
The $z$-update step decouples across the components of $z$, since $L_\rho$ is fully separable in its components:
$$
z_g^{k+1} := \frac{\sum_{G(i,j)=g} \left( (x_i^{k+1})_j + (1/\rho)(y_i^k)_j \right)}{\sum_{G(i,j)=g} 1},
$$
so $z_g$ is found by averaging all entries of $x_i^{k+1} + (1/\rho)y_i^k$ that correspond to the global index $g$. Applying the same type of argument as in the global variable consensus case, we can show that after the first iteration,
$$
\sum_{G(i,j)=g} (y_i^k)_j = 0,
$$
i.e., the sum of the dual variable entries that correspond to any given global index g is zero. The z-update step can thus be written in the simpler form
$$
z_g^{k+1} := (1/k_g) \sum_{G(i,j)=g} (x_i^{k+1})_j,
$$
where $k_g$ is the number of local variable entries that correspond to global variable entry $z_g$. In other words, the $z$-update is local averaging for each component $z_g$ rather than global averaging; in the language of collaborative filtering, we could say that only the processing elements that have an opinion on a feature $z_g$ will vote on $z_g$.
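The iterations above can be sketched as follows for a deliberately simple choice of objective. We assume $f_i(x_i) = (1/2)\|x_i - c_i\|_2^2$, so that the $x_i$-update has the closed form $x_i = (c_i - y_i + \rho\tilde{z}_i)/(1 + \rho)$; this objective, the function name, the index map `G`, and the data `c` are all hypothetical. The sketch also assumes every global component is referenced by at least one local entry ($k_g > 0$).

```python
import numpy as np

def general_consensus_admm(c, G, n, rho=1.0, iters=100):
    """General form consensus ADMM for f_i(x_i) = 0.5*||x_i - c_i||^2 (illustrative)."""
    idx = np.concatenate(G)            # global index of every local entry
    k = np.bincount(idx, minlength=n)  # k_g: number of local entries per global entry
    x = [np.zeros(len(g)) for g in G]
    y = [np.zeros(len(g)) for g in G]
    z = np.zeros(n)
    for _ in range(iters):
        # x-update: closed form for the quadratic f_i, independent across i.
        x = [(ci - yi + rho * z[g]) / (1.0 + rho) for ci, yi, g in zip(c, y, G)]
        # z-update: average the entries of x_i + y_i/rho that map to each g.
        v = np.concatenate([xi + yi / rho for xi, yi in zip(x, y)])
        z = np.bincount(idx, weights=v, minlength=n) / k
        # y-update: independent across i.
        y = [yi + rho * (xi - z[g]) for yi, xi, g in zip(y, x, G)]
    # Sum of dual entries attached to each global index.
    dual_sums = np.bincount(idx, weights=np.concatenate(y), minlength=n)
    return x, z, y, dual_sums

n = 4
G = [np.array([0, 1, 2, 3]), np.array([0, 2]), np.array([1, 2, 3])]
c = [np.array([1.0, 2.0, 3.0, 4.0]), np.array([3.0, 0.0]), np.array([5.0, 6.0, 7.0])]

x, z, y, dual_sums = general_consensus_admm(c, G, n)
```

For this objective the minimizer is the local average of the entries of the $c_i$ attached to each global index, so `z` should approach `[2.0, 3.5, 3.0, 5.5]`. The array `dual_sums` should vanish up to rounding, which is why the simpler averaging form of the $z$-update applies after the first iteration.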