Distributed principal component analysis

1.3 Position of the thesis

1.3.3 Distributed principal component analysis

Another problem addressed by the stochastic approximation framework is principal component analysis (PCA). The objective in such problems is rather different from those considered previously. Indeed, the aim is no longer to find a consensus on a common parameter of interest. Here, the aim is to drive the iterate of each node i to the value of the i-th entries of the principal eigenvectors of a matrix M.

Let $M \in \mathbb{R}^{N \times N}$ be a symmetric positive semi-definite matrix whose entries describe some similarity metric between each pair of agents, e.g. similarities (multidimensional scaling [95], [27]), distances (WSN localization [57], [143]), customer ratings (user profiling [154], [91]), adjacency weights (spectral clustering [29]) or covariances (signal detection [88], [37]). We assume that a given agent i has only partial information on the matrix M (typically, it is only able to observe the i-th row of M). Consider the spectral decomposition of M

$$M = U \Lambda U^T, \qquad U U^T = I_N \tag{1.12}$$

where U is an orthogonal matrix whose columns are the eigenvectors of M and $\Lambda$ is a diagonal matrix containing the corresponding eigenvalues $(\lambda_1, \dots, \lambda_N)$ in decreasing order $\lambda_1 \geq \dots \geq \lambda_N$. We denote by $\|\cdot\|$ the Euclidean norm. For a given integer p < N, the aim is to evaluate the p largest eigenvalues $\lambda_1, \dots, \lambda_p$ and the corresponding eigenvectors, which we denote $u_1, \dots, u_p$.

When M is perfectly known and data are processed in a centralized manner, several classical methods solve (1.12) efficiently, such as the power method [73, p. 406] when p = 1, and the QR factorization ([81, p. 114], called orthogonal iteration in [73, p. 454]) or the Gram-Schmidt orthonormalization [73, p. 254] when p > 1. The centralized power method computes the following recursion (when p = 1):

$$\tilde{U}_n = M U_{n-1} \tag{1.13}$$

$$U_n = \frac{\tilde{U}_n}{\|\tilde{U}_n\|} \tag{1.14}$$

where $(U_n)_n$ is the sequence of estimates, which converges to the first eigenvector $u_1$, and $\|x\|$ stands for the Euclidean norm, i.e. $\|x\|^2 = \sum_i |x(i)|^2$ for $x \in \mathbb{R}^N$. From a distributed implementation viewpoint, both terms $M U_{n-1}$ and $\|M U_{n-1}\|$ have drawbacks. First, for a given agent i, the matrix product writes as a sum $\sum_{j=1}^{N} M(i,j) U_{n-1}(j)$ of N terms, each involving a communication with a separate agent j. Second, for any agent i, (1.14) writes $U_n(i) = \tilde{U}_n(i) / \sqrt{\sum_{j=1}^{N} |\tilde{U}_n(j)|^2}$, so that agent i needs every entry $\tilde{U}_n(j)$ to implement the normalization update. When N is large, this could incur a prohibitive cost to the network in terms of the number of communications. As a consequence, several works have sought a decentralized implementation of (1.13)-(1.14). A couple of works deal with a distributed version of (1.13)-(1.14) (see [90], [92]) by introducing consensus averaging to compute the normalization term. While in [90] M is assumed to be perfectly known, [92] includes a synchronous sparse model for $M U_{n-1}$. Contrary to [90], where each agent i is able to compute $\sum_{j=1}^{N} M(i,j) U_{n-1}(j)$, [92] describes a sparse model in which each agent i transmits $M(i,j) U_{n-1}(j)$ to a small set of randomly chosen neighbors.
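As an illustration of the centralized recursion (1.13)-(1.14), the following is a minimal NumPy sketch. The function name `power_method`, the number of iterations, the random initialization and the 3 x 3 test matrix are assumptions made for this example only; they do not come from the cited references.

```python
import numpy as np

def power_method(M, num_iters=200, seed=0):
    """Centralized power iteration (1.13)-(1.14), case p = 1."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal(M.shape[0])   # arbitrary starting point (assumption)
    U /= np.linalg.norm(U)
    for _ in range(num_iters):
        U_tilde = M @ U                         # (1.13): entry i sums N products M(i, j) U(j)
        U = U_tilde / np.linalg.norm(U_tilde)   # (1.14): the norm involves all N entries
    return U

# Hypothetical symmetric positive semi-definite test matrix.
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 0.0, 3.0]])
M = A @ A.T

u1_hat = power_method(M)
lambda1_hat = u1_hat @ M @ u1_hat   # Rayleigh quotient estimate of the largest eigenvalue
```

Both lines inside the loop touch every entry of the current iterate, which is precisely the communication burden discussed above when each entry is held by a different agent.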

In this thesis we seek to design an algorithm which is

• distributed: nodes cooperate in order to separately estimate different entries of the principal eigenvectors;

• on-line: matrix M is unobserved, but a sequence $(M_n)_n$ of perturbed/noisy versions of M is generated.

The sequence $(M_n)_n$ is written as $M_n = M + \Xi_n$, where the perturbation matrix $\Xi_n$ is typically a martingale increment. In the centralized case, when a sequence of matrices $(M_n)_n$ is globally observed at a central computing unit, stochastic approximation can be used to estimate the eigenvectors of M. Oja's algorithm can be used for that purpose (see [120] for p = 1 and [122] for p > 1). We also refer to [57], [24], [85] for alternative approaches that solve (1.12) by semidefinite programming based on constrained optimization. In this work we introduce a distributed version of Oja's algorithm.

We denote by $U_n = (u_{n,1}, \dots, u_{n,p})^T$ the p principal components estimated at time n. In Oja's algorithm [122], the estimate sequence $U_n$ is generated by:

$$U_n = U_{n-1} + \gamma_n \left( M_n U_{n-1} - U_{n-1} \left( U_{n-1}^T M_n U_{n-1} \right) \right) \tag{1.15}$$

Note that (1.15) boils down to a Robbins-Monro algorithm [139]. Due to possible instabilities of the algorithm, several variants have also been proposed, which either introduce a normalization or a projection step (see [29]).
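A minimal centralized sketch of recursion (1.15) is given below, under assumptions that are ours and not taken from [122]: the estimate is stored as an N x p array, the step size is $\gamma_n = 2/(n+20)$, the perturbation $\Xi_n$ is an i.i.d. zero-mean symmetric Gaussian matrix, and the test matrix M has hand-picked eigenvalues. No normalization or projection step is included.

```python
import numpy as np

def oja_step(U, M_n, gamma_n):
    """One iteration of (1.15); U is stored as an N x p array (layout assumption)."""
    MU = M_n @ U
    return U + gamma_n * (MU - U @ (U.T @ MU))

rng = np.random.default_rng(0)
N, p = 4, 2

# Hypothetical matrix M with known eigenvectors (columns of Q) and eigenvalues.
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
M = Q @ np.diag([3.0, 2.0, 0.5, 0.3]) @ Q.T

# Random orthonormal starting point.
U, _ = np.linalg.qr(rng.standard_normal((N, p)))

for n in range(1, 5001):
    G = rng.standard_normal((N, N))
    Xi_n = 0.05 * (G + G.T) / 2.0        # zero-mean symmetric noise (assumption)
    M_n = M + Xi_n                       # noisy observation M_n = M + Xi_n
    U = oja_step(U, M_n, gamma_n=2.0 / (n + 20))

# Cosines of the principal angles between span(U) and the true top-p eigenspace.
Q_hat, _ = np.linalg.qr(U)
cosines = np.linalg.svd(Q[:, :p].T @ Q_hat, compute_uv=False)
```

Cosines close to 1 indicate that the columns of `U` span the eigenspace associated with the p largest eigenvalues of M, even though M itself is never observed without noise.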

Distributed variants of the algorithm have been investigated, often in specific contexts such as user profiling [147] or signal estimation (or detection) in WSN [102]. Both works share two features: they propose a distributed version of (1.15), and they include average consensus iterations of the form [31] to compute some of the terms in (1.15) in a distributed way. Consequently, these approaches require two time scales, i.e. the iteration index n used to update $U_n$ and another time index counting the consensus cycles of the form (1.5). In particular, the authors of [147] address a machine learning context where the observations M correspond to a large matrix containing (binary) user taste ratings of some products. Under the assumption that M is a low-rank matrix, the aim is to estimate the profile vector of each user. A distributed Oja's algorithm is proposed to perform the spectral decomposition of a partially known dataset $M_n$. A normalization term is included in (1.15) to avoid stability issues. The term $M_n U_{n-1}$ is computed through a fixed sparse model, i.e. each agent i observes a small set of products $M_n(i,j) U_{n-1}(j)$ from its neighbors j at each iteration n of Oja's update. The introduced normalization term and $U_{n-1}^T M_n U_{n-1}$ are both computed by average consensus during several rounds be-


the eigendecomposition of a signal's covariance matrix M from noisy measurements received within a wireless sensor network, i.e. the received signal model is assumed to be "high-energy signal + zero-mean random noise". Here, finding the p principal eigenvectors of M amounts to capturing the high-energy components of the received data in order to detect and estimate the incoming signal of interest. The authors of [102] assume an estimated covariance matrix of the form $M_n = n^{-1} \sum_{t=0}^{n} r_t r_t^H$, where $(r_t \in \mathbb{C}^N)_{t \geq 0}$ denotes the measurements received at the N sensors. Under these model assumptions, three terms are identified when writing recursion (1.15) in distributed form, i.e. $r_n^H U_{n-1}$, $U_{n-1}^H U_{n-1}$ and $U_{n-1}^H r_n r_n^H U_{n-1}$. The proposed distributed Oja's algorithm introduces three average consensus steps, one for each of these terms, each involving several rounds. At the end of this step, each sensor node is able to update $U_n(i)$.
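To make the role of the consensus rounds concrete, the sketch below shows how a global quantity such as $U^H U = \sum_i |U(i)|^2$ (case p = 1, real entries) can be approximated at every node by repeated local averaging. It is a generic illustration and not the scheme of [147] or [102]: the ring topology, the uniform 1/3 weights, the number of rounds and the helper names `ring_weights` and `consensus_sum` are all assumptions.

```python
import numpy as np

def ring_weights(N):
    """Symmetric, doubly stochastic weights on a ring (hypothetical topology)."""
    W = np.zeros((N, N))
    for i in range(N):
        W[i, i] = 1.0 / 3.0
        W[i, (i - 1) % N] = 1.0 / 3.0
        W[i, (i + 1) % N] = 1.0 / 3.0
    return W

def consensus_sum(local_values, W, num_rounds):
    """Each node repeatedly averages with its neighbors, then rescales by N."""
    x = np.asarray(local_values, dtype=float).copy()
    for _ in range(num_rounds):
        x = W @ x            # row i only uses the values held by neighbors of node i
    return len(x) * x        # every entry approximates the sum of the initial values

# Node i holds a single entry u(i) of the current estimate.
u = np.array([0.5, -1.0, 2.0, 0.0, 1.5, -0.5])
W = ring_weights(len(u))
estimates = consensus_sum(u ** 2, W, num_rounds=100)   # each node's estimate of ||u||^2
```

The inner loop over `num_rounds` is the second time scale mentioned above: it has to be run for each of the required terms before a single update of the estimate can take place.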

Unlike [147], [102], we propose a distributed Oja's algorithm to estimate the principal components of M in a general setting, without explicitly specifying a model for the observations $(M_n)_n$.

In this thesis, we consider the following model. At each instant n, each node i observes some random noisy samples of the i-th row of the matrix $M_n$. Each node i sends and/or receives variables from other nodes j in the network, chosen at random (contrary to [147], which considers fixed links). The matrix products involved in Oja's update, i.e. $M_n U_{n-1}$ and $U_{n-1}^T M_n U_{n-1}$, are computed via an asynchronous communication model that differs from the average consensus model [31] required in [147], [102]. We define at each sensor i two random sequences $y_n(i)$ and $z_n(i)$ as unbiased estimates of the corresponding $\sum_j M(i,j) U_{n-1}(j)$ and $U_{n-1}^T M_n U_{n-1}$, respectively. In addition, we introduce a projection step at each iteration n that keeps $U_n$ within a compact set, in order to avoid instabilities of the sequence $(U_n)_n$. The convergence of the proposed algorithm is analyzed in the asymptotic regime where n tends to infinity. Although the implementation and the objective differ from (1.11), both are related through the Robbins-Monro framework. Thus, the convergence analysis of the sequence generated by the proposed algorithm involves the existence of a mean field function $h(U)$ whose roots correspond to the underlying eigenspace of M, i.e. any $U \in \{h(U) = 0\}$ satisfies (1.12). Hence, similarly to [122], [29], the convergence analysis mainly consists in addressing the stability of $U_n$, the definition of $h(U)$ and its roots $\{h(U) = 0\}$.

Next, we investigate the application of our algorithm to self-localization in wireless sensor networks, for which numerical results can be provided from real data.

Application: self-localization in wireless sensor networks

In signal processing, an interesting motivation for designing a distributed, asynchronous and on-line version of Oja's algorithm (1.15) is its application to the localization problem in wireless sensor networks (see [57], [27], [143], [24], [92], [41]). The theory of multidimensional scaling (MDS) [95] deals with the following general problem: find an embedding configuration of N objects when only similarity/distance data are available. In particular, the method referred to as classical MDS [27, Chapter 12] considers Euclidean distances between N positions in a coordinate space of dimension p. In that case, classical MDS performs the PCA (1.12) of M defined as follows:

$$M = -\frac{1}{2} J_\perp D J_\perp \tag{1.16}$$

where the matrix D contains the squared distances and $J_\perp = I - \frac{1}{N} \mathbf{1}\mathbf{1}^T$.

In the WSN context, classical MDS (also known as MDS-MAP [143]) recovers the positions of a network formed by N sensor nodes (up to a rotation/translation/reflection). We denote by $z_i$ the position of any sensor node i and by $\bar{z}$ the barycenter of the network, i.e. $\bar{z} = \frac{1}{N} \sum_i z_i$. In the case of a Euclidean space, the entries of D are related to the sensor node positions as:

$$D(i,j) = \|z_i - z_j\|^2 \tag{1.17}$$

Then, substituting (1.17) into (1.16) gives $M = Z Z^T$, where the i-th row of the matrix Z coincides with $z_i - \bar{z}$. Hence, the PCA problem (1.12) applied to (1.16) within the WSN context reduces to finding $Z = U \Lambda^{1/2} \in \mathbb{R}^{N \times p}$ such that $M = Z Z^T$ (usually p = 2 or p = 3). We define each recovered node position $Z(i)$ as $Z(i) = (\sqrt{\lambda_1} u_1(i), \dots, \sqrt{\lambda_p} u_p(i))$.
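The relations (1.16)-(1.17) and the reconstruction $Z = U \Lambda^{1/2}$ can be checked numerically with the following centralized sketch. The five node positions are made up for the example, and the eigendecomposition is computed directly with `numpy.linalg.eigh` rather than with a distributed algorithm; the helper names are ours.

```python
import numpy as np

def squared_distances(Z):
    """Matrix of squared Euclidean distances between the rows of Z, as in (1.17)."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def classical_mds(D, p):
    """Double-center D as in (1.16), then keep the p principal components."""
    N = D.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N       # J = I - (1/N) 1 1^T
    M = -0.5 * J @ D @ J                      # (1.16)
    eigvals, eigvecs = np.linalg.eigh(M)      # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:p]       # indices of the p largest eigenvalues
    lam, U = eigvals[top], eigvecs[:, top]
    Z_hat = U * np.sqrt(np.maximum(lam, 0.0)) # row i is (sqrt(lam_k) u_k(i)) for k = 1..p
    return Z_hat, M

# Hypothetical planar network of N = 5 sensor nodes (p = 2).
positions = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 2.0], [0.0, 2.0], [3.0, 1.0]])
D = squared_distances(positions)
Z_hat, M = classical_mds(D, p=2)

centered = positions - positions.mean(axis=0)   # rows z_i - z_bar
D_hat = squared_distances(Z_hat)                # distances implied by the recovered positions
```

Here `M` coincides with `centered @ centered.T`, and `D_hat` reproduces `D`, while `Z_hat` itself matches the true positions only up to a rotation, translation or reflection.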

The centralized localization approach introduced by [143] (theoretical analysis in [85]) involves two main steps: first, obtain the squared pairwise distances D(i, j) between the sensor nodes and compute (1.16) (also referred to as double-centering); second, find the p principal components of M. In wireless sensor networks, the direct acquisition of D is not possible. Thus, distances may be estimated from other available measurements depending on the electronics of the sensor node devices, e.g. received signal strength indicator (RSSI), time-of-arrival (TOA) or angle-of-arrival (AOA) (see [126], [105]).

In this thesis, we focus on RSSI-based techniques since the wireless sensor nodes considered in our experiments belong to the FIT IoT-LAB platform [1]. We define an unbiased estimator of the squared distance based on the standard parametric Log-Normal Shadowing Model (LNSM) of [136]. In addition, the PCA step of our approach involves a distributed version of Oja's algorithm (1.15) along with an observation model that enables each sensor node i to estimate the i-th row of $M_n$.

In document Doctorat ParisTech. TELECOM ParisTech. Analyse d algorithmes distribués pour l approximation stochastique dans les réseaux de capteurs (Page 37-40)