
Graph Signal Processing: Reconstruction Algorithms


Academic year: 2021


Università degli Studi di Padova

Department of Information Engineering

Master Thesis in Telecommunication Engineering

Graph Signal Processing:

Reconstruction Algorithms

Supervisor: Tomaso Erseghe

Master Candidate: Marco Ceccon

Università di Padova


Abstract

In recent years we have been experiencing an explosion of information generated by large networks of sensors and other data sources. Much of this data is intrinsically structured, such as traffic evolution in a transportation network, temperature values in different geographical locations, information diffusion in social networks, functional activities in the brain, or 3D meshes in computer graphics. The representation and analysis of such data is a challenging task and requires the development of new tools that can identify and properly exploit the data structure.

In this thesis, we formulate the processing and analysis of structured data using the emerging framework of graph signal processing. Graphs are generic data representation forms, suitable for modeling the geometric structure of signals that reside on topologically structured domains. The vertices of the graph represent the discrete data domain, and the edge weights capture the pairwise relationships between the vertices. A graph signal is then defined as a function that assigns a real value to each vertex. Graph signal processing is a useful framework for handling such data efficiently, as it takes into consideration both the signal and the graph structure.

In this work, we study the common features and properties of signals defined on graphs and we focus on a specific application related to the reconstruction of graph signals in both centralized and distributed settings.


Summary

In recent years we have witnessed an explosion of information generated by large networks of sensors and by other data sources. Much of this data has an intrinsic structure, such as the evolution of traffic in a transportation network, temperature values at different geographical locations, the spread of information in social networks, brain activity, or three-dimensional surfaces in computer graphics. The representation and analysis of such data is a demanding task and requires the development of new tools able to identify and correctly exploit the structure of the data.

In this thesis, we set up the processing and analysis of structured data within the emerging context of processing signals defined on graphs. Graphs are generic data representation forms, suitable for modeling the geometric structure of signals that reside in topologically structured domains. The vertices of the graph represent the data in a discrete domain, and the edge weights express the relationships between connected vertices. A signal on a graph is then defined as a function that assigns a real value to each vertex.

Processing signals defined on graphs is a useful framework for handling such data efficiently, since it takes into account both the signal and the structure of the graph.

In this work, we study the main characteristics and properties of signals defined on graphs, and we focus on a specific application concerning the reconstruction of signals defined on graphs, in both centralized and distributed settings.


Contents

Abstract

List of figures

List of tables

Listing of acronyms

1 Introduction

1.1 Motivation

1.2 Thesis Outline

2 Graph Signal Processing Overview

2.1 Introduction

2.2 Graphs And Signals On Graphs

2.3 Graph Spectral Domain

2.4 Applications Of Graph-Based Signal Processing

2.4.1 Processing with graph-based priors

2.4.2 Distributed Processing Of Graph Signals

2.4.3 Graph-Based Multimedia Processing

3 Graph Signal Reconstruction

3.1 Reconstruction Problem

3.2 Random Sampling and Frequency Ordering

3.3 Performance of the LS Reconstruction Algorithm

3.4 ℓ1 Regularization Sparsity

4 Distributed Algorithms

4.1 Average Consensus

4.2 ADMM: Alternating Direction Method of Multipliers

4.2.1 Distributed ADMM - LS Solution

4.2.2 Distributed ADMM - ℓ1 Regularization Solution

5 Conclusions And Future Work

5.1 Future Work


Listing of figures

2.1 A graph defined on 50 nodes

2.2 Gershgorin circles for Metropolis weights

2.3 Gershgorin circles for unweighted adjacency matrix

2.4 Random signal defined on 50 nodes

2.5 Eigenvectors u0, u1, u2 and u49

2.6 Number of zero-crossings

2.7 Exponential kernel and its IGFT

2.8 Translated versions of kernel signal

3.1 Temperature for 2 different months

3.2 An example of signal sampling

3.3 Frequency ordering

3.4 MSE for different weighting methods

3.5 MSE for different weighting methods

3.6 LS signal reconstruction for month March, Metropolis weights

3.7 LASSO and ridge constraint comparison

3.8 ℓ1-norm signal reconstruction for month March, Metropolis weights

3.9 MSE for ℓ1-norm problem, different λ

3.10 MSE for ℓ1-norm problem, different number of samples

3.11 MSE comparison, 99 frequencies, λ = 1

3.12 MSE comparison, 10 frequencies, λ = 1

4.1 Convergence of average consensus algorithm

4.2 Convergence of the ADMM solution - LS

4.3 Convergence of the ADMM variables - LS

4.4 Average time to run ADMM - LS

4.5 Convergence of the ADMM solution - ℓ1-norm regularization (10 freq.)

4.6 Convergence of the ADMM solution - ℓ1-norm regularization (99 freq.)

4.7 Convergence of the ADMM solution - comparison

4.8 Average time to run ADMM - ℓ1-norm regularization

4.9 Comparison of distributed algorithms - 1

4.10 Convergence of ADMM, LS solution, updated penalty parameter (0 = 0.001)

4.11 Convergence of ADMM, LS solution, updated penalty parameter (0 = 0.01)

4.12 Convergence of ADMM, LASSO solution, updated penalty parameter (0 = 0.001)

4.13 Convergence of ADMM, LASSO solution, updated penalty parameter (0 = 0.01)


Listing of tables

3.1 Temperature dataset, index 1:33

3.2 Temperature dataset, index 34:66


Listing of acronyms

ADMM Alternating Direction Method of Multipliers

ATC Adapt To Combine

DCT Discrete Cosine Transform

DFT Discrete Fourier Transform

GFT Graph Fourier Transform

IGFT Inverse Graph Fourier Transform

LASSO Least Absolute Shrinkage and Selection Operator

LS Least Squares

MSE Mean Squared Error

SVM Support Vector Machine


1 Introduction

Modern information processing inevitably involves an extremely large volume of increasingly complex data. The complexity comes, in particular, from the intrinsic structure of the framework on which these data reside. Data observed by different sensors may be intrinsically related by some underlying structure, even when they represent different kinds of information. For instance, temperatures observed in different regions are related through their geographical proximity, traffic volumes at different locations in a transportation network depend on the topology of the network, and the behaviour of a group of people may be influenced by the friendships among them. To handle such complex data efficiently, we need to understand the interactions between different sources of information as well as the relationships and structures among them.

Graphs are powerful mathematical tools to model relationships and structures in data. In a graph representation, the vertices represent the entities and the edges represent the pairwise relationships between these entities. Moreover, graph-based representations are flexible enough to incorporate multiple types of information together with the relationships among them, while remaining sufficiently simple for efficient processing: consider, for example, the temperatures recorded by different sensors at different time instants.

Signal processing on graphs is an emerging research field which has recently attracted growing interest in the signal processing community. In this setting, the vertices of the graph represent entities and the edge weights reflect the pairwise relationships between them, while a graph signal assigns a scalar value to each vertex based on some observation associated with the entities. Graph signals capture the relationships between the observations, and thus reflect the structures in the data; they can represent various sources of information. Numerical examples of graph signals can be found in geographical, transportation, biomedical and social networks, such as temperatures within a geographical area, traffic capacities at hubs in a transportation network, or human behaviour in a social network.

1.1 Motivation

Over the past few years, we have witnessed a huge amount of information generated by numerous data sources, in a large variety of applications. For example, sensor networks have been widely deployed to measure a plethora of physical quantities, like temperature and solar radiation, traffic volumes in transportation networks, or brain activity in biological networks. Online social networks have turned into a significant means of communication and contain a lot of information. 3D depth cameras are also becoming more powerful and widely used to capture dynamic 3D scenes in emerging applications such as gaming, immersive communication and virtual reality. Such data are usually very complex, since they are high-dimensional and occupy a large amount of storage space. Furthermore, data are intrinsically and possibly irregularly structured. For instance, wireless sensor networks are irregularly deployed in space and their measurements depend on their geographical positions. Also, data and structure may be generated by different sources of information. For example, the information spread in social networks may be influenced by the relationships between the entities, as well as by the type of data itself. The representation, analysis, and compression of such data is a challenging task that requires the development of new tools that can identify and properly exploit data structures.

In this thesis, we study the representation and analysis of structured data in the context of the emerging graph signal processing framework. Graphs are generic data representation forms that are suitable for modeling the geometric structure of signals that live on topologically complicated domains, including social networks, electricity networks, transportation networks, and sensor networks, where data naturally reside on the vertices of weighted graphs. These signals are either intrinsically discrete (e.g., attributes of entities in social networks) or sampled from a continuous process. Typically, the vertices of the graph represent the discrete data domain and carry the data values. The edge weights of the graph capture the pairwise relationships between the vertices, such as geographical distance or biological connections. A graph signal is then defined as a function that assigns a real value to each vertex.

The weight associated with each edge in the graph often represents the similarity between the two vertices it connects. The connectivities and edge weights are either dictated by the physics of the problem or inferred from the data. For instance, the edge weight may be related to the physical distance between nodes in the network, or it may be related to the degrees of the connected vertices (that is, the number of edges connected to the same vertex). The data on these graphs can be visualized as a finite collection of samples, with one sample at each vertex of the graph.

Graph representations lead to rich data descriptions on irregular domains and, if properly exploited, make it possible to efficiently capture the evolution of signals in a priori complex high-dimensional data sets. Signals and graphs are usually defined using different types of information which, if combined properly, can be quite helpful in analyzing or inferring information in the datasets. Moreover, graph signal representations provide a natural way to handle signals that cannot be easily processed with classical tools due to their irregular structure. The price to pay for this flexibility is that one has to develop new tools and algorithms that handle the graph structure efficiently, possibly by leveraging intuition from classical signal processing in Euclidean spaces. Adapting classical signal processing tools to signals defined on graphs has raised significant interest in the last few years. It requires the combination of different fields such as algebraic and spectral graph theory, harmonic analysis, and application domain expertise. Although this research area looks highly promising, because it provides a framework for modeling complex and irregularly structured discrete datasets, the challenges are many and the field is still expanding.


1.2 Thesis Outline

The goal of this thesis is to present solutions as well as analysis of a few of the most important issues that arise in the emerging field of graph signal processing.

The thesis is organized as follows:

• Chapter 2 reviews the current state-of-the-art methods for graph signal representations and their applications in both centralized and distributed settings. First, we give the basic definitions and notation used in this thesis for graphs and signals on graphs, and we review the generalization of classical transforms, such as the Fourier transform, to the irregular graph domain. The chapter then concludes with applications of graph signal processing in visual data representation, processing and compression.

• Chapter 3 introduces and studies a common application of graph signal processing, namely the reconstruction problem: given a sampled graph signal, defined only on a subset of the vertices of the graph, the main challenge consists in recovering the missing part of the signal. To solve this problem, we propose two different algorithms, based on two different cost functions to be minimized.

• In Chapter 4 we propose distributed versions of the reconstruction algorithms. First, we present a simple interpretation of the solution, based on the average consensus algorithm. Then, we apply a more sophisticated method that introduces some auxiliary variables, which are minimized in an alternating fashion. Finally, we present numerical results for the distributed algorithms.

• Chapter 5 draws some conclusions and suggests possible future work.

2 Graph Signal Processing Overview

2.1 Introduction

In order to efficiently represent graph signals, it is necessary to take into account the intrinsic geometric structure of the underlying graph. Signal characteristics, such as smoothness, depend on the irregular topology of the graph on which the signal is defined. Classical signal processing tools designed for regular signal structures are therefore inappropriate for the irregular structure of the graph setting. In recent years a lot of work has been dedicated to designing new tools and algorithms that can efficiently handle the new challenges arising from the irregular structure of networks or other graph supports. These tools are based on a combination of computational harmonic analysis with algebraic and spectral graph theoretical concepts [1].

In this chapter, we review the principal graph signal processing methods from the literature that are related to the problems studied in this thesis. First, we give some basic definitions and notation for graphs and signals on graphs that will be used in the rest of the thesis. Next, we review the generalization of classical transforms such as the Fourier transform to the irregular graph domain. In the sequel, we focus on the use of graph-based signal processing tools in different applications, in particular graph signal reconstruction and distributed processing. Finally, we briefly review the use of graph-based signal processing tools for image and 3D data, which represent a popular application area for this emerging framework.

2.2 Graphs And Signals On Graphs

In this section, we briefly recall a few basic definitions for signals on graphs. We consider a connected, weighted and undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E}, A)$ consisting of $N$ nodes, where $\mathcal{V}$ and $\mathcal{E}$ represent the vertex and edge sets of the graph respectively, and $A$ represents the weighted adjacency matrix, with $A_{ij} = A_{ji}$ (since the graph is undirected) denoting the weight of the edge connecting vertices $i$ and $j$. If there is no edge between node $i$ and node $j$, we set $A_{ij} = 0$. The degree of a node $i$ is defined as the sum of the weights of the edges incident on that node, which can be computed as the sum of the entries in the $i$-th row of the weighted adjacency matrix $A$. The $n$-hop neighborhood of node $i$,

$$\mathcal{N}(i, n) = \{v \in \mathcal{V} : d(v, i) \le n\},$$

is the set of all nodes that are at most $n$ hops away from node $i$, where $d(v, i)$ denotes the number of edges on a shortest path between $v$ and $i$.
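The $n$-hop neighborhood $\mathcal{N}(i, n)$ can be computed with a breadth-first search from node $i$ that stops expanding at depth $n$. The following is a minimal Python/NumPy sketch under our own assumptions (0-based vertex indices, any nonzero $A_{uv}$ counts as an edge); the function name `n_hop_neighborhood` is hypothetical and not taken from the thesis.

```python
from collections import deque

import numpy as np


def n_hop_neighborhood(A: np.ndarray, i: int, n: int) -> set[int]:
    """Return N(i, n) = {v : d(v, i) <= n}, d being the hop (shortest-path) distance.

    A is a symmetric (weighted) adjacency matrix; any nonzero A[u, v] is an edge.
    Vertex indices are 0-based here, unlike the 1-based notation in the text.
    """
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if dist[u] == n:  # do not expand beyond n hops
            continue
        for v in np.flatnonzero(A[u]):
            v = int(v)
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


# Illustrative path graph 0 - 1 - 2 - 3 - 4 with unit weights.
A = np.zeros((5, 5))
for a, b in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    A[a, b] = A[b, a] = 1.0

print(n_hop_neighborhood(A, 2, 1))  # {1, 2, 3}
print(n_hop_neighborhood(A, 0, 2))  # {0, 1, 2}
```

Since BFS visits vertices in order of increasing hop distance, the first time a vertex is reached its recorded distance is already the shortest one.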


The combinatorial graph Laplacian operator, also called the non-normalized graph Laplacian, is defined as

$$L = D - A \tag{2.1}$$

where $D$ is the diagonal degree matrix whose $i$-th diagonal element is equal to the degree of node $i$ (the sum of the weights of all the edges incident to vertex $i$ [2]) and $A$ is the weighted adjacency matrix. $L$ is a real symmetric, positive semi-definite matrix, so it has a complete set of real orthonormal eigenvectors with corresponding non-negative eigenvalues. We denote its eigenvectors by $\{u_\ell\}_{\ell = 0, 1, \dots, N-1}$, and the associated real, non-negative sorted spectrum of eigenvalues by

$$\sigma(L) = \{0 = \lambda_0 \le \lambda_1 \le \lambda_2 \le \dots \le \lambda_{N-1}\},$$

satisfying $L u_\ell = \lambda_\ell u_\ell$ for $\ell = 0, 1, \dots, N-1$. We can then write the eigendecomposition of the Laplacian as $L = U \Lambda U^*$, where $U$ collects the eigenvectors of $L$ in its columns, whereas $\Lambda$ is the diagonal matrix of the corresponding eigenvalues. The spectral properties of $L$, that is, the structure of its eigenvalues and eigenvectors, are of particular importance for studying the behaviour of graph signals in the spectral domain. The following useful results are taken from non-negative matrix theory [3].

According to the Gershgorin circle theorem, the eigenvalues of the Laplacian $L$ of a graph $\mathcal{G}$ lie in the union of the discs in the complex plane with centers $L_{ii}$ and radii given by the off-diagonal absolute row sums $\sum_{j=1, j \ne i}^{N} |L_{ij}|$, for each $i$, where $|\cdot|$ denotes the absolute value. Since by definition the diagonal entries of $L$ are non-negative and all row sums of $L$ are equal to zero, each center $L_{ii}$ equals the corresponding radius, so the Gershgorin circles are tangent to the imaginary axis at zero. Fig. 2.2 visualizes an example of Gershgorin circles for the Laplacian in the complex plane, obtained by taking the Metropolis weights, as defined in (3.10), for the weighted adjacency matrix.


Figure 2.2: Gershgorin circles for Metropolis weights (complex plane; legend: Eigenvalues)

Therefore, the eigenvalues of $L$ have non-negative real parts (they are in fact real, since $L$ is symmetric) and all lie inside a circle of radius $2d_{max}$ centered at the origin, where $d_{max}$ is the maximum degree over all vertices. In the case of the Metropolis weight adjacency matrix, the eigenvalues are confined within a circle of radius 1 centered at $(1, 0)$. In the case of the unweighted adjacency matrix, defined in (3.15), we can see from Fig. 2.3 that the eigenvalues are restricted to a circle of larger radius, because the adjacency matrix is not normalized as in the Metropolis case.


Figure 2.3: Gershgorin circles for unweighted adjacency matrix (complex plane; legend: Eigenvalues)
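The Gershgorin bounds above can be checked numerically. This sketch assumes a random unweighted graph (not the graph used in the thesis) and a common form of Metropolis weights, $A_{ij} = 1/(1 + \max(d_i, d_j))$ on edges, with $d_i$ the number of neighbors of node $i$; the thesis uses its own definition (3.10), which may differ in details. The helper `gershgorin` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random unweighted, undirected graph on N nodes (illustrative, not the thesis graph).
N = 30
B = np.triu((rng.random((N, N)) < 0.15).astype(float), k=1)
A01 = B + B.T                  # unweighted 0/1 adjacency matrix
deg = A01.sum(axis=1)          # number of neighbours of each node

# ASSUMED Metropolis weights: A_ij = 1 / (1 + max(d_i, d_j)) on existing edges.
# The thesis uses its own definition (3.10), which may differ in details.
A_met = A01 / (1.0 + np.maximum.outer(deg, deg))


def gershgorin(M):
    """Gershgorin disc centres M_ii and radii sum_{j != i} |M_ij| of a square matrix."""
    centres = np.diag(M)
    radii = np.abs(M).sum(axis=1) - np.abs(centres)
    return centres, radii


for name, W in [("unweighted", A01), ("Metropolis", A_met)]:
    L = np.diag(W.sum(axis=1)) - W
    c, r = gershgorin(L)
    lam = np.linalg.eigvalsh(L)
    d_max = W.sum(axis=1).max()
    # Each disc has centre d_i and radius d_i, so it touches 0 and the
    # spectrum lies in [0, 2 * d_max].
    print(name, np.allclose(c, r), lam.min() > -1e-10, lam.max() <= 2 * d_max + 1e-10)
```

With these weights every weighted degree is at most $d_i/(1+d_i) < 1$, which is why the Metropolis discs stay inside the circle of radius 1 centered at $(1, 0)$.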

Since $L \mathbf{1} = \mathbf{0}$, where $\mathbf{1}$ is the all-ones vector and $\mathbf{0}$ is the zero vector of length $N$, the smallest eigenvalue of the non-normalized Laplacian is always zero; its multiplicity is equal to the number of connected components of the graph, and for a connected graph the corresponding eigenvector is a constant vector. The largest eigenvalue depends on the maximum degree of the graph. Moreover, the combinatorial Laplacian is associated with the incidence matrix, as shown in [2].
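To make Eq. (2.1), the eigendecomposition $L = U \Lambda U^*$ and the statements about the zero eigenvalue concrete, the sketch below builds the Laplacian of a small, hypothetical weighted graph and diagonalizes it with NumPy's `numpy.linalg.eigh`, which targets symmetric matrices and returns eigenvalues in ascending order. Since $L$ is real, $U^*$ reduces to $U^T$. A second, disconnected example (two disjoint triangles, an assumption for illustration) shows the multiplicity of the eigenvalue 0.

```python
import numpy as np


def laplacian(A):
    """Combinatorial Laplacian L = D - A of a symmetric adjacency matrix, Eq. (2.1)."""
    return np.diag(A.sum(axis=1)) - A


# Connected, weighted example graph on N = 4 nodes (illustrative weights).
A = np.array([
    [0.0, 1.0, 0.5, 0.0],
    [1.0, 0.0, 2.0, 0.0],
    [0.5, 2.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])
L = laplacian(A)

# eigh: real eigenvalues in ascending order, orthonormal eigenvectors in the columns of U.
lam, U = np.linalg.eigh(L)
print(np.allclose(U @ np.diag(lam) @ U.T, L))             # L = U Lambda U^T
print(np.isclose(lam[0], 0.0), np.all(lam > -1e-12))      # lambda_0 = 0, spectrum >= 0
print(np.allclose(np.abs(U[:, 0]), 1 / np.sqrt(len(A))))  # u_0 = +-1/sqrt(N)

# Two disjoint triangles: 2 connected components, so eigenvalue 0 has multiplicity 2.
tri = np.ones((3, 3)) - np.eye(3)
A2 = np.block([[tri, np.zeros((3, 3))], [np.zeros((3, 3)), tri]])
L2 = laplacian(A2)
print(np.allclose(L2 @ np.ones(6), 0))                    # L 1 = 0
lam2 = np.linalg.eigvalsh(L2)
n_zero = int(np.sum(np.abs(lam2) < 1e-9))                 # numerical multiplicity of 0
print(n_zero)  # 2
```

The tolerance `1e-9` used to count zero eigenvalues is an arbitrary choice that works for this small example; the eigensolver may also return $u_0$ with either sign.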

For connected graphs, the normalized graph Laplacian is closely related to the combinatorial Laplacian and is defined as

$$\mathcal{L} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2} \tag{2.2}$$

where $I$ is the identity matrix. As in the case of the non-normalized Laplacian, the eigenvalues are non-negative, with the smallest one equal to zero. A nice property of the eigenvalues of the normalized Laplacian is that they are contained in the interval $[0, 2]$, which makes it easier to compare the distribution of the eigenvalues between different graphs, especially if they differ greatly in size. Moreover, for a connected graph the largest eigenvalue is equal to 2 if and only if $\mathcal{G}$ is bipartite, i.e., the set of vertices $\mathcal{V}$ can be partitioned into two subsets $\mathcal{V}_1$ and $\mathcal{V}_2$ such that every edge $e \in \mathcal{E}$ connects one vertex in $\mathcal{V}_1$ and one vertex in $\mathcal{V}_2$. Furthermore, the normalized Laplacian eigenvalues are consistent with the eigenvalues in spectral geometry and in stochastic processes, such as random walks [2].
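The following sketch checks the $[0, 2]$ range of the normalized Laplacian spectrum on two hypothetical cycle graphs: the 6-cycle is bipartite and reaches the upper bound 2, while the 7-cycle (an odd cycle, hence not bipartite) stays strictly below it. It assumes no isolated vertices, so that $D^{-1/2}$ exists; the helper names are ours.

```python
import numpy as np


def normalized_laplacian(A):
    """I - D^{-1/2} A D^{-1/2}, Eq. (2.2); assumes every vertex has positive degree."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]


def cycle(n):
    """Adjacency matrix of the unweighted cycle graph on n nodes."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
    return A


for n in (6, 7):  # the 6-cycle is bipartite, the 7-cycle is not
    lam = np.linalg.eigvalsh(normalized_laplacian(cycle(n)))
    print(n, round(lam.min(), 6), round(lam.max(), 6))
```

On a cycle every vertex has degree 2, so here $\mathcal{L} = L/2$ and its eigenvalues are $1 - \cos(2\pi k/n)$, which equals 2 only when $n$ is even.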

The combinatorial and the normalized graph Laplacians are both examples of generalized graph Laplacians [4], and they are both popular in many graph-related frameworks. In general, when the graph is almost regular, the combinatorial and the normalized Laplacian have similar spectra (up to a scaling by the common degree). In this thesis we mainly use the combinatorial graph Laplacian and we focus only on undirected graphs. For the sake of completeness, we note that the definition of the Laplacian can be easily extended to directed graphs [5].

A graph signal $y$ in the vertex domain is a real-valued function defined on the vertices of the graph $\mathcal{G}$, such that $y(n)$ is the value of the function at vertex $n \in \mathcal{V}$. An example of a graph and a signal on the graph is given in Fig. 2.4. This signal is generated randomly from a Gaussian distribution with zero mean and standard deviation 5.

Figure 2.4: Random signal defined on 50 nodes


2.3 Graph Spectral Domain

The fundamental analogy between traditional signal processing and graph signal processing is established through spectral graph theory [2]. In particular, the classical Fourier transform has been generalized to graph settings through the eigenvectors and eigenvalues of the graph Laplacian matrix [6], which carry a notion of frequency for graph signals. The graph Laplacian eigenvectors associated with small eigenvalues correspond to signals that vary slowly across the graph, hence they can be associated with the notion of low frequency. For connected graphs, the Laplacian eigenvector $u_0$ associated with the eigenvalue 0 is constant and equal to $1/\sqrt{N}$ at each vertex.

Figure 2.5: Eigenvectors $u_0$, $u_1$, $u_2$ and $u_{49}$ (one panel per eigenvector)

In other words, if two vertices are connected by an edge with a large weight, the values of the low-frequency eigenvectors at those locations are likely to be similar. The eigenvectors associated with larger eigenvalues take values that change more rapidly on the graph: they are more likely to have dissimilar values on vertices connected by an edge with a high weight. This is demonstrated both in Fig. 2.5, which shows different graph Laplacian eigenvectors for a random sensor network graph, and in Fig. 2.6, which shows the number of zero crossings of each graph Laplacian eigenvector. The set of zero crossings of a signal $y$ on a graph $\mathcal{G}$ is defined as

$$Z_{\mathcal{G}}(y) = \{e = (i, j) \in \mathcal{E} : y(i)\, y(j) < 0\},$$

that is, the set of edges connecting a vertex with a positive signal value to a vertex with a negative signal value.

Figure 2.6: Number of zero-crossings (horizontal axis: $\lambda_\ell$)
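To reproduce the trend of Fig. 2.6 on a small scale, the sketch below counts $|Z_{\mathcal{G}}(u_\ell)|$ for every Laplacian eigenvector of a path graph on 8 nodes (an illustrative choice, not the sensor graph of the figure). On a path graph the eigenvectors are sampled cosines and the $\ell$-th one changes sign exactly $\ell$ times, so the count grows with $\lambda_\ell$. The count does not depend on the arbitrary sign returned by the eigensolver, since flipping $u_\ell$ leaves every product $u_\ell(i)\,u_\ell(j)$ unchanged. The helper `zero_crossings` is hypothetical.

```python
import numpy as np


def zero_crossings(A, y):
    """Number of edges (i, j) with y(i) * y(j) < 0, i.e. |Z_G(y)|."""
    i, j = np.nonzero(np.triu(A, k=1))  # each undirected edge counted once
    return int(np.sum(y[i] * y[j] < 0))


# Path graph on N = 8 nodes with unit weights (illustrative).
N = 8
A = np.zeros((N, N))
for n in range(N - 1):
    A[n, n + 1] = A[n + 1, n] = 1.0
L = np.diag(A.sum(axis=1)) - A

lam, U = np.linalg.eigh(L)  # eigenvalues ascending, eigenvectors in the columns
crossings = [zero_crossings(A, U[:, l]) for l in range(N)]
print(crossings)  # [0, 1, 2, 3, 4, 5, 6, 7]
```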

The eigenvectors of the graph Laplacian are therefore considered to represent a Fourier basis for graph signals. For any function $y$ defined on the vertices of the graph $\mathcal{G}$, the Graph Fourier Transform (GFT) $\hat{y}$ at frequency $\lambda_\ell$ is defined as the inner product with the corresponding eigenvector $u_\ell$ [6]:

$$\hat{y}(\lambda_\ell) = \langle y, u_\ell \rangle = \sum_{n=1}^{N} y(n)\, u_\ell^*(n) \tag{2.3}$$

where the inner product is conjugate-linear in the second argument, and $u_\ell^*(n)$ is the complex conjugate of the eigenvector $u_\ell$ at node $n$. Therefore, the GFT of a signal $y$ can be written in matrix form as $\hat{y} = U^* y$.

The Inverse Graph Fourier Transform (IGFT) is

$$y(n) = \sum_{\ell=0}^{N-1} \hat{y}(\lambda_\ell)\, u_\ell(n), \quad \forall n \in \mathcal{V}. \tag{2.4}$$
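The GFT pair (2.3)-(2.4) reduces to two matrix products with the eigenvector matrix $U$. Below is a minimal NumPy sketch on a random, hypothetical weighted graph; the helper names `gft` and `igft` are ours. Because $U$ is orthonormal, the transform is perfectly invertible and preserves the signal energy (Parseval).

```python
import numpy as np

rng = np.random.default_rng(1)

# Random weighted undirected graph (illustrative); weights in [0, 1).
N = 10
W = np.triu(rng.random((N, N)) * (rng.random((N, N)) < 0.4), k=1)
A = W + W.T
L = np.diag(A.sum(axis=1)) - A
lam, U = np.linalg.eigh(L)  # columns of U: Fourier basis u_0, ..., u_{N-1}


def gft(y):
    """Eq. (2.3): y_hat(lambda_l) = <y, u_l>, i.e. y_hat = U^* y."""
    return U.conj().T @ y


def igft(y_hat):
    """Eq. (2.4): y(n) = sum_l y_hat(lambda_l) u_l(n), i.e. y = U y_hat."""
    return U @ y_hat


y = rng.normal(0.0, 5.0, size=N)  # random Gaussian signal, as in the Fig. 2.4 example
y_hat = gft(y)
print(np.allclose(igft(y_hat), y))                           # perfect reconstruction
print(np.isclose(np.linalg.norm(y), np.linalg.norm(y_hat)))  # Parseval
```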

The Fourier basis can be chosen as the eigenvectors of either the combinatorial or the normalized graph Laplacian matrix, since both spectra have a frequency-like interpretation [1]. As in the classical Euclidean setting, the spectral domain representation provides important information about graph signals. For example, analogously to the classical case, the graph Fourier coefficients of a smooth signal decay rapidly. Such signals are compressible, as they can be closely approximated by just a sparse set of Fourier coefficients [7]. This property is used in many applications, such as compression or regularization of graph signals.

The graph Fourier transform and its inverse give us a way to represent a signal equivalently in two different domains: the vertex domain and the graph spectral domain. While we often start with a signal in the vertex domain, it may also be useful to define a signal directly in the graph spectral domain. We refer to such signals as kernels. In Fig. 2.7 one such signal, a heat kernel, is shown in both domains.


Figure 2.7: Exponential kernel and its IGFT (panels: spectral domain, $e^{-\lambda_\ell}$; vertex domain)

As noted above, the graph Fourier coefficients of a smooth signal such as the one shown in Fig. 2.7 decay rapidly, so the signal can be closely approximated by just a few Fourier coefficients. Besides its use in spectral analysis, the graph Fourier transform is also useful for generalizing traditional signal processing concepts such as convolution, translation, or modulation to graph settings. In particular, the relation between the vertex and the spectral graph domain has been used to define convolution on the graph. Given two signals $y$ and $h$, the result of their convolution at vertex $n$ is defined as [8, 6]

$$(y * h)(n) = \sum_{\ell=0}^{N-1} \hat{y}(\lambda_\ell)\, \hat{h}(\lambda_\ell)\, u_\ell(n), \tag{2.5}$$

which imposes the property that convolution in the vertex domain is equivalent to multiplication in the graph spectral domain, as in the classical Euclidean setting.

The generalized convolution product defined in Eq. (2.5) satisfies the following properties, as discussed in [8]:

1. Generalized convolution in the vertex domain is multiplication in the graph spectral domain:
$$\widehat{f * h}(\lambda_\ell) = \hat{f}(\lambda_\ell)\, \hat{h}(\lambda_\ell). \tag{2.6}$$

2. Let $\alpha \in \mathbb{R}$ be arbitrary. Then
$$\alpha (f * h) = (\alpha f) * h = f * (\alpha h). \tag{2.7}$$

3. Commutativity:
$$f * h = h * f. \tag{2.8}$$

4. Distributivity:
$$f * (g + h) = f * g + f * h. \tag{2.9}$$

5. Associativity:
$$(f * g) * h = f * (g * h). \tag{2.10}$$

6. Define a function $h_0$ in $\mathbb{R}^N$ by $h_0(i) := \sum_{\ell=0}^{N-1} u_\ell(i)$. Then $h_0$ is an identity for the generalized convolution product:
$$f * h_0 = f. \tag{2.11}$$

7. An invariance property with respect to the graph Laplacian (a difference operator):
$$L(f * h) = (Lf) * h = f * (Lh). \tag{2.12}$$

8. The sum of the generalized convolution product of two signals is a constant times the product of the sums of the two signals:
$$\sum_{i=1}^{N} (f * h)(i) = \sqrt{N}\, \hat{f}(0)\, \hat{h}(0) = \frac{1}{\sqrt{N}} \left[\sum_{n=1}^{N} f(n)\right] \left[\sum_{n=1}^{N} h(n)\right]. \tag{2.13}$$
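The properties above can be checked numerically with a direct implementation of Eq. (2.5). This sketch assumes a ring graph on 8 nodes and a hypothetical helper `conv`. One detail matters: (2.5) is not invariant to the arbitrary sign of each eigenvector returned by a solver (flipping $u_\ell$ flips the $\ell$-th term), so we fix $u_0 = +1/\sqrt{N}$ as in the text; property 8 relies on this convention.

```python
import numpy as np

rng = np.random.default_rng(2)

# Ring (cycle) graph on N nodes with unit weights: a small connected example.
N = 8
A = np.zeros((N, N))
for i in range(N):
    A[i, (i + 1) % N] = A[(i + 1) % N, i] = 1.0
L = np.diag(A.sum(axis=1)) - A
lam, U = np.linalg.eigh(L)
U[:, 0] *= np.sign(U[:, 0].sum())  # convention: u_0 = +1/sqrt(N)


def conv(f, h):
    """Generalized convolution, Eq. (2.5): sum_l f_hat(l) h_hat(l) u_l (U is real)."""
    return U @ ((U.T @ f) * (U.T @ h))


f, g, h = rng.normal(size=(3, N))
h0 = U.sum(axis=1)  # h0(i) = sum_l u_l(i), property 6

print(np.allclose(conv(f, h), conv(h, f)))                          # (2.8) commutativity
print(np.allclose(conv(conv(f, g), h), conv(f, conv(g, h))))         # (2.10) associativity
print(np.allclose(conv(f, h0), f))                                   # (2.11) identity
print(np.allclose(L @ conv(f, h), conv(L @ f, h)))                   # (2.12) invariance
print(np.isclose(conv(f, h).sum(), f.sum() * h.sum() / np.sqrt(N)))  # (2.13) sum rule
```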

The classical translation operator is defined through the change of variable $(T_v y)(t) = y(t - v)$, which cannot be directly generalized to graph settings, since the difference of two vertices has no meaning on a graph. However, it is possible to define a generalized translation operator $T_v$ of a graph signal as a convolution with a Kronecker $\delta$ centered at vertex $v$ [9, 8, 6]:

$$T_v y(n) = \sqrt{N}\, (y * \delta_v)(n) = \sqrt{N} \sum_{\ell=0}^{N-1} \hat{y}(\lambda_\ell)\, u_\ell^*(v)\, u_\ell(n) \tag{2.14}$$


where

$$\delta_v(n) = \begin{cases} 1 & \text{if } n = v \\ 0 & \text{otherwise,} \end{cases}$$

and the normalizing constant $\sqrt{N}$ ensures that the translation operator preserves the mean of the signal. The Kronecker function $\delta_v$ is an $N$-dimensional signal that is zero everywhere on the graph except at node $v$, where it takes the value one. This is a kernelization operation, acting on a signal $\hat{y}$ defined in the graph spectral domain rather than translating a signal $y$ defined in the vertex domain. An example of the translation of a signal $y$ to different nodes of the graph is illustrated in Fig. 2.8. We can observe that the generalized translation does not reduce to the classical shift: the translated signal is not simply a shifted copy of $y$.

Figure 2.8: Translated versions of kernel signal (panels: $T_1 y$, $T_{10} y$, $T_{30} y$, $T_{50} y$)

Some expected properties of the generalized translation operator follow immediately from the generalized convolution properties:

1. $T_i (f * h) = (T_i f) * h = f * (T_i h)$.
2. $T_i T_j f = T_j T_i f$.
3. $\sum_{n=1}^{N} (T_i f)(n) = \sqrt{N}\, \hat{f}(0) = \sum_{n=1}^{N} f(n)$.

Unlike the classical case, the set of translation operators $\{T_i\}_{i \in \{1, 2, \dots, N\}}$ does not form a mathematical group, i.e., in general $T_i T_j \ne T_{i+j}$. In the very special case of shift-invariant graphs [10], which are graphs for which the Discrete Fourier Transform (DFT) basis vectors are graph Laplacian eigenvectors, we have

$$T_i T_j = T_{[((i-1)+(j-1)) \bmod N] + 1}, \quad \forall i, j \in \{1, 2, \dots, N\}. \tag{2.15}$$

However, Eq. (2.15) is not true in general for arbitrary graphs. Moreover, while the idea of successive translations $T_i T_j$ carries a clear meaning in the classical case, it is not a particularly meaningful concept in the graph setting, due to our definition of generalized translation as a kernelized operator.
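Here is a sketch of the generalized translation (2.14). It assumes a path graph on 10 nodes, 0-based node indices (the text uses 1-based indices), and a heat-kernel signal $\hat{y}(\lambda_\ell) = e^{-\lambda_\ell}$ similar to the one of Fig. 2.7. As with convolution, (2.14) depends on the sign convention of the eigenvectors, so we fix $u_0 = +1/\sqrt{N}$; the helper `translate` is hypothetical.

```python
import numpy as np

# Path graph on N nodes with unit weights (illustrative).
N = 10
A = np.diag(np.ones(N - 1), 1)
A = A + A.T
L = np.diag(A.sum(axis=1)) - A
lam, U = np.linalg.eigh(L)
U[:, 0] *= np.sign(U[:, 0].sum())  # convention: u_0 = +1/sqrt(N)


def translate(y, v):
    """Generalized translation, Eq. (2.14): sqrt(N) * sum_l y_hat(l) u_l(v) u_l."""
    y_hat = U.T @ y                              # GFT (U is real, so U^* = U^T)
    return np.sqrt(N) * (U @ (y_hat * U[v, :]))  # U[v, l] = u_l(v)


# Heat-kernel signal defined in the spectral domain: y_hat(lambda_l) = exp(-lambda_l).
y = U @ np.exp(-lam)

for v in (0, 4, 9):
    # Property 3: translation preserves the sum (hence the mean) of the signal.
    print(v, np.isclose(translate(y, v).sum(), y.sum()))

# Property 2: successive translations commute, T_i T_j y = T_j T_i y.
print(np.allclose(translate(translate(y, 2), 7), translate(translate(y, 7), 2)))
```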

Filtering is another fundamental operation in graph signal processing. Similarly to classical signal processing, the outcome $y_{out}$ of filtering a graph signal $y$ with a graph filter $h$ is defined in the spectral domain as the multiplication of the graph Fourier coefficient $\hat{y}(\lambda_\ell)$ with the transfer function $\hat{h}(\lambda_\ell)$, such that

$$\hat{y}_{out}(\lambda_\ell) = \hat{y}(\lambda_\ell)\, \hat{h}(\lambda_\ell), \quad \forall \lambda_\ell \in \sigma(L). \tag{2.16}$$

The filtered signal $y_{out}$ at node $n$ is given by taking the IGFT of $\hat{y}_{out}$ in (2.16):

$$y_{out}(n) = \sum_{\ell=0}^{N-1} \hat{y}(\lambda_\ell)\, \hat{h}(\lambda_\ell)\, u_\ell(n). \tag{2.17}$$

Eq. (2.17) can be expressed in matrix notation [11] as

$$y_{out} = \hat{h}(L)\, y \tag{2.18}$$

where

$$\hat{h}(L) = U \begin{bmatrix} \hat{h}(\lambda_0) & & 0 \\ & \ddots & \\ 0 & & \hat{h}(\lambda_{N-1}) \end{bmatrix} U^*$$

is a graph filter or kernel defined in the spectral domain of the graph.
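Filtering per Eqs. (2.16)-(2.17) and their matrix form can be sketched as follows, using a heat-kernel transfer function $\hat{h}(\lambda) = e^{-\tau\lambda}$ as an illustrative low-pass filter (the filter, $\tau = 1$ and the random graph are our assumptions). Both the spectral computation and the matrix form $\hat{h}(L) = U\,\mathrm{diag}(\hat{h}(\lambda_\ell))\,U^*$ give the same output, and since $0 < \hat{h}(\lambda_\ell) \le 1$ the filter cannot increase the quadratic form $y^T L y$.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative random weighted graph and random input signal.
N = 12
W = np.triu(rng.random((N, N)) * (rng.random((N, N)) < 0.5), k=1)
A = W + W.T
L = np.diag(A.sum(axis=1)) - A
lam, U = np.linalg.eigh(L)


def h_hat(lam, tau=1.0):
    """Example low-pass transfer function (heat kernel): exp(-tau * lambda)."""
    return np.exp(-tau * lam)


y = rng.normal(size=N)

# Eqs. (2.16)-(2.17): scale each GFT coefficient by h_hat(lambda_l), then apply the IGFT.
y_out_spectral = U @ (h_hat(lam) * (U.T @ y))

# Matrix form: y_out = h_hat(L) y, with h_hat(L) = U diag(h_hat(lambda)) U^T.
H = U @ np.diag(h_hat(lam)) @ U.T
y_out_matrix = H @ y

print(np.allclose(y_out_spectral, y_out_matrix))  # True
# The low-pass filter does not increase the Laplacian quadratic form y^T L y.
print(y_out_matrix @ L @ y_out_matrix <= y @ L @ y + 1e-12)
```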

Interestingly, when the graph filter is a polynomial of order $K$ with coefficients $\{\alpha_k\}_{k=0}^{K}$, such that

$$\hat{h}(\lambda_\ell) = \sum_{k=0}^{K} \alpha_k \lambda_\ell^k, \tag{2.19}$$

filtering the input signal $y$ in the spectral domain can be interpreted, at each node $n$, as a linear combination of the components of the input signal at the vertices within the $K$-hop neighborhood of $n$. Combining Eqs. (2.18) and (2.19) we obtain

$$y_{out}(n) = \sum_{\ell=0}^{N-1} \hat{y}(\lambda_\ell)\, \hat{h}(\lambda_\ell)\, u_\ell(n) = \sum_{m=1}^{N} y(m) \sum_{k=0}^{K} \alpha_k \sum_{\ell=0}^{N-1} \lambda_\ell^k\, u_\ell^*(m)\, u_\ell(n) = \sum_{m=1}^{N} y(m) \sum_{k=0}^{K} \alpha_k \left(L^k\right)_{n,m} \tag{2.20}$$

where $(L^k)_{n,m} = 0$ if the shortest-path distance between vertices $n$ and $m$ is greater than $k$ [6]. This property can be quite useful for designing signals that are localized in the vertex domain of the graph. A detailed overview of these basic operations can be found in [1].
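The $K$-hop localization of polynomial filters in (2.20) can be checked directly. This sketch assumes a path graph, on which the hop distance between nodes $n$ and $m$ is simply $|n - m|$, and arbitrary illustrative coefficients $\alpha_k$ with $K = 2$. It compares the vertex-domain form $\sum_k \alpha_k L^k$ with the spectral form and verifies that all entries beyond $K$ hops vanish.

```python
import numpy as np


def path_laplacian(N):
    """Combinatorial Laplacian of the unweighted path graph on N nodes."""
    A = np.diag(np.ones(N - 1), 1)
    A = A + A.T
    return np.diag(A.sum(axis=1)) - A


N, K = 10, 2
L = path_laplacian(N)
alpha = np.array([1.0, -0.5, 0.1])  # hypothetical coefficients alpha_0, ..., alpha_K

# Vertex-domain form of Eq. (2.20): h_hat(L) = sum_k alpha_k L^k.
H_poly = sum(a * np.linalg.matrix_power(L, k) for k, a in enumerate(alpha))

# Spectral form: U diag(h_hat(lambda_l)) U^T with h_hat(lambda) = sum_k alpha_k lambda^k.
lam, U = np.linalg.eigh(L)
H_spec = U @ np.diag(np.polyval(alpha[::-1], lam)) @ U.T  # polyval wants highest degree first

print(np.allclose(H_poly, H_spec))  # both forms agree

# On a path graph the hop distance is |n - m|: entries with |n - m| > K must vanish.
n, m = np.indices((N, N))
print(np.all(H_poly[np.abs(n - m) > K] == 0))
```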


2.4 Applications Of Graph-Based Signal Processing

In this section, we present some graph-based signal processing applications. We review some of the works from the literature that use graph-based tools to process graph signals in both centralized and distributed settings.

2.4.1 Processing with graph-based priors

Many of the representation methods of the previous section have been applied to different signal processing tasks such as denoising, semi-supervised learning, and classification. As in the traditional Euclidean domain, notions such as smoothness and sparsity have been used as regularizers for solving many graph-based inverse problems in both centralized and distributed settings.

The smoothness of the signal on the graph has been one of the core assumptions in semi-supervised learning, with applications in classification, link prediction, and ranking problems. A signal is considered smooth on the graph if it exhibits little variation between strongly connected vertices. Typically, the global smoothness $S_p(y)$ of a signal y is defined through the discrete p-Dirichlet form of y as

$$S_p(y) = \frac{1}{p} \sum_{v \in \mathcal{V}} \|\nabla_v y\|_2^{p} = \frac{1}{p} \sum_{v \in \mathcal{V}} \Big[ \sum_{u \in \mathcal{N}_v} A_{vu}\,[y(v) - y(u)]^2 \Big]^{p/2}, \qquad (2.21)$$

where $\mathcal{N}_v$ denotes the one-hop neighborhood of node v, and $A_{vu}$ is the edge weight between nodes v and u. When p = 1, Eq. (2.21) defines the total variation of the signal y on the graph. When p = 2, we obtain a widely used Laplacian-based form of smoothness, defined as

$$S_2(y) = \sum_{(u,v) \in \mathcal{E}} A_{vu}\,[y(v) - y(u)]^2 = y^T L y = \sum_{\ell=0}^{N-1} \lambda_\ell\, \hat{y}(\lambda_\ell)^2. \qquad (2.22)$$
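Eq. (2.22) gives three equivalent ways of computing S_2(y). The NumPy sketch below checks them against each other on a randomly weighted 5-node graph; the graph and the signal are assumptions made only for illustration. It also verifies that a constant signal has zero variation.

```python
import numpy as np

def smoothness_edges(A, y):
    """S_2(y) as a sum over edges: sum_{(u,v) in E} A_vu (y(v) - y(u))^2."""
    iu, iv = np.triu_indices_from(A, k=1)      # each undirected edge counted once
    return np.sum(A[iu, iv] * (y[iu] - y[iv]) ** 2)

def smoothness_quadratic(A, y):
    """S_2(y) = y^T L y with the combinatorial Laplacian L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    return y @ L @ y

def smoothness_spectral(A, y):
    """S_2(y) = sum_l lambda_l * yhat(lambda_l)^2."""
    L = np.diag(A.sum(axis=1)) - A
    lam, U = np.linalg.eigh(L)
    return np.sum(lam * (U.T @ y) ** 2)

rng = np.random.default_rng(0)
W = np.triu(rng.random((5, 5)), k=1)
A = W + W.T                                    # symmetric weights, zero diagonal
y = rng.standard_normal(5)

values = (smoothness_edges(A, y), smoothness_quadratic(A, y), smoothness_spectral(A, y))
```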


Eq. (2.22) implies that the signal y is smooth, i.e., $S_2(y)$ is small, only if the graph Fourier coefficients corresponding to large eigenvalues are small. This definition of smoothness, or similar notions, has been imposed as a regularizer in the graph-based semi-supervised learning literature, where the goal is to compute the unknown signal entries by exploiting the assumption that the signal values vary slowly between nodes connected by strong edges. More sophisticated regularization techniques have been developed through the definition of kernels on graphs, which typically take the form of a power series of the graph Laplacian. Recently, a framework for active semi-supervised learning based on sampling theory for graph signals has been introduced; it relies on the above notion of smoothness of signals on the graph.

While smoothness priors have been widely employed, the use of sparse priors for graph signals has been mostly overlooked so far. The reason is that the link between sparsity and signal structure is not well understood in graph settings. However, some works do try to exploit sparsity in learning applications. For example, the sparsity of the Fourier coefficients has been exploited for the reconstruction of bandlimited graph signals.

2.4.2

Distributed Processing Of Graph Signals

The processing of graph signals in centralized settings has received considerable attention, but less work has been devoted to solving similar tasks in distributed settings such as sensor networks. Many distributed processing tasks consider the graph signal to be the result of applying a linear graph-based operator to an initial input signal. When the signal can be represented as a filtering operation in the vertex domain of the graph, distributed processing of the signal is significantly simplified. More formally, consider an initial signal y and a graph operator $P \in \mathbb{R}^{N \times N}$. Every signal $y_{\text{out}}$ that can be expressed as filtering of y in the graph vertex domain with P, such that

$$y_{\text{out}}(i) = \sum_{j \in \mathcal{N}_i} P_{i,j}\, y(j), \qquad (2.23)$$

can be computed by local exchange of information only within the neighborhood $\mathcal{N}_i$ of each node i, i.e., between nodes i and j that are connected by an edge. The operator or graph filter P is then defined according to the model of the signal.

Most existing works in such settings focus on reaching an agreement between sensors in a distributed way, using only local communication. In that case, the operator P is a doubly stochastic weight matrix. Its repeated application drives the output $y_{\text{out}}$ to the average value of the components of the initial signal y. Examples of such operators are the Metropolis and the Laplacian weight matrices [12], defined respectively as:

• Metropolis weights

$$P_{ij} = \begin{cases} 1/(1 + \max\{d_i, d_j\}), & j \in \mathcal{N}_i,\ i \neq j \\ 1 - \sum_{k \in \mathcal{N}_i} P_{ik}, & i = j \\ 0, & \text{otherwise,} \end{cases} \qquad (2.24)$$

where $d_i$ and $d_j$ denote the degrees of the i-th and the j-th sensors, respectively.

• Laplacian weights

$$P = I - \alpha L, \qquad (2.25)$$

where L denotes the Laplacian matrix of the graph G and the scalar α must satisfy $0 < \alpha < 1/d_{\max}$, where $d_{\max}$ is the maximum degree of the graph.
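The NumPy sketch below builds the two weight matrices (2.24) and (2.25) and uses each as the operator P of Eq. (2.23) in an average-consensus iteration y ← P y. The 5-node topology, the choice α = 0.9/d_max and the 200 iterations are assumptions for illustration. Each iteration only combines values of one-hop neighbours. Both matrices are symmetric and doubly stochastic, with positive diagonals, so on a connected graph the iteration converges to the average of the initial values.

```python
import numpy as np

def metropolis_weights(A):
    """Eq. (2.24): 1/(1 + max(d_i, d_j)) on edges; the diagonal makes each row sum to 1."""
    adj = (A > 0).astype(float)
    np.fill_diagonal(adj, 0.0)
    d = adj.sum(axis=1)
    N = len(d)
    P = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            if adj[i, j]:
                P[i, j] = 1.0 / (1.0 + max(d[i], d[j]))
        P[i, i] = 1.0 - P[i].sum()          # off-diagonal row sum so far
    return P

def laplacian_weights(A, alpha):
    """Eq. (2.25): P = I - alpha * L, assuming 0 < alpha < 1/d_max."""
    L = np.diag(A.sum(axis=1)) - A
    return np.eye(len(A)) - alpha * L

def average_consensus(P, y0, iters=200):
    """Repeatedly apply y <- P y; node i only uses values of its neighbours (Eq. 2.23)."""
    y = y0.copy()
    for _ in range(iters):
        y = P @ y
    return y

# Hypothetical connected 5-node sensor graph
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0],
              [1, 1, 0, 1, 0],
              [0, 1, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
y0 = np.array([3.0, -1.0, 4.0, 0.5, 2.0])

P_metro = metropolis_weights(A)
P_lap = laplacian_weights(A, alpha=0.9 / A.sum(axis=1).max())
y_metro = average_consensus(P_metro, y0)
y_lap = average_consensus(P_lap, y0)
```

Both `y_metro` and `y_lap` end up numerically equal to the mean of `y0` (1.7) at every node.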

Among the most common applications, distributed consensus algorithms in both synchronous (average consensus algorithms) [13] and asynchronous versions (gossip algorithms) [14] have been widely used for performing various aggregation tasks in ad-hoc sensor networks. In particular, the authors in [15] solve the problem of distributed classification of multiple observations by exploiting average consensus, while consensus-based distributed algorithms for Support Vector Machine (SVM) training for binary classification have been proposed in [16]. In addition, [17] solves a distributed field estimation problem from compressed measurements, while [18] introduces an algorithm for distributed subspace estimation based on average consensus. Gossip algorithms also find numerous applications in problems such as distributed parameter estimation, source localization, distributed compression [14], and decentralized sparse approximation [19].


Distributed average consensus operators are, however, only a specific case of the general family of graph-based operators. More generally, distributed processing of graph signals requires the definition of more sophisticated graph operators P. To that end, the authors in [20] have introduced a special category of linear graph operators called graph Fourier multipliers, which has later been extended to generalized graph multiplier operators in [21]. Such operators are defined with respect to a real symmetric positive semi-definite matrix $\Phi = U V U^T$, where U and V are the eigenvector and eigenvalue matrices of Φ, and are expressed as

$$P = \sum_{\ell=0}^{N-1} g(V_\ell)\, U_\ell U_\ell^{*}, \qquad (2.26)$$

where $g(\cdot): [0, V_{\max}(\Phi)] \to \mathbb{R}$ is a positive function defined in the spectral domain of the graph. When the matrix Φ is the graph Laplacian matrix, then

$$P = \sum_{\ell=0}^{N-1} g(\lambda_\ell)\, u_\ell u_\ell^{*}, \qquad (2.27)$$

which corresponds to a graph Fourier operator. The union of such operators

$$P = \big[\, U g_1(\Lambda) U^T,\ U g_2(\Lambda) U^T,\ \ldots,\ U g_S(\Lambda) U^T \,\big]$$

represents the graph Fourier multipliers. From Eq. (2.23), a graph signal $y_{\text{out}}$ is then the result of filtering a set of initial signals $y = [y_1; y_2; \ldots; y_S]$ in the spectral domain with each of the graph Fourier multipliers, such that

$$y_{\text{out}} = \sum_{s=1}^{S} U g_s(\Lambda) U^T y_s. \qquad (2.28)$$

An example of a union of graph Fourier multipliers is the spectral graph wavelet transform [6], where each of the multipliers corresponds to a particular scale. An efficient way to apply graph Fourier multipliers in distributed settings is to approximate them with Chebyshev polynomials [6], [20]. In that case, the output signal $y_{\text{out}}$ is a linear combination of a set of graph filtering operations (in the vertex domain) applied to some initial signals on the graph. Such an approximation permits the distributed approximation of $y_{\text{out}}$ from the set of initial signals, as well as the implementation of the forward and adjoint operators, which can be useful in tasks such as distributed denoising and distributed smoothing, as shown in [20].
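The Chebyshev approach mentioned above can be sketched as follows for a single multiplier g(L), as in Eq. (2.27). The coefficients are computed on [0, λ_max] with Gauss-Chebyshev quadrature. g(L)y is then evaluated with the shifted three-term recurrence, which only needs products L·v (i.e., neighbour exchanges), so each node could run it locally. The kernel g, the degree K = 15, the quadrature size M = 200 and the random 20-node graph are illustrative assumptions. This is not claimed to be the exact implementation of [6] or [20]. In a truly distributed setting λ_max would be replaced by an upper bound known to all nodes.

```python
import numpy as np

def cheby_coeffs(g, lam_max, K, M=200):
    """Chebyshev coefficients c_0..c_K of g on [0, lam_max] via Gauss-Chebyshev quadrature."""
    a = lam_max / 2.0
    theta = np.pi * (np.arange(1, M + 1) - 0.5) / M
    g_vals = g(a * (np.cos(theta) + 1.0))           # g at the shifted Chebyshev nodes
    return np.array([2.0 / M * np.sum(np.cos(k * theta) * g_vals) for k in range(K + 1)])

def cheby_filter(L, y, c, lam_max):
    """Approximate g(L) y (Eq. 2.27) using only products L @ v; needs len(c) >= 2."""
    a = lam_max / 2.0
    t_prev = y                                      # T_0 term applied to y
    t_curr = (L @ y - a * y) / a                    # T_1 term: (L - aI) y / a
    out = 0.5 * c[0] * t_prev + c[1] * t_curr
    for k in range(2, len(c)):
        t_next = 2.0 / a * (L @ t_curr - a * t_curr) - t_prev   # three-term recurrence
        out = out + c[k] * t_next
        t_prev, t_curr = t_curr, t_next
    return out

# Hypothetical random graph with 20 nodes
rng = np.random.default_rng(1)
N = 20
W = np.triu((rng.random((N, N)) < 0.2).astype(float), k=1)
A = W + W.T
L = np.diag(A.sum(axis=1)) - A
lam, U = np.linalg.eigh(L)
lam_max = lam[-1]

def g(x):
    """Hypothetical smooth low-pass multiplier."""
    return np.exp(-2.0 * x / lam_max)

y = rng.standard_normal(N)
exact = U @ (g(lam) * (U.T @ y))                    # Eq. (2.27) applied to y
approx = cheby_filter(L, y, cheby_coeffs(g, lam_max, K=15), lam_max)
```

For this smooth kernel the two results agree to many decimal places; less smooth multipliers need a larger K.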


the above-mentioned ideas of graph filtering in the vertex domain. Recently, a distributed least squares reconstruction algorithm for bandlimited graph signals has been proposed in [22]. The initial observations are sampled only on a subset of nodes, and the algorithm is shown to be efficient in tracking the unobserved data of time-varying graph signals. The distributed graph signal inpainting algorithm of [23] uses a regularizer that minimizes a metric related to the variation of the signal on the graph; the underlying assumption is that the signal is smooth on the graph. The problem of interpolating bandlimited graph signals from a few samples is also studied in . The reconstruction is achieved using iterative graph filtering, which can be approximated by polynomials of the graph Laplacian matrix and implemented in distributed settings. Graph filters have also been used to accelerate the convergence of the average consensus algorithm on a sensor graph [24, 25]. Finally, matrix polynomials of a graph-shift operator have been proposed in [26] to design graph filters for distributed linear network operators such as finite-time consensus or analog network coding. Most of the above-mentioned works show the potential of graph signal processing techniques for distributed tasks, but do not explicitly consider practical aspects such as quantization, which is of significant importance in real-world applications.

2.4.3

Graph-Based Multimedia Processing

Apart from processing signals that live on networks, graphs have been used for modeling structured signals that live on other irregular domains. In particular, graph signal processing algorithms have been successfully applied in numerous multimedia applications in order to capture the geometrical structure of complex high-dimensional signals such as images, videos, and 3D data. This type of data provides a promising application domain for the emerging field of graph signal processing.

First, we note that graphs and graph-based features have recently started to gain attention in the computer vision and shape analysis community, mainly because the graph Laplacian has been shown to successfully approximate the Laplace-Beltrami operator on a manifold [27], [28], [29]. Spectral features defined on the graph have been successfully applied in a wide variety of shape analysis tasks. The heat kernel signatures [30], their scale-invariant version [31], the wave kernel signatures [32], and the optimized spectral descriptors of [33] have already been used in 3D shape processing, with applications in graph matching [34] or in mesh segmentation and surface alignment problems [35]. These features have been shown to be stable under small perturbations of the edge nodes of the graph. In all these works, however, the descriptors are defined based only on the graph structure. Any information about node attributes, such as color and 3D positions, is assumed to be encoded in the graph weights. Thus, the performance of these descriptors largely depends on the quality of the defined graph.

Signal compression is a second application domain where graph signal processing tools have been applied successfully. Analogously to the classical analog case, the graph Fourier coefficients of a smooth signal decay rapidly , making the graph Fourier transform a good candidate for compression. In particular, the graph Fourier transform has been widely used to efficiently compress smooth images. For example, the graph-based Fourier transform has been used in [36] for the compression of image and video signals, as an alternative to the classical Discrete Cosine Transform (DCT). The authors in [37] adapted the graph for maximally smooth signals and optimized the graph Fourier transform for better compression of 3D smooth images. A set of edge-adaptive transforms was presented in [38] as an alternative to the standard DCT and used in depth-map coding. A few steps towards the theoretical analysis of the analogy between the graph Fourier transform and the classical DCT have been taken in [39]. Under a Gaussian Markov Random Field image model, the graph Fourier transform has been shown to be optimal in decorrelating the signal and has been used for predictive transform coding. Graphs have also been used for compressing multiview images, where the graph is designed by connecting corresponding pixels in different views [40]. In [41], graph-based transforms have been used to code luminance values in RGB. The problem of multiview images of asymmetric quality has been studied in [42], where the construction of a graph from high-quality images has led to the enhancement of low-quality images. Along the same line of work, a graph regularizer that imposes smoothness has been proposed in [43] to enhance the quality of quantized depth images. Thus, graph representations are an interesting tool for the compression of image and video signals.


graphics, where the structural organization of 3D objects is captured by a graph. In particular, the authors in [44] represent a moving human body by a sequence of 3D meshes with a fixed and known connectivity represented by a graph. The geometry and the color information have then been considered as time-varying signals on a graph, which are compressed using the graph wavelet filter banks . Graph representations have also been used in to model the structure of 3D point clouds and connect nearby points. The graph Fourier transform, which is equivalent to the Karhunen-Loève transform on such graphs, is adopted to decorrelate and eventually compress the point cloud attributes, which are treated as signals on the graph.


3

Graph Signal Reconstruction

3.1

Reconstruction Problem

We consider the problem of reconstructing a graph signal from observations taken at a subset of the graph vertices [45]. The problem fits well, e.g., a Wireless Sensor Network (WSN) scenario, where the nodes observe a spatial field related to some physical parameter of interest. Let us assume that the node topology is fixed and that the corresponding graph is symmetric and connected. Suppose now that, at every time instant, each node of the WSN may or may not take an observation of the underlying signal, depending on, e.g., energy constraints, failures, limited memory and/or processing capabilities. Our purpose is to build a technique that allows the recovery of the field values at each node. In this way, the information is processed on the fly by all nodes, and the data diffuse across the network by means of a real-time sharing mechanism.

The signal corresponds to the minimum temperatures (in °C) of 99 Italian cities, one value per month of the year. The temperature data set was taken from the website of the "Ministero delle politiche agricole, alimentari e forestali" [46], which reports the temperature values for the last 12 months. The latitude and longitude coordinates of each city were taken from the same source.


City            Ind  Lat Long  Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
Torino            1 45.1  7.7 -4.8 -4.3   -3  1.4  3.9  8.7 11.3 11.1  9.4    3  0.6 -1.9
Alessandria       2 44.9  8.6  6.4  4.6  1.3  3.5  4.9  9.8 11.6 16.3 18.6   18 15.6  9.1
Asti              3 44.9  8.2  6.3  3.4  0.8  2.7  4.6  9.5 11.6 16.3 18.5 18.1 15.9  9.1
Biella            4 45.6  8.1 -6.5 -4.6 -3.4  1.8  4.6  9.7   12 11.3  9.1  3.1 -0.8  2.9
Cuneo             5 44.4  7.5 -0.8 -0.1  1.7  6.2  8.2 12.8 15.4 15.1 13.4  6.9  4.7  1.9
Novara            6 45.1  8.6 -3.4 -0.5  1.2  6.6  9.2 14.2 16.4 15.5 13.1  6.9  1.8 -0.7
Verbania          7 45.9  8.5 -8.5   -7 -6.1 -1.6  1.4  6.5    9  8.6  6.8  1.1 -2.8 -4.1
Vercelli          8 45.3  8.3 -5.7 -3.8 -2.6  2.5  5.4 10.3 12.6   12  9.7  3.6 -0.2 -2.1
Aosta             9 45.7  7.3 -9.6 -9.7 -9.4 -5.3 -2.5  2.2  4.9    5  3.4 -2.2 -4.6 -5.4
Milano           10 45.5  9.2 -1.3  2.1  3.6  8.8   11 15.9 18.5 17.4 15.3  8.8  3.8    1
Bergamo          11 45.7  9.7 -2.7    0  1.2  5.8  7.9 12.7 15.7 14.6 13.2  6.5  2.9 -0.3
Brescia          12 45.5 10.2 -2.6 -0.1  0.8  5.2  7.6 12.4 15.4   14 12.6    6  2.9  0.1
Como             13 45.8  9.1 -3.8 -1.2  0.3  5.2  7.4   12 14.9 14.1 12.6  6.3  1.9 -0.9
Cremona          14 45.1   10 -0.6  2.9  3.7  8.8 11.8 16.3 19.4 17.6 15.5  8.9  4.5  1.8
Lecco            15 45.8  9.4 -2.3  0.6  2.1  7.2  9.1 13.8 16.6 15.7 14.1  7.6  3.2  0.1
Lodi             16 45.3  9.5 -0.4    3  3.8  8.8 11.7 16.5 19.2 17.7 15.6  8.8  4.8    2
Mantova          17 45.1 10.8 -0.7  3.1  3.8  8.6 11.9 15.9 19.6 17.4 15.2  9.1  4.5  1.8
Pavia            18 45.2  9.2 -0.5  2.4  3.5  8.5 10.7 15.5   18 16.8 14.4  8.1  4.6  2.3
Sondrio          19 46.2  9.9 -6.4 -4.6 -3.6  0.2  2.4  6.8 10.2  9.5  8.5  1.9    0 -2.2
Varese           20 45.8  8.8 -3.6 -0.7  0.9  6.2  8.7 13.5   16 15.2 13.3  7.2  1.8 -0.9
Trento           21 46.1 11.1 -4.9 -3.2 -2.5  1.2  3.8  8.6 11.3 10.5    9    3  1.3 -1.1
Bolzano          22 46.5 11.3 -6.2 -4.1 -3.6  0.5  2.9  7.5   10  9.1  7.8  1.6    0 -1.9
Venezia          23 45.4 12.3  0.6  4.9  5.8  9.8 12.6 17.4 20.2   18 16.6 10.2  5.4  2.3
Belluno          24 46.2 12.2 -4.2 -1.3 -0.7  3.4  5.9 10.8   13 11.7 10.5  4.2  1.3 -1.2
Padova           25 45.4 11.9 -0.8  3.9  4.9  8.6 11.9 16.5 19.3 17.2 15.4  9.4  4.4  0.9
Rovigo           26 45.1 11.8  0.1  4.3  5.2    9 12.4 16.8 19.7 17.4 15.4  9.6  5.2  2.2
Treviso          27 45.7 12.2 -1.1  2.8    4  7.9 10.7 15.4 18.1 16.3 14.9  8.6  4.1  0.9
Verona           28 45.4   11   -1  2.6  3.5  7.7 10.8 15.1 18.3 16.5 14.6  8.5  4.2  0.9
Vicenza          29 45.5 11.5 -1.8  1.8  2.8  6.2  9.4 14.1 16.6 15.3 13.4  7.7  3.5    0
Trieste          30 45.6 13.8  2.1  6.1  6.7   10 12.4 17.6 20.1 18.9 17.4 10.3  6.3  4.1
Udine            31   46 13.2 -2.5  1.6  2.1    6  8.5 13.6 15.4 13.7 12.6  6.2  2.3 -0.3
Gorizia          32 45.9 13.6  0.5  4.9  5.2  8.4 11.2 16.4 18.2 17.1 15.6  9.2    5  2.6
Pordenone        33   46 12.6   -2  2.5  3.5  7.7 10.2 15.1 17.5 15.5 14.4  7.6    3  0.2
Genova           34 44.4 12.6  8.7  7.6  4.1  5.2  5.9 10.1 11.5 15.6 18.2   18 16.3 10.6
Imperia          35 43.9  8.9  3.2  3.4  5.2    9 10.9   15 17.9 18.2 16.7 10.7  7.9  5.9
La Spezia        36 44.1    8  5.1  5.8    6  9.6 11.3   15 17.7 17.6 16.1 11.1  9.4  8.3
Savona           37 44.3  9.8  3.2  4.5  5.9 10.1 11.6   16 18.5 18.7 16.8 10.7  8.2  6.8
Bologna          38 44.5  8.5 -0.7  2.7  3.1  7.1   10 14.1 17.8 15.7 13.6  7.9  4.3  1.7
Ferrara          39 44.8 11.3 -0.1    4    5  8.8 12.4 16.6 19.7 17.1 15.1  9.3  5.4  2.3
Forlì            40 44.2 11.6  0.7  3.8  4.1  7.5 10.4 14.4 17.5 15.7 13.8  8.7  5.2  1.6
Modena           41 44.6   12 -1.4  1.4  1.3  5.6  8.1 12.2 16.2 14.3 11.8  6.5  3.5  1.8
Parma            42 44.8 10.9 -0.2  2.1  2.1  6.7  9.2 13.4 16.7   15 13.1  7.3  4.6  2.9
Piacenza         43   45 10.3 -0.3  2.2  2.5  7.2  9.7 14.4 17.1 15.6 13.6  7.4  4.6  2.5
Ravenna          44 44.4  9.7  0.8  4.1  4.7  8.3 11.3 15.4 18.4 16.3 14.5    9  5.7  2.4
Reggio Emilia    45 44.7 12.2 -1.3  1.6  1.4  6.1  8.8 12.7 16.7 14.5 12.3  6.8  3.7  2.1
Rimini           46 44.1 10.6  1.8  4.7  5.2  8.9 11.5   16 19.4 17.6 15.2 10.2  7.1  3.1
Firenze          47 43.8 12.6  1.6  4.1  3.9  7.1  9.9 13.9 16.9 16.4 14.1    9  5.6  2.8
Arezzo           48 43.5 11.9  1.6    4  3.3  6.6  9.4 13.4 16.4 15.4 13.1  8.8  4.7  1.9
Grosseto         49 42.8 11.9  4.7  5.7  5.4  9.3 11.4 15.9 18.9 18.4 16.1 11.7  8.4  6.2
Livorno          50 43.6 11.1  6.5  7.2    7 10.3 12.4 16.7 19.6 19.5 17.5 13.1 10.2  8.6
Lucca            51 43.8 10.3  1.6  2.7  2.2  6.1  7.9   12 15.1 14.3 12.1  7.6  5.9  4.8
Massa            52   44 10.5  2.2  3.3    3    7  8.8 12.7 15.6 14.7 13.1  8.1  6.6  5.6
Pisa             53 43.7 10.1  4.8  6.1    6  9.1 11.6 15.8 18.3 18.2 16.1 11.4  8.7  6.6
Pistoia          54 43.9 10.4  0.9  2.6  2.3  5.8    8 12.2 15.2 14.9 12.5  7.6  5.2  3.4
Prato            55 43.9 10.9  0.7    3  2.8  6.1  8.6 12.8   16 15.7 13.3  7.9    5  2.7
Siena            56 43.3 11.1  3.4  4.8  4.4  8.2 10.4 14.9 17.8 17.3 14.9 10.4  6.9  4.6
Perugia          57 43.1 11.3  1.4  3.8  2.7    7  8.6 13.1 16.7 15.2 12.7  8.4  5.1  2.6
Terni            58 42.6 12.4  1.6    4  2.8  7.4  8.7 13.4 17.2 15.9 13.3  9.1  5.7    3
Ancona           59 43.6 12.7  4.6  6.5  6.3 10.3 12.2 16.8 20.6 18.7 16.2 11.2  8.5  5.5
Ascoli Piceno    60 42.8 13.6  1.1  3.2  2.8  6.6  8.3 13.3 17.2 15.2 12.7  8.4  5.3  2.6
Macerata         61 43.3 13.4  3.2  5.3  4.7  8.8 10.4 15.1 19.1   17 14.6  9.9  7.2  4.4
Pesaro           62 43.9 12.9    3  5.3  5.3  8.9 11.5 15.9 19.2 17.5 15.3 10.3  7.2  3.8
Roma             63 41.9 12.4  3.7  6.3  5.2    9   11 15.1 18.5   18 15.4 11.5    7  3.2
Frosinone        64 41.7 13.4  0.2  3.3    2  6.4  8.2 12.7 16.2 15.1 12.6  8.8  3.7 -0.9
Latina           65 41.5 12.9  5.8  7.9  6.8 10.4 12.5 16.6 19.6 19.3 16.9 13.3  9.2  5.4
Rieti            66 42.4 12.9 -0.9  2.1  0.5  4.8  6.3 10.9 14.9 13.4 10.7  6.8  3.1  0.3
Viterbo          67 42.4 12.1  4.1  6.1  5.4  9.6 11.1 15.6   19 18.3 15.7 11.6  7.9  4.9
L’Aquila         68 42.2 13.2 -1.9  0.9 -0.4    4  5.7 10.3 14.2 12.6   10  6.2  1.9 -1.4
Chieti           69 42.2 14.1  3.2  4.8  4.6    9   11 15.8 19.5 18.1 15.4 11.5  7.1  3.9
Pescara          70 42.5 14.2    1  2.6  2.2  6.2  8.3 13.2   17 15.2 12.7  8.7  4.8  1.9
Teramo           71 42.7 13.7  1.3  2.9  2.7  6.5  8.5 13.5 17.4 15.5   13  8.8  5.3  2.8
Campobasso       72 41.6 14.7  3.5  6.1  5.5 10.2 11.6 16.3 19.8 18.6 15.8 12.1  7.3  4.1
Isernia          73 41.6 14.3  0.5  3.5  2.7  7.7  8.9 13.8 17.3   16 13.3  9.8  4.1 -0.2
Napoli           74 40.9 14.2  7.7    9  8.2   12 13.7 18.4   21 20.8   18 14.7 10.8    8
Avellino         75 40.9 14.8  3.4  5.4  4.6  8.8 10.2 14.8   18 16.9 14.1 10.9  7.1  3.7
Benevento        76 41.1 14.8  2.5  5.1  4.3  8.9 10.1 14.9 18.1 17.1 14.3 10.8  5.9  2.5
Caserta          77 41.1 14.3  2.5  5.3  4.3  8.8 10.4 15.4 18.4 17.5 14.9 11.3  5.8  1.2
Salerno          78 40.7 14.8    5  6.9  5.9 10.1 11.4   16 19.1 18.3 15.6 12.6  8.8  5.5
Bari             79 41.2 16.9  4.1  6.5  6.1  9.6 11.5 16.4 19.6 18.4 15.4 12.2    8  4.4
Brindisi         80 40.6 17.8  5.6  7.8  7.7   11   13 18.2   21 20.2   17 14.1  9.5    6
Foggia           81 41.5 15.5  4.7  6.8  6.1 10.4 11.9 16.4 20.3 18.9 15.9 12.4  9.1  6.3
Lecce            82 40.4 18.2  6.6  9.2  8.4 11.9 13.9   19 22.1 21.4 18.3 15.4 10.9  7.3
Taranto          83 40.5 17.2  5.6    8  7.8 10.9 12.7 17.9 20.5 19.7 16.7 14.1  9.1  5.5
Potenza          84 40.6 15.8  3.5  5.6  4.7  8.8 10.4 15.1 18.5 17.1 14.4 11.5  7.8  4.2
Matera           85 40.7 16.6  4.1  6.2  5.6    9 11.1   16 19.2 18.1 15.3 12.3  7.8  3.5
Catanzaro        86 38.9 16.6  7.1  8.8    7 11.1 13.2 18.1 21.2 20.3 17.3 15.6   12  7.6
Cosenza          87 39.9 16.3  4.8  6.8  5.5  9.9 11.3 16.2 19.4 18.5 15.5 13.4  9.8    6
Crotone          88 39.1 17.2  6.8  8.2    7 10.8 12.5 17.9 21.2 20.4 17.2   15 11.5  7.4
Reggio Calabria  89 38.1 15.7 10.2 11.7 10.1 13.8 15.9 20.6 23.8 23.4 21.1 19.3 14.8 11.1
Vibo Valentia    90 38.7 16.1 10.4 12.1 10.4   14 16.2 20.8 23.8 23.2 20.7 18.9 14.9 11.3
Palermo          91 38.1 13.4  7.6  8.4  7.6 11.8 13.1 17.3 20.8 20.3 18.1 16.4 11.5  8.3
Agrigento        92 37.3 13.6  7.8    9  7.8 12.4 13.7 17.8 20.3 20.2 18.4 17.1 12.1  8.3
Caltanissetta    93 37.5   14  6.2  7.9  6.5 11.4 12.8 17.2 20.2 19.5 17.4   16 11.1  7.5
Catania          94 37.5 15.1  6.3  8.2  6.7   11 13.1 17.6 20.8 20.5 18.6 16.9 11.6  7.7
Enna             95 37.5 14.3  4.9  6.7  5.2  9.9 11.6 16.2 19.7 18.9 16.5 14.8   10  6.4
Messina          96 38.2 15.5 10.4 11.7 10.5 14.1 15.7   20 23.4 23.1 21.3 19.3 14.9 12.1
Ragusa           97 36.9 14.8    8  9.7  8.4 12.6 14.6 18.9 21.6 21.4 19.8 18.3 13.2  9.6
Siracusa         98 37.1 15.3  8.4  9.9  8.8 12.6 14.9 19.2 22.2 22.2 20.5   19 13.6  9.7
Trapani          99   38 12.5  9.9 10.1  9.5 12.7 14.4 18.4 21.2 21.5 19.9 17.8 13.3  9.7


We also assume that the sampling procedure in the vertex domain is random. Therefore, the signal is observed only on a subset of nodes, and its values on the non-sampled nodes must be reconstructed.

Assume that we have the sampled signal $r_t$, $t \in \tau$, where τ denotes the set of sampled nodes. The signal $r_t$ is known, and we want to reconstruct the entire signal $s_t$ for $t \in \tau \cup \bar{\tau}$, where $\bar{\tau}$ is the set of non-sampled nodes. The reconstruction starts from an estimate in the frequency domain limited to the components indexed by $\mathcal{F}$. We assume an optimal frequency ordering for the set $\mathcal{F}$. The frequencies are indexed so that the first ones carry most of the signal energy and the last ones do not contribute significantly. In other words, the frequency set $\mathcal{F}$ is ordered according to the energy contribution of each frequency component. This assumption provides stability to the estimation process and yields a lower reconstruction error than a random ordering (or some other indexing strategy). We can then say that $\mathcal{F}$ acts as a low-pass filter, which passes the lowest frequencies carrying the highest signal energy.

Figure 3.1: Temperature for two different months (panels: February, August)

The problem consists in reconstructing the non-sampled components $s_t$, $t \in \bar{\tau}$, of the signal

$$\hat{s}_t = \begin{cases} r_t, & t \in \tau \\ U_{t,f}\, \hat{S}_f, & t \in \bar{\tau} \end{cases} \qquad (3.1)$$


Here $U_{t,f}$ is the submatrix of the eigenvector matrix with rows $t \in \bar{\tau}$ and columns $f \in \mathcal{F}$. Hence we only need the eigenvector entries selected by the frequency indices in $\mathcal{F}$. $\hat{S}_f$, $f \in \mathcal{F}$, is the estimated signal in the frequency domain.

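The Least Squares (LS) solution to problem (3.1) consists of two steps:

1. $\hat{S}_f = \arg\min_{S_f \in \mathbb{R}^{|\mathcal{F}|}} \| r_t - U_{t,f} S_f \|_2^2$, for $t \in \tau$ and $f \in \mathcal{F}$;

2. $\hat{s}_t = U_{t,f} \hat{S}_f$, for $t \in \bar{\tau}$ and $f \in \mathcal{F}$.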

Step 2 is straightforward: $U_{t,f}$ is known, and $\hat{S}_f$ is the quantity estimated in step 1 (the GFT of the observed signal).

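For clarity, write $U_{t,f} = U_t^T$, where $U_t \in \mathbb{R}^{|\mathcal{F}|}$ is the t-th row of U restricted to the columns in $\mathcal{F}$, taken as a column vector. Step 1 can then be solved as follows:

$$\begin{aligned} \hat{S}_f &= \arg\min_{S_f} \sum_{t \in \tau} (r_t - U_{t,f} S_f)^2 = \arg\min_{x \in \mathbb{R}^{|\mathcal{F}|}} \sum_{t \in \tau} (r_t - U_t^T x)^2 \\ &= \arg\min_{x} \sum_{t \in \tau} \left[ r_t^2 + x^T (U_t U_t^T) x - 2 r_t U_t^T x \right] \\ &= \arg\min_{x} \sum_{t \in \tau} r_t^2 + x^T \Big( \sum_{t \in \tau} U_t U_t^T \Big) x - 2 \Big( \sum_{t \in \tau} r_t U_t^T \Big) x \end{aligned} \qquad (3.2)$$

Taking the derivative of Eq. (3.2) with respect to x and setting it equal to 0, we have

$$\Big( \sum_{t \in \tau} U_t U_t^T \Big) x = \sum_{t \in \tau} U_t r_t, \qquad (3.3)$$

which leads to

$$\hat{x} = \Big( \sum_{t \in \tau} U_t U_t^T \Big)^{-1} \Big( \sum_{t \in \tau} U_t r_t \Big) \qquad (3.4)$$

and finally

$$\hat{S}_f = \Big( \sum_{t \in \tau} U_{t,f}^T U_{t,f} \Big)^{-1} \Big( \sum_{t \in \tau} U_{t,f}^T r_t \Big). \qquad (3.5)$$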
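Eqs. (3.1) to (3.5) translate directly into a few lines of NumPy. The sketch below is a centralized implementation under assumed inputs. The graph is a 30-node ring, and the signal is exactly bandlimited to the first |F| = 5 Laplacian eigenvectors (natural eigenvalue order, not the energy ordering of Section 3.2). Twelve nodes are sampled uniformly at random. The function name `ls_reconstruct` is hypothetical. In this noiseless bandlimited case the reconstruction should be exact whenever the matrix (3.6) is invertible.

```python
import numpy as np

def ls_reconstruct(U, r, tau, F):
    """Centralized LS reconstruction, Eqs. (3.1)-(3.5).

    U   : N x N Laplacian eigenvector matrix
    r   : observed samples on the nodes in tau (same order as tau)
    tau : indices of the sampled nodes
    F   : indices of the retained frequencies
    """
    U_tf = U[np.ix_(tau, F)]                  # rows t in tau, columns f in F
    G = U_tf.T @ U_tf                         # sum_t U_{t,f}^T U_{t,f}, Eq. (3.6)
    S_hat = np.linalg.solve(G, U_tf.T @ r)    # Eq. (3.5)
    s_hat = U[:, F] @ S_hat                   # U_{t,f} S_hat on every node
    s_hat[tau] = r                            # sampled nodes keep r_t, Eq. (3.1)
    return s_hat, S_hat

# Hypothetical example: ring graph with N = 30 nodes
N = 30
A = np.zeros((N, N))
for i in range(N):
    A[i, (i + 1) % N] = A[(i + 1) % N, i] = 1.0
L = np.diag(A.sum(axis=1)) - A
lam, U = np.linalg.eigh(L)

rng = np.random.default_rng(0)
F = np.arange(5)                              # five lowest graph frequencies
s = U[:, F] @ rng.standard_normal(len(F))     # bandlimited test signal
tau = rng.choice(N, size=12, replace=False)   # uniform random sampling set
s_hat, S_hat = ls_reconstruct(U, s[tau], tau, F)
```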

We now discuss a necessary and sufficient condition that guarantees exact signal reconstruction from its samples, as discussed in [45]. It is clear from Eq. (3.5) that reconstruction of the original signal is possible only if the matrix

$$\sum_{t \in \tau} U_{t,f}^T U_{t,f} \qquad (3.6)$$

is invertible. From (3.6), a necessary condition enabling reconstruction is

$$|\tau| \geq |\mathcal{F}|, \qquad (3.7)$$

i.e., the number of nodes in the sampling set must be greater than or equal to the signal bandwidth. However, this condition is not sufficient. The matrix in (3.6) may lose rank, or easily become ill-conditioned, depending on the graph topology and the sampling strategy. It is invertible if $\sum_{t \in \tau} U_{t,f}^T U_{t,f} = U_f^T R_\tau U_f$ has full rank, where $R_\tau$ is the vertex-limiting operator that projects onto the sampling set τ. Let us introduce the operator

$$R_{\bar{\tau}} = I - R_\tau, \qquad (3.8)$$

which projects onto the complement of the sampling set. Substituting (3.8) into $U_f^T R_\tau U_f$ and using $U_f^T U_f = I$ gives $U_f^T R_\tau U_f = I - U_f^T R_{\bar{\tau}} U_f$. Hence signal reconstruction is possible if $I - U_f^T R_{\bar{\tau}} U_f$ is invertible. Since $U_f^T R_{\bar{\tau}} U_f = (R_{\bar{\tau}} U_f)^T (R_{\bar{\tau}} U_f)$ has eigenvalues in $[0, \|R_{\bar{\tau}} U_f\|_2^2]$, this holds if the condition

$$\| R_{\bar{\tau}} U_f \|_2 < 1 \qquad (3.9)$$

is satisfied. Condition (3.9) is related to the localization properties of graph signals: it implies that there are no $\mathcal{F}$-bandlimited signals that are perfectly localized over the set $\bar{\tau}$. As explained in [45], [47], [48], it is easy to show that condition (3.9) is necessary and sufficient for signal reconstruction.
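Condition (3.9) is easy to test numerically. The sketch below builds R_τ̄ explicitly for a 30-node ring graph with the five lowest frequencies; this setup and the helper name `sampling_condition` are assumptions for illustration. It computes ‖R_τ̄ U_f‖_2 and confirms that the smallest eigenvalue of U_f^T R_τ U_f equals 1 − ‖R_τ̄ U_f‖_2², so the two conditions coincide. With fewer samples than frequencies, violating (3.7), the norm reaches 1 and reconstruction is impossible.

```python
import numpy as np

def sampling_condition(U, tau, F, tol=1e-9):
    """Return (||R_taubar U_F||_2 < 1, the norm itself), cf. Eqs. (3.8)-(3.9)."""
    N = U.shape[0]
    R_tau = np.zeros((N, N))
    R_tau[tau, tau] = 1.0                      # vertex-limiting operator on tau
    R_bar = np.eye(N) - R_tau                  # Eq. (3.8)
    norm = np.linalg.norm(R_bar @ U[:, F], 2)  # spectral norm (largest singular value)
    return norm < 1.0 - tol, norm

# Hypothetical 30-node ring graph and the five lowest frequencies
N = 30
A = np.zeros((N, N))
for i in range(N):
    A[i, (i + 1) % N] = A[(i + 1) % N, i] = 1.0
L = np.diag(A.sum(axis=1)) - A
lam, U = np.linalg.eigh(L)
F = np.arange(5)

rng = np.random.default_rng(1)
tau_ok = rng.choice(N, size=12, replace=False)
tau_few = tau_ok[:4]                           # |tau| < |F|: violates Eq. (3.7)

ok, norm_ok = sampling_condition(U, tau_ok, F)
bad, norm_few = sampling_condition(U, tau_few, F)

# Smallest eigenvalue of U_F^T R_tau U_F equals 1 - ||R_taubar U_F||_2^2
U_tf = U[np.ix_(tau_ok, F)]
min_eig = np.linalg.eigvalsh(U_tf.T @ U_tf).min()
```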

Eq. (3.5) can be written as $\hat{S}_f = M r_\tau$, where $U_{\tau,f}$ stacks the rows $U_{t,f}$, $t \in \tau$, $r_\tau$ collects the corresponding samples, and

$$M = \left( U_{\tau,f}^T U_{\tau,f} \right)^{-1} U_{\tau,f}^T$$

is the matrix projecting the known signal defined in the vertex domain onto the estimated spectral-domain signal. This procedure needs to collect the terms $U_{t,f}^T r_t$ for each $t \in \tau$ in a central node and to sum them. Therefore, this solution can be seen as a centralized implementation. In Chapter 4, "Distributed Algorithms", we will see a distributed solution to the reconstruction problem.


3.2

Random Sampling and Frequency Ordering

We now briefly describe how the choice of sampling strategy and frequency ordering affects the reconstruction algorithm.

First of all, we decided to sample the nodes in the vertex domain according to a random procedure. Given the number of sampled nodes, we select them among all the nodes according to a uniform distribution, so that each node has the same probability of being chosen. This choice is justified because we cannot control under which conditions a node is sampled. In some cases the value at a node is available, in others it is not, depending on, e.g., energy constraints, failures, limited memory and/or processing capabilities. Therefore, we randomly pick a sufficient number of nodes over the entire network and use them to build the sampling set. Fig. 3.2 shows an example of random sampling of a graph signal.

Figure 3.2: An example of signal sampling (panels: original signal, sampled signal)

We now consider the choice of the frequency indexing in the graph spectral domain. We assume that the frequencies are sorted according to the energy contribution they give to the signal. Specifically, we sort the frequency indices according to the mean, over a collection of signals, of the squared absolute value of each spectral component. The first frequency thus gives the highest contribution to the signal energy in the vertex domain, and the last one gives the smallest. Therefore, the set of frequencies is ordered so that the signal energy decreases as the frequency index increases. An example is reported in Fig. 3.3.

Figure 3.3: S(λℓ) versus λℓ (panels: original frequency ordering, induced frequency ordering)
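The energy-based ordering described above can be sketched as follows. We compute the GFT of every signal in the collection (for the temperature data, one column per month) and average the squared magnitudes per frequency. We then sort the frequencies in decreasing order of average energy. The synthetic signals below are built from two known eigenvectors with different amplitudes, an assumption used only to make the expected ordering obvious. The function name is hypothetical.

```python
import numpy as np

def energy_ordering(U, signals):
    """Frequency indices sorted by mean squared GFT magnitude (largest first).

    U       : N x N eigenvector matrix (columns = graph frequencies)
    signals : N x M array, one graph signal per column
    """
    S_hat = U.T @ signals                           # GFT of every signal
    energy = np.mean(np.abs(S_hat) ** 2, axis=1)    # average energy per frequency
    return np.argsort(energy)[::-1], energy

# Hypothetical 8-node path graph
N = 8
A = np.zeros((N, N))
for i in range(N - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
lam, U = np.linalg.eigh(np.diag(A.sum(axis=1)) - A)

# Four synthetic signals: strong component on frequency 5, weak one on frequency 2
signs = np.array([1.0, -1.0, 1.0, -1.0])
signals = 3.0 * np.outer(U[:, 5], np.ones(4)) + 1.0 * np.outer(U[:, 2], signs)
order, energy = energy_ordering(U, signals)
F = order[:2]                                       # keep the two most energetic frequencies
```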


3.3

Performance of the LS Reconstruction Algorithm

We now present some results obtained by implementing the algorithm described in Section 3.1. For these simulations we considered several definitions of the weighted adjacency matrix A, taken from [12], [49]: Metropolis-Hastings, Laplacian, maximum-degree, exponential of the distance, unweighted and normalized unweighted weights. We briefly describe how they are defined.

• Metropolis weights: these are a form of local-degree weights, defined as

$$A_{ij} = \begin{cases} 1/(1 + \max\{d_i, d_j\}), & j \in \mathcal{N}_i,\ i \neq j \\ 1 - \sum_{k \in \mathcal{N}_i} A_{ik}, & i = j \\ 0, & \text{otherwise.} \end{cases} \qquad (3.10)$$

In other words, the Metropolis weight on each edge is one over one plus the larger degree of its two incident nodes, and the self-weights $A_{ii}$ are chosen so that the weights at each node sum to 1. The Metropolis weights are very simple to compute and are well suited for distributed implementation using local information. In particular, each node only needs to know the degrees of its neighbors to determine the weights on its adjacent edges. The nodes do not need any global knowledge of the communication graph, or even the number of nodes N. Furthermore, the weighted adjacency matrix A is in this case doubly stochastic: $A \cdot \mathbf{1} = \mathbf{1}$ and $\mathbf{1}^T \cdot A = \mathbf{1}^T$.

• Laplacian weights: the weight matrix has entries given by

$$A_{ij} = \begin{cases} \alpha, & j \in \mathcal{N}_i,\ i \neq j \\ 1 - \alpha |\mathcal{N}_i|, & i = j \\ 0, & \text{otherwise,} \end{cases} \qquad (3.11)$$

where $|\cdot|$ denotes cardinality, or, in matrix form,

$$A = I - \alpha L, \qquad (3.12)$$

where L is the Laplacian matrix of the underlying graph. The parameter α must satisfy $\alpha \leq 1/\max_k\{d_k\}$. In this case too, A is a doubly stochastic matrix.

• Maximum-degree weights: these are defined as

$$A_{ij} = \begin{cases} 1/\max_k\{d_k\}, & j \in \mathcal{N}_i,\ i \neq j \\ 1 - d_i/\max_k\{d_k\}, & i = j \\ 0, & \text{otherwise.} \end{cases} \qquad (3.13)$$

• Exponential of the distance: the weights are an exponential function of the distance, i.e., the weight is large when two neighboring nodes are close and small when they are far apart:

$$A_{ij} = \begin{cases} e^{-\mathrm{dist}(i,j)}, & j \in \mathcal{N}_i,\ i \neq j \\ 1, & i = j \\ 0, & \text{otherwise,} \end{cases} \qquad (3.14)$$

where the distance between two nodes is defined as $\mathrm{dist}(i, j) = \|\mathrm{coord}_i - \mathrm{coord}_j\|_2$, and $\mathrm{coord}_i$ contains the latitude and longitude coordinates of node i.

• Unweighted: the weight matrix simply contains a 1 for neighboring nodes:

$$A_{ij} = \begin{cases} 1, & j \in \mathcal{N}_i,\ i \neq j \\ 0, & \text{otherwise.} \end{cases} \qquad (3.15)$$

• Normalized unweighted: weights are defined in such a way that A·1 =1,

where we first take the unweighted adjacency matrix (3.15) and then we normalize each row by dividing each element by its corresponding row-sum. In this case the adjacency matrix is row-stochastic.
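The remaining weighting schemes (Eqs. (3.13) to (3.15) and the row-normalized variant) can be built from a 0/1 neighbour mask and the node coordinates. This section does not specify how the neighbourhoods N_i of the city graph are chosen. Purely for illustration, the sketch below therefore assumes that two cities are neighbours when their (latitude, longitude) distance is below 1.6 degrees. The five cities and their coordinates are taken from the data table above; the function name is hypothetical.

```python
import numpy as np

def weight_matrices(adj, coords):
    """Weighted adjacency variants of Eqs. (3.13)-(3.15) plus the row-normalized one.

    adj    : N x N symmetric 0/1 neighbour mask with zero diagonal
    coords : N x 2 array of (latitude, longitude)
    """
    N = adj.shape[0]
    d = adj.sum(axis=1)                                         # node degrees
    d_max = d.max()
    A_maxdeg = adj / d_max + np.diag(1.0 - d / d_max)           # Eq. (3.13)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    A_exp = adj * np.exp(-dist) + np.eye(N)                     # Eq. (3.14)
    A_unw = adj.astype(float)                                   # Eq. (3.15)
    A_norm = A_unw / A_unw.sum(axis=1, keepdims=True)           # row-stochastic
    return {"max-degree": A_maxdeg, "exp-dist": A_exp,
            "unweighted": A_unw, "normalized": A_norm}

# Torino, Milano, Bergamo, Brescia, Como (coordinates from the data table)
coords = np.array([[45.1, 7.7], [45.5, 9.2], [45.7, 9.7], [45.5, 10.2], [45.8, 9.1]])
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
adj = ((dist < 1.6) & ~np.eye(len(coords), dtype=bool)).astype(float)  # hypothetical rule
W = weight_matrices(adj, coords)
```

With this rule the maximum-degree matrix is symmetric with rows summing to one (doubly stochastic). The normalized unweighted matrix is only row-stochastic.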


The performance measure is the Mean Squared Error (MSE) between the original and the reconstructed signal on the non-sampled nodes, summed for each month:

$$\mathrm{MSE} = \sqrt{\sum_{t \in \bar{\tau}} |\hat{s}_t - s_t|^2} \qquad (3.16)$$

Regarding the sampling strategy, we averaged the simulation over 1000 different random sampling layouts, in order to smooth the reconstruction error and to reduce the effect of the randomness of the sampling. The frequency set is ordered as described in Section 3.1.
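A minimal sketch of this evaluation protocol follows. For each random layout, τ is drawn uniformly, the LS estimate is computed from the sampled nodes, and the error (3.16) is measured on τ̄. The errors are then averaged over the layouts. Three choices here are assumptions not stated in the text: converting to dB as 10·log10 of the average, using `lstsq` so that ill-conditioned layouts do not raise an error, and the ring-graph test signal with added noise.

```python
import numpy as np

def mse_db(U, s, n_samples, F, n_layouts=1000, seed=0):
    """Average of Eq. (3.16) over random sampling layouts, in dB (10*log10 assumed)."""
    rng = np.random.default_rng(seed)
    N = len(s)
    errs = []
    for _ in range(n_layouts):
        tau = rng.choice(N, size=n_samples, replace=False)     # uniform random sampling
        tau_bar = np.setdiff1d(np.arange(N), tau)
        U_tf = U[np.ix_(tau, F)]
        S_hat = np.linalg.lstsq(U_tf, s[tau], rcond=None)[0]   # LS step, Eq. (3.5)
        s_hat = U[np.ix_(tau_bar, F)] @ S_hat                  # estimate on tau_bar
        errs.append(np.sqrt(np.sum((s_hat - s[tau_bar]) ** 2)))  # Eq. (3.16)
    return 10 * np.log10(np.mean(errs))

# Hypothetical test signal: bandlimited on a 30-node ring plus small noise
N = 30
A = np.zeros((N, N))
for i in range(N):
    A[i, (i + 1) % N] = A[(i + 1) % N, i] = 1.0
lam, U = np.linalg.eigh(np.diag(A.sum(axis=1)) - A)
F = np.arange(5)
rng = np.random.default_rng(42)
s_clean = U[:, F] @ rng.standard_normal(len(F))
s_noisy = s_clean + 0.1 * rng.standard_normal(N)

err_few = mse_db(U, s_noisy, n_samples=6, F=F, n_layouts=200)
err_many = mse_db(U, s_noisy, n_samples=25, F=F, n_layouts=200)
```

On the noisy signal, more samples give a lower averaged error. On the noiseless bandlimited signal the error is at machine precision.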

In Fig. 3.4 we show the behaviour of the MSE for different numbers of samples and frequencies, changing the weighting strategy. We can clearly see that the best results are obtained for the doubly-stochastic Metropolis weights.

The results of the simulation can be better interpreted in Fig. 3.5, which shows the behaviour of the error for a fixed number of frequencies: the MSE for Metropolis weights is smoother and has a more regular trend.


Figure 3.4: MSE [dB] as a function of the number of samples and the number of frequencies (panels: Metropolis, Laplacian, Max-degree, e−dist, Unweighted, Normalized unweighted)


Figure 3.5: MSE [dB] versus number of samples for a fixed number of frequencies (panels: Metropolis, Laplacian, Max-degree, e−dist, Unweighted, Normalized unweighted)


Fig. 3.6 reports an example of signal reconstruction using 50 sampled values and 10 frequencies. The reconstructed signal is quite similar to the original one, except for some outlier values. These cannot be estimated exactly, because their neighboring nodes do not carry enough information to evaluate an outlying value precisely.

[Fig. 3.6: Original signal and reconstructed signal (values ranging from −5 to 20).]


3.4 ℓ1 Regularization Sparsity

In this section we address a possible problem: what if the number of sampled nodes is smaller than the number of active frequencies? The estimation model can be written in matrix notation as

$$r = U S, \tag{3.17}$$

where r is the observed vector in the vertex domain, U is the T × F matrix of eigenvectors (F denotes the number of frequencies and T the number of sampled nodes), and S is the vector of unknown parameters to be estimated. As we have seen, the estimation problem is usually solved through LS, where the parameters are estimated as the values that minimize the residual sum of squares $\|r - U S\|_2^2$. Provided U has full column rank, so that $U^T U$ is nonsingular and can be inverted, this gives

$$\hat{S} = (U^T U)^{-1} U^T r.$$
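As a quick check of the LS solution, the following sketch draws a random T × F matrix as a stand-in for the sampled eigenvector matrix (an assumption; any U with full column rank would do) and verifies that the normal equations recover S from noiseless observations when T > F.

```python
import numpy as np

rng = np.random.default_rng(0)
T, F = 20, 5                        # more sampled nodes than frequencies (T > F)
U = rng.standard_normal((T, F))     # random stand-in for the T x F eigenvector matrix
S_true = rng.standard_normal(F)
r = U @ S_true                      # noiseless observations, model (3.17)

# Normal equations: S_hat = (U^T U)^{-1} U^T r, solved without forming the inverse
S_hat = np.linalg.solve(U.T @ U, U.T @ r)

# Numerically preferable equivalent (SVD-based least squares)
S_lstsq, *_ = np.linalg.lstsq(U, r, rcond=None)

print(np.allclose(S_hat, S_true), np.allclose(S_lstsq, S_true))  # True True
```

Solving the normal equations with `np.linalg.solve` avoids forming $(U^T U)^{-1}$ explicitly; `np.linalg.lstsq` is the safer choice when U is ill-conditioned.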

From a statistician's point of view, high-dimensional problems, that is, problems with $F \gg T$, are interesting because they cannot be solved by classical estimation procedures such as LS. These procedures rely on the assumption that $U^T U$ is nonsingular; otherwise $U^T U$ cannot be inverted and the parameters cannot be uniquely estimated. This assumption never holds when $F > T$, because the T × F matrix U has rank at most T and therefore cannot have full column rank. The model is otherwise unchanged, since the only difference is that $F > T$, but this strongly affects the estimation problem. Thus, to cope with regression when $F \gg T$, some kind of preselection or regularization is needed.
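The rank deficiency can be seen numerically. In this sketch, with a random stand-in for U and F > T, the matrix $U^T U$ has F − T zero eigenvalues, and adding any null-space direction of U to S leaves the observations unchanged, so LS cannot single out one solution.

```python
import numpy as np

rng = np.random.default_rng(0)
T, F = 5, 12                         # fewer sampled nodes than frequencies (F > T)
U = rng.standard_normal((T, F))      # random stand-in for the T x F eigenvector matrix
G = U.T @ U                          # F x F matrix U^T U

print(np.linalg.matrix_rank(U))      # rank(U^T U) = rank(U) <= T < F
eig = np.linalg.eigvalsh(G)          # eigenvalues in ascending order
print(eig[: F - T])                  # F - T eigenvalues are zero up to rounding: U^T U is singular

# Any null-space direction of U changes S without changing the observations r = U S
_, _, Vt = np.linalg.svd(U)          # full SVD: the last F - T rows of Vt span the null space of U
n = Vt[-1]
S1 = rng.standard_normal(F)
S2 = S1 + 3.0 * n
print(np.allclose(U @ S1, U @ S2))   # True: two different S explain r equally well
```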

The Least Absolute Shrinkage and Selection Operator (LASSO) was proposed by Tibshirani in 1996 [50] as a new method for estimation in linear models. Inspired by Breiman's work on the non-negative garrote [51], and wishing to improve upon unsatisfactory properties of the ordinary LS estimates, he introduced regression with an ℓ1-norm penalty. The ℓ1 penalty turned out to have desirable properties that can be exploited with great benefit in high-dimensional regression problems, and it is in the $F \gg T$ setting that LASSO-type methods have really proven their superiority over other existing methods. Today, LASSO-type methods are by far the most popular group of methods


pointing especially to why it has become such an appreciated tool for regression. Assuming the linear model (3.1), the LASSO estimator is defined by

$$\hat{S} = \arg\min_{S} \; \|r - U S\|_2^2 + \lambda \|S\|_1 \tag{3.18}$$

where $\lambda \ge 0$ is a tuning parameter controlling the amount of shrinkage. This formulation of the problem is called the Lagrangian form, and its penalty is called an ℓ1 penalty. In addition to shrinking the coefficients toward zero, the ℓ1 penalty has the advantageous property of performing variable selection: in this way the LASSO performs a kind of continuous subset selection [50].
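A minimal way to compute the LASSO estimate (3.18) is proximal gradient descent (ISTA), used here as an illustrative solver and not as the method adopted in this thesis. The sketch assumes a random Gaussian stand-in for U with F > T and a hypothetical 3-sparse ground truth; `soft_threshold` and `lasso_ista` are illustrative names. The soft-thresholding step is what sets coefficients exactly to zero.

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau*||.||_1: shrinks toward zero and sets |z_i| <= tau exactly to 0."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def lasso_ista(U, r, lam, n_iter=5000):
    """Approximately minimize ||r - U S||_2^2 + lam*||S||_1 with ISTA (proximal gradient)."""
    # Gradient of ||r - U S||^2 is 2 U^T (U S - r); its Lipschitz constant is 2*sigma_max(U)^2
    L = 2.0 * np.linalg.norm(U, 2) ** 2
    S = np.zeros(U.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * U.T @ (U @ S - r)
        S = soft_threshold(S - grad / L, lam / L)   # gradient step, then l1 proximal step
    return S

rng = np.random.default_rng(0)
T, F = 30, 60                                  # under-determined: F > T
U = rng.standard_normal((T, F)) / np.sqrt(T)   # random stand-in for the sampled eigenvectors
S_true = np.zeros(F)
S_true[[3, 17, 42]] = [2.0, -1.5, 1.0]         # hypothetical sparse ground truth
r = U @ S_true + 0.01 * rng.standard_normal(T)

S_hat = lasso_ista(U, r, lam=0.05)
print(np.flatnonzero(S_hat))                   # indices of the nonzero LASSO coefficients
```

With F = 60 unknowns and only T = 30 observations, LS has no unique solution, while the LASSO returns an estimate in which many coefficients are exactly zero; increasing `lam` typically makes the estimate sparser.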

To understand in more detail how the LASSO sets some regression coefficients exactly to zero, note first that problem (3.18) is equivalent to minimizing the residual sum of squares subject to a size constraint of the form $\|S\|_1 \le t$ on the parameters. Here t is a tuning parameter that, by Lagrangian duality, is in one-to-one correspondence with the penalty parameter λ.
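The correspondence between λ and t can be illustrated in the special case of an orthonormal design, $U^T U = I$ (an assumption made only for this sketch, which requires T ≥ F). Then $\|r - U S\|_2^2 = \|z - S\|_2^2 + \text{const}$ with $z = U^T r$, so (3.18) separates into scalar problems with solution $\hat S_i = \operatorname{sign}(z_i)\max(|z_i| - \lambda/2, 0)$. The sketch tabulates $t(\lambda) = \|\hat S_\lambda\|_1$, which decreases continuously from $\|z\|_1$ at λ = 0 to 0 at $\lambda = 2\max_i |z_i|$; `lasso_orthonormal` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)
T, F = 40, 8
U, _ = np.linalg.qr(rng.standard_normal((T, F)))   # orthonormal columns: U^T U = I
r = rng.standard_normal(T)
z = U.T @ r                                        # LS estimate when U^T U = I

def lasso_orthonormal(z, lam):
    # For U^T U = I, (3.18) separates: S_i = sign(z_i) * max(|z_i| - lam/2, 0)
    return np.sign(z) * np.maximum(np.abs(z) - lam / 2.0, 0.0)

lams = np.linspace(0.0, 2.0 * np.abs(z).max(), 9)
t = [np.abs(lasso_orthonormal(z, lam)).sum() for lam in lams]   # t(lambda) = ||S_hat||_1
for lam, t_val in zip(lams, t):
    print(f"lambda = {lam:6.3f}  ->  t = ||S_hat||_1 = {t_val:6.3f}")
```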

For all penalized regression methods with similar size constraints, such as ridge regression [52] (which minimizes the residual sum of squares subject to $\|S\|_2^2 \le t$), t controls the amount of shrinkage imposed on the estimates. For a size constraint of the form $\|S\|_r^r \le t$, smaller values of t, which correspond to larger values of λ, impose more shrinkage and force the estimates toward zero. For the LASSO, large values of λ shrink all coefficients and, in addition, set some of them exactly to zero. This is a direct consequence of using the ℓ1-norm in the constraint: since the LASSO constraint is not differentiable at zero, the LASSO can produce estimates that are exactly equal to zero. The ridge constraint, on the other hand, does not share this property, since $r > 1$ gives constraints that are differentiable at zero [53], [54]. That is, the difference lies in the shape of the constraint region. To illustrate this, Fig. 3.7 considers the simple situation with only two parameters and shows the estimation picture for the LASSO and for ridge regression: the elliptical contour lines represent the residual sum of squares, centered at the LS estimate, while the shaded regions represent the constraint regions of the LASSO and of ridge regression, respectively.
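The different behaviour of the two penalties can be reproduced with an orthonormal-design simplification ($U^T U = I$, an assumption for this sketch only), where both estimators have closed forms in terms of the LS estimate $z = U^T r$: with the ridge penalty written as $\lambda\|S\|_2^2$, ridge gives $\hat S = z/(1+\lambda)$, while the LASSO gives the soft threshold $\operatorname{sign}(z_i)\max(|z_i| - \lambda/2, 0)$. The values of `z` below are illustrative.

```python
import numpy as np

z = np.array([3.0, -1.2, 0.4, -0.1, 0.05])   # LS estimate U^T r (illustrative values)
lam = 1.0

ridge = z / (1.0 + lam)                                   # argmin ||z - S||^2 + lam*||S||_2^2
lasso = np.sign(z) * np.maximum(np.abs(z) - lam / 2, 0)   # argmin ||z - S||^2 + lam*||S||_1

print("ridge:", ridge)   # every entry scaled by 1/(1+lam); none is exactly zero
print("lasso:", lasso)   # entries with |z_i| <= lam/2 are set exactly to zero
```

Ridge shrinks all coefficients proportionally, whereas the LASSO subtracts a constant from each magnitude and truncates at zero, which corresponds to the corners of the ℓ1 constraint region.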
