(1)

Sparse Signal Recovery: Theory, Applications

and Algorithms

Bhaskar Rao

Department of Electrical and Computer Engineering

University of California, San Diego

Collaborators: I. Gorodnitsky, S. Cotter, D. Wipf, J. Palmer, K. Kreutz-Delgado

(2)

Outline

1. Talk Objective

2. Sparse Signal Recovery Problem

3. Applications

4. Computational Algorithms

5. Performance Evaluation

(3)

Outline

1. Talk Objective

2. Sparse Signal Recovery Problem

3. Applications

4. Computational Algorithms

5. Performance Evaluation

(4)

Importance of Problem

Organized a Special Session with Prof. Bresler at the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[Scanned excerpt from the ICASSP 1998 proceedings, Volume III, table of contents]

SPEC-DSP: SIGNAL PROCESSING WITH SPARSENESS CONSTRAINT

Signal Processing with the Sparseness Constraint ... III-1861
B. Rao (University of California, San Diego, USA)

Application of Basis Pursuit in Spectrum Estimation ... III-1865
S. Chen (IBM, USA); D. Donoho (Stanford University, USA)

Parsimony and Wavelet Method for Denoising ... III-1869
H. Krim (MIT, USA); J. Pesquet (University Paris Sud, France); I. Schick (GTE Internetworking and Harvard Univ., USA)

Parsimonious Side Propagation ... III-1873
P. Bradley, O. Mangasarian (University of Wisconsin-Madison, USA)

Fast Optimal and Suboptimal Algorithms for Sparse Solutions to Linear Inverse Problems ... III-1877
G. Harikumar (Tellabs Research, USA); C. Couvreur, Y. Bresler (University of Illinois, Urbana-Champaign, USA)

Measures and Algorithms for Best Basis Selection ... III-1881
K. Kreutz-Delgado, B. Rao (University of California, San Diego, USA)

Sparse Inverse Solution Methods for Signal and Image Processing Applications ... III-1885
B. Jeffs (Brigham Young University, USA)

Image Denoising Using Multiple Compaction Domains ... III-1889
P. Ishwar, K. Ratakonda, P. Moulin, N. Ahuja (University of Illinois, Urbana-Champaign, USA)

(The same proceedings page also lists the DSP17: DSP EDUCATION and SPEC-EDUCATION: FUTURE DIRECTIONS IN SIGNAL PROCESSING EDUCATION sessions, III-1845 through III-1909.)

(5)

Talk Goals

Sparse Signal Recovery is an interesting area with many potential applications.

Tools developed for solving the Sparse Signal Recovery problem are useful for signal processing practitioners to know.

(6)

Outline

1. Talk Objective

2. Sparse Signal Recovery Problem

3. Applications

4. Computational Algorithms

5. Performance Evaluation

(7)

Problem Description

Measurement model: t = Φw + ε, where

t is the N × 1 measurement vector

Φ is the N × M dictionary matrix, with M ≫ N

w is the M × 1 desired vector, which is sparse with K non-zero entries

ε is the additive noise, modeled as white Gaussian

(8)

Problem Statement

Noise Free Case: Given a target signal t and a dictionary Φ, find the weights w that solve

min_w Σ_{i=1}^{M} I(w_i ≠ 0)  subject to  t = Φw

where I(·) is the indicator function.

Noisy Case: Given a target signal t and a dictionary Φ, find the weights w that solve

min_w Σ_{i=1}^{M} I(w_i ≠ 0)  subject to  ‖t − Φw‖₂² ≤ β

(9)

Complexity

Search over all possible subsets would mean a search over a total of C(M, K) = (M choose K) subsets. The problem is NP-hard. With M = 30, N = 20, and K = 10 there are about 3 × 10^7 subsets (very complex).

A branch and bound algorithm can be used to find the optimal solution. The space of subsets searched is pruned, but the search may still be very complex.

The indicator function is not continuous and so is not amenable to standard optimization tools.

Challenge: Find low complexity methods with acceptable performance.
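To make the combinatorial cost concrete, here is a minimal Python sketch (illustrative, not from the talk; the function and variable names are my own): it verifies the (M choose K) count for M = 30, K = 10, and runs the exhaustive support search only on a deliberately tiny instance, since enumerating all roughly 3 × 10^7 subsets of the full example is impractical.

# Illustrative sketch only: subset counting and brute-force support search.
import itertools
import math

import numpy as np

print(math.comb(30, 10))        # 30045015, i.e. roughly 3 x 10^7 subsets

# Brute force is only feasible on a tiny toy problem.
rng = np.random.default_rng(0)
N, M, K = 8, 12, 3
Phi = rng.standard_normal((N, M))
w_true = np.zeros(M)
w_true[rng.choice(M, K, replace=False)] = rng.standard_normal(K)
t = Phi @ w_true

best = (np.inf, None)
for S in itertools.combinations(range(M), K):      # C(12, 3) = 220 subsets
    w_S, *_ = np.linalg.lstsq(Phi[:, list(S)], t, rcond=None)
    err = np.linalg.norm(t - Phi[:, list(S)] @ w_S)
    if err < best[0]:
        best = (err, S)
print(best)                     # the zero-residual support should be the true one here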

(10)

Outline

1. Talk Objective

2. Sparse Signal Recovery Problem

3. Applications

4. Computational Algorithms

5. Performance Evaluation

(11)

Applications

Signal Representation (Mallat, Coifman, Wickerhauser, Donoho, ...)

EEG/MEG (Leahy, Gorodnitsky, Ioannides, ...)

Functional Approximation and Neural Networks (Chen, Natarajan, Cun, Hassibi, ...)

Bandlimited extrapolations and spectral estimation (Papoulis, Lee, Cabrera, Parks, ...)

Speech Coding (Ozawa, Ono, Kroon, Atal, ...)

Sparse channel equalization (Fevrier, Greenstein, Proakis, ...)

Compressive Sampling (Donoho, Candès, Tao, ...)

(12)

DFT Example

N is chosen to be 64 in this example.

Measurement t: t[n] = cos(ω₀ n), n = 0, 1, 2, ..., N − 1, with ω₀ = (2π/64)(33/2)

Dictionary elements: φ_k = [1, e^{jω_k}, e^{j2ω_k}, ..., e^{j(N−1)ω_k}]^T, with ω_k = 2πk/M

Consider M = 64, 128, 256 and 512.

NOTE: The frequency component ω₀ is included in the dictionary Φ for M = 128, 256, and 512 (but not for M = 64).
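As a quick check of the note above, the following sketch (illustrative, not from the talk; dft_dictionary is my own helper name) builds the overcomplete DFT dictionary and verifies for which M the frequency ω₀ lies exactly on the dictionary grid ω_k = 2πk/M.

# Illustrative sketch of the DFT example.
import numpy as np

N = 64
n = np.arange(N)
w0 = (2 * np.pi / 64) * (33 / 2)                 # tone halfway between two length-64 DFT bins
t = np.cos(w0 * n)                               # measurement t[n] = cos(w0 n)

def dft_dictionary(M, N):
    # Columns phi_k = [1, e^{j w_k}, ..., e^{j (N-1) w_k}]^T with w_k = 2 pi k / M.
    return np.exp(1j * np.outer(np.arange(N), 2 * np.pi * np.arange(M) / M))

for M in (64, 128, 256, 512):
    Phi = dft_dictionary(M, N)                   # N x M overcomplete dictionary (for M > N)
    k0 = w0 * M / (2 * np.pi)                    # w0 is on the grid iff k0 is an integer
    print(M, Phi.shape, "omega_0 on grid" if np.isclose(k0, round(k0)) else "omega_0 off grid")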

(13)

FFT Results with Different M

[Figure: FFT results for M = 64 and M = 128]

(14)

Magnetoencephalography (MEG)

Given measured magnetic fields outside of the head, the goal is to locate the responsible current sources inside the head.

[Figure: head model showing MEG measurement points and candidate source locations]

(15)

MEG Example

At any given time, typically only a few sources are active (SPARSE).

[Figure: head model showing candidate source locations and MEG measurement points]

(16)

MEG Formulation

Forming the overcomplete dictionary Φ

The number of rows equals the number of sensors.

The number of columns equals the number of possible source

locations.

Φ_ij = the magnetic field measured at sensor i produced by a unit current at location j.

We can compute Φ using a boundary element brain model and Maxwell's equations.

Many different combinations of current sources can produce the

same observed magnetic field

t

.

By finding the sparsest signal representation/basis, we find the

smallest number of sources capable of producing the observed

field.

(17)

Compressive Sampling

Transform Coding

Compressive Sensing

(18)

Compressive Sampling

Transform Coding vs. Compressed Sensing

[Figure: block diagrams contrasting the two approaches. In transform coding, the full signal b is acquired, its transform coefficients w (b = Ψw) are computed, and only the few large ones are kept. In compressed sensing, a small number of linear measurements t = Aw are acquired directly; the computation then recovers the sparse w, and hence b, from t.]


(21)

Outline

1. Talk Objective

2. Sparse Signal Recovery Problem

3. Applications

4. Computational Algorithms

5. Performance Evaluation

(22)

Potential Approaches

The problem is NP-hard, so alternate strategies are needed.

Greedy Search Techniques: Matching Pursuit, Orthogonal Matching Pursuit

Minimizing Diversity Measures: The indicator function is not continuous. Define surrogate cost functions that are more tractable and whose minimization leads to sparse solutions, e.g. ℓ1 minimization.

Bayesian Methods: Make appropriate statistical assumptions on the solution and apply estimation techniques to identify the desired sparse solution.

(23)

Greedy Search Methods: Matching Pursuit

Select the column that is most aligned with the current residual.

[Figure: the model t = Φw + ε, with example column alignments such as 0.74 and −0.3 shown above the dictionary columns]

Remove its contribution.

Stop when the residual becomes small enough or if we have exceeded some sparsity threshold.

Some Variations

Matching Pursuit [Mallat & Zhang]
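A minimal sketch of the greedy loop described above (illustrative; the function and variable names are my own, and the model t = Φw + ε from earlier is assumed):

# Illustrative matching pursuit sketch.
import numpy as np

def matching_pursuit(Phi, t, max_iter=20, tol=1e-6):
    N, M = Phi.shape
    w = np.zeros(M)
    r = t.astype(float).copy()                    # current residual
    col_norms = np.linalg.norm(Phi, axis=0)
    for _ in range(max_iter):
        # Select the column most aligned with the current residual.
        j = np.argmax(np.abs(Phi.T @ r) / col_norms)
        # Remove its contribution from the residual.
        a = (Phi[:, j] @ r) / col_norms[j] ** 2
        w[j] += a
        r -= a * Phi[:, j]
        # Stop when the residual is small enough.
        if np.linalg.norm(r) < tol:
            break
    return w

Orthogonal Matching Pursuit differs in that, at each step, it re-solves a least-squares problem over all columns selected so far.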

(24)

Inverse techniques

For the system of equations Φw = t, the solution set is characterized by {w_s : w_s = Φ⁺t + v, v ∈ N(Φ)}, where N(Φ) denotes the null space of Φ and Φ⁺ = Φᵀ(ΦΦᵀ)⁻¹.

Minimum Norm solution: The minimum ℓ2 norm solution w_mn = Φ⁺t is a popular solution.

Noisy Case: The regularized ℓ2 norm solution is often employed and is given by w_reg = Φᵀ(ΦΦᵀ + λI)⁻¹ t
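Both ℓ2 solutions are direct to compute; a small sketch (illustrative, using the formulas on this slide) is given below. Neither solution is sparse in general, which motivates the diversity measures that follow.

# Illustrative sketch of the minimum-norm and regularized l2 solutions.
import numpy as np

def min_norm_solution(Phi, t):
    # w_mn = Phi^+ t = Phi^T (Phi Phi^T)^{-1} t
    return Phi.T @ np.linalg.solve(Phi @ Phi.T, t)

def regularized_solution(Phi, t, lam):
    # w_reg = Phi^T (Phi Phi^T + lam I)^{-1} t
    N = Phi.shape[0]
    return Phi.T @ np.linalg.solve(Phi @ Phi.T + lam * np.eye(N), t)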

(25)

Diversity Measures

Functionals whose minimization leads to sparse solutions.

Many examples are found in the fields of economics, social science, and information theory.

These functionals are concave, which leads to difficult optimization problems.

(26)

Examples of Diversity Measures

ℓ_(p≤1) Diversity Measure:

E^(p)(w) = Σ_{l=1}^{M} |w_l|^p,  p ≤ 1

Gaussian Entropy:

H_G(w) = Σ_{l=1}^{M} ln |w_l|²

Shannon Entropy:

H_S(w) = −Σ_{l=1}^{M} w̃_l ln w̃_l,  where w̃_l = w_l² / ‖w‖²
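For reference, the three measures are straightforward to evaluate; a small sketch (illustrative, with my own function names) follows. Note that H_G tends to −∞ as any coefficient approaches zero, which is one way to see why its minimization drives coefficients to zero.

# Illustrative evaluation of the diversity measures above.
import numpy as np

def lp_diversity(w, p):
    # E^(p)(w) = sum_l |w_l|^p, typically with p <= 1
    return np.sum(np.abs(w) ** p)

def gaussian_entropy(w):
    # H_G(w) = sum_l ln(w_l^2); tends to -inf as any coefficient approaches zero
    return np.sum(np.log(w ** 2))

def shannon_entropy(w):
    # H_S(w) = -sum_l wt_l ln(wt_l), with wt_l = w_l^2 / ||w||^2
    wt = w ** 2 / np.sum(w ** 2)
    wt = wt[wt > 0]                       # skip exact zeros (0 * ln 0 -> 0)
    return -np.sum(wt * np.log(wt))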

(27)

Diversity Minimization

Noiseless Case:

min_w E^(p)(w) = Σ_{l=1}^{M} |w_l|^p  subject to  Φw = t

Noisy Case:

min_w ‖t − Φw‖² + λ Σ_{l=1}^{M} |w_l|^p

p = 1 is a very popular choice because of the convex nature of the optimization problem (Basis Pursuit and Lasso).
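For the popular p = 1 case, the noisy problem can be solved, for example, by iterative soft-thresholding. The sketch below is illustrative only (it is not the algorithm used in the talk, and the function name and parameters are my own); it minimizes ‖t − Φw‖² + λ‖w‖₁.

# Illustrative ISTA sketch for the p = 1 (Basis Pursuit denoising / Lasso) problem.
import numpy as np

def ista(Phi, t, lam, n_iter=500):
    # Minimize ||t - Phi w||_2^2 + lam * sum_l |w_l| by iterative soft-thresholding.
    step = 1.0 / (2 * np.linalg.norm(Phi, 2) ** 2)   # 1 / Lipschitz constant of the gradient
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = 2 * Phi.T @ (Phi @ w - t)             # gradient of the quadratic term
        z = w - step * grad                          # gradient step
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)   # soft-thresholding (prox of lam*||.||_1)
    return w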

(28)

Bayesian Methods

Maximum A Posteriori Approach (MAP)

Assume a sparsity inducing prior on the latent variable w

Develop an appropriate MAP estimation algorithm

Empirical Bayes

Assume a parameterized prior on the latent variable w (hyperparameters)

Marginalize over the latent variable w and estimate the hyperparameters

Determine the posterior distribution of w and obtain a point estimate

(29)

Generalized Gaussian Distribution

Density function (Subgaussian: p > 2, Supergaussian: p < 2):

f(x) = p / (2σ Γ(1/p)) · exp( −(|x|/σ)^p )

[Figure: generalized Gaussian densities for p = 0.5, 1, 2, 10; p < 2 gives supergaussian shapes, p > 2 gives subgaussian shapes]

(30)

Student t Distribution

Density function:

f(x) = Γ((ν+1)/2) / ( √(πν) Γ(ν/2) ) · (1 + x²/ν)^(−(ν+1)/2)

[Figure: Student-t densities for ν = 1, 2, 5 compared with the Gaussian]

(31)

MAP using a Supergaussian prior

Assuming a Gaussian likelihood model for f(t|w), we can find MAP weight estimates

w_MAP = arg max_w log f(w|t)
      = arg max_w [ log f(t|w) + log f(w) ]
      = arg min_w ‖Φw − t‖² + λ Σ_{l=1}^{M} |w_l|^p

This is essentially a regularized LS framework. The interesting range for p is p ≤ 1.

(32)

MAP Estimate: FOCal Underdetermined System Solver (FOCUSS)

The approach involves solving a sequence of Regularized Weighted Minimum Norm problems

q_{k+1} = arg min_q ‖Φ_{k+1} q − t‖² + λ ‖q‖²

where Φ_{k+1} = Φ M_{k+1} and M_{k+1} = diag( |w_{k,l}|^{1 − p/2} ).

w_{k+1} = M_{k+1} q_{k+1}
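A compact numerical sketch of this iteration follows (illustrative; the function name, regularization value, and stopping rule are my own choices, and the minimum-norm solution is used as the starting point, as suggested in the summary on the next slide):

# Illustrative regularized FOCUSS sketch.
import numpy as np

def focuss(Phi, t, p=0.8, lam=1e-3, n_iter=30, eps=1e-12):
    N, M = Phi.shape
    w = Phi.T @ np.linalg.solve(Phi @ Phi.T, t)        # minimum-norm initialization
    for _ in range(n_iter):
        m = np.abs(w) ** (1 - p / 2) + eps             # diagonal of M_{k+1}
        Phi_k = Phi * m                                # Phi_{k+1} = Phi M_{k+1} (column scaling)
        q = Phi_k.T @ np.linalg.solve(Phi_k @ Phi_k.T + lam * np.eye(N), t)
        w = m * q                                      # w_{k+1} = M_{k+1} q_{k+1}
    return w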

(33)

FOCUSS summary

For p < 1, the solution is initial condition dependent

Prior knowledge can be incorporated

Minimum norm is a suitable choice

Can retry with several initial conditions

Computationally more complex than Matching Pursuit algorithms

Sparsity versus tolerance tradeoff more involved

Factor p allows a trade-off between the speed of convergence and the sparsity obtained

(34)

Convergence Errors vs. Structural Errors

[Figure: two −log p(w|t) cost curves, one labeled Convergence Error and one labeled Structural Error, with w₀ and w* marked on each]

w* = solution we have converged to

w₀ = maximally sparse solution

(35)

Shortcomings of these Methods

p = 1: Basis Pursuit/Lasso often suffer from structural errors. Therefore, regardless of initialization, we may never find the best solution.

p < 1: The FOCUSS class of algorithms suffers from numerous suboptimal local minima and therefore convergence errors.

In the low noise limit, the number of local minima K satisfies K ∈ [ C(M−1, N) + 1, C(M, N) ].

At most local minima, the number of nonzero coefficients is equal to N, the number of rows in the dictionary.

(36)

Empirical Bayesian Method

Main Steps

Parameterized prior f(w | γ)

Marginalize: f(t | γ) = ∫ f(t, w | γ) dw = ∫ f(t | w) f(w | γ) dw

Estimate the hyperparameters γ̂

Determine the posterior density of the latent variable f(w | t, γ̂)

Obtain a point estimate of w
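As one concrete instance of these steps, the sketch below is illustrative only: it assumes a Gaussian prior f(w|γ) = N(0, diag(γ)) and a known noise variance, as in standard Sparse Bayesian Learning, uses EM-style updates of the hyperparameters, and returns the posterior mean as the point estimate. The function name and parameters are my own.

# Illustrative Sparse Bayesian Learning (EM) sketch.
import numpy as np

def sbl(Phi, t, sigma2=1e-3, n_iter=200, floor=1e-10):
    N, M = Phi.shape
    gamma = np.ones(M)                                  # hyperparameters of f(w | gamma)
    mu = np.zeros(M)
    for _ in range(n_iter):
        G = np.diag(gamma)
        St = sigma2 * np.eye(N) + Phi @ G @ Phi.T       # marginal covariance of t
        K = G @ Phi.T @ np.linalg.inv(St)
        mu = K @ t                                      # posterior mean of w
        Sigma = G - K @ Phi @ G                         # posterior covariance of w
        gamma = mu ** 2 + np.diag(Sigma)                # EM update of the hyperparameters
        gamma = np.maximum(gamma, floor)                # keep them numerically positive
    return mu                                           # point estimate of w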

(37)

Outline

1. Talk Objective

2. Sparse Signal Recovery Problem

3. Applications

4. Computational Algorithms

5. Performance Evaluation

(38)
(39)

Empirical Tests

Random overcomplete bases Φ and sparse weight vectors w₀ were generated and used to create target signals t, i.e., t = Φw₀ + ε.

SBL (Empirical Bayes) was compared with Basis Pursuit and FOCUSS (with various p values) in the task of recovering w₀.
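A sketch of how such trials could be set up is given below (illustrative only; the exact distributions used in the experiments are not specified in the slides, and the function name and defaults are my own):

# Illustrative test-problem generator for the experiments that follow.
import numpy as np

def make_trial(N=20, M=40, diversity=7, snr_db=None, seed=0):
    rng = np.random.default_rng(seed)
    Phi = rng.standard_normal((N, M))                   # random overcomplete dictionary
    w0 = np.zeros(M)
    w0[rng.choice(M, diversity, replace=False)] = rng.standard_normal(diversity)
    t = Phi @ w0
    if snr_db is not None:                              # e.g. snr_db=20 for Experiment II
        noise = rng.standard_normal(N)
        noise *= np.linalg.norm(t) / (np.linalg.norm(noise) * 10 ** (snr_db / 20))
        t = t + noise
    return Phi, w0, t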

(40)

Experiment 1: Comparison with Noiseless Data

Randomized Φ (20 rows by 40 columns).

Diversity of the true w₀ is 7.

Results are from 1000 independent trials.

NOTE: An error occurs whenever an algorithm converges to a solution w not equal to w₀.

(41)

Experiment II: Comparison with Noisy Data

Randomized Φ (20 rows by 40 columns).

Diversity of the true w₀ is 7.

20 dB AWGN.

Results are from 1000 independent trials.

NOTE: We no longer distinguish convergence errors from structural errors.


(42)
(43)

MEG Example

Data based on the CTF MEG system at UCSF with 275 scalp sensors.

Forward model based on 40,000 points (vertices of a triangular grid) and 3 different scale factors.

Dimensions of the lead field matrix (dictionary): 275 by 120,000.

Overcompleteness ratio approximately 436.

Up to 40 unknown dipole components were randomly placed throughout the sample space.

SBL was able to resolve roughly twice as many dipoles as the next best method (Ramirez 2005).

(44)

Outline

1. Talk Objective

2. Sparse Signal Recovery Problem

3. Applications

4. Computational Algorithms

5. Performance Evaluation

(45)

Summary

Discussed the role of sparseness in linear inverse problems

Discussed applications of sparsity

Discussed methods for computing sparse solutions:

Matching Pursuit algorithms

MAP methods (FOCUSS algorithm and ℓ1 minimization)

Empirical Bayes (Sparse Bayesian Learning (SBL))

The expectation is that there will be continued growth in the application domain as well as in algorithm development.
