Inverse Problems for Dummies

Scott Ziegler (a very stable genius)

Colorado State University

September 13, 2018

What is an inverse problem?

There are many physical situations in which we acquire data from some phenomenon or object which we cannot see.

Geology, medicine, radar, astronomy, etc.

The task of using the mathematical model of the situation to simulate data is called a forward problem.

The task of using the gathered data to recreate the mathematical model of the situation is called an inverse problem.

In practice, we typically look for a specific parameter of the mathematical model.

Some simple examples

Deblurring

X-ray imaging

Classifying inverse problems

Inverse problems can be broadly summarized by the following equation:

$$b = \mathcal{A}(x) + \varepsilon$$

where $b$ represents the observation (or result of the forward problem), $x$ represents the model parameters, $\mathcal{A}$ represents the operator governing the model, and $\varepsilon$ is a random noise vector (typically normally distributed).

When the operator $\mathcal{A} = A$ is a linear operator, we call the problem linear. The problem could still be infinite dimensional, but we will typically discretize to get $b \in \mathbb{R}^m$, $x \in \mathbb{R}^n$ and $A \in \mathbb{R}^{m \times n}$.

If the operator $\mathcal{A}$ is nonlinear, we cleverly call the inverse problem nonlinear. In this case we are typically dealing with $\mathcal{A} : H \to H$ where $H$ is a Hilbert space.

Linear inverse problems

We’ll begin (and essentially end) by studying linear inverse problems since they are much easier to work with.

We would like for our inverse problem to be well-posed, which means it satisfies the following three conditions:

Existence: there should be a solution.

Uniqueness: there should be at most one solution.

Stability: the solution must continuously depend on the data.

These conditions are equivalent to saying that our map A (which in the finite dimensional case is just a matrix) should have a continuous inverse.

If an inverse problem is not well-posed, it is ill-posed.

If our inverse problem is well-posed, then our work is essentially done. We simply need to solve a least squares problem.

$$x_{LS} = \arg\min_x \|Ax - b\|.$$

In practice every inverse problem is ill-posed, and it turns out that solving a least squares problem for ill-posed inverse problems goes very badly.

Ill-posedness in convolution

As an example of this, consider the inverse problem of deconvolution.

Given the (possibly noisy) convolution of a function f , reconstruct the original function.

We can describe this through the continuous model

$$b(x) = (\psi * f)(x) = \int_{-a}^{a} \psi(x') f(x - x')\,dx'.$$

We can then discretize this problem using some type of quadrature rule and end up with an equation of the form b = Af and attempt to solve the inverse problem using least squares.
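The discretize-then-invert procedure can be tried out with a short sketch. Everything specific here is an assumption made for illustration and is not taken from the slides: the grid on [0, 1], the Gaussian kernel `psi` with width `gamma`, the midpoint quadrature rule, the piecewise constant test function, and the reading of "1% white noise" as Gaussian noise with standard deviation equal to 1% of the largest data value.

```python
import numpy as np

# Assumed setup: n midpoints on [0, 1] and a Gaussian blurring kernel.
n = 100
h = 1.0 / n
x = (np.arange(n) + 0.5) * h           # midpoints of the n grid cells
gamma = 2.0 * h                         # kernel width (illustrative choice)

def psi(t):
    """Gaussian kernel with standard deviation gamma."""
    return np.exp(-t**2 / (2.0 * gamma**2)) / (np.sqrt(2.0 * np.pi) * gamma)

# Midpoint rule: b(x_i) ~ sum_j h * psi(x_i - x_j) * f(x_j), i.e. b = A f.
A = h * psi(x[:, None] - x[None, :])

# Piecewise constant test function (hypothetical, for illustration only).
f_true = (np.where((x > 0.2) & (x < 0.4), 1.0, 0.0)
          + np.where((x > 0.6) & (x < 0.8), 0.5, 0.0))
b_clean = A @ f_true

rng = np.random.default_rng(0)
noise = 0.01 * np.max(np.abs(b_clean)) * rng.standard_normal(n)
b_noisy = b_clean + noise

def least_squares(A, b):
    """Least squares solution of A f = b."""
    return np.linalg.lstsq(A, b, rcond=None)[0]

def rel_err(f_est):
    """Relative 2-norm error against the true function."""
    return np.linalg.norm(f_est - f_true) / np.linalg.norm(f_true)

err_clean = rel_err(least_squares(A, b_clean))
err_noisy = rel_err(least_squares(A, b_noisy))
print(f"relative error, clean data: {err_clean:.2e}")
print(f"relative error, noisy data: {err_noisy:.2e}")
```

With these choices the noise-free inversion recovers the function closely, while the noisy inversion is dominated by amplified noise. A wider kernel makes the matrix even more ill-conditioned.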

[Figure] Left: the piecewise continuous function f(x). Right: the function (ψ ∗ f)(x).

[Figure] Left: result of a least squares inversion with no additive noise. Right: result of a least squares inversion with data corrupted by 1% white noise.

SVD and ill-posedness

We’ll show what went wrong here by analyzing the singular value decomposition of a matrix describing an arbitrary ill-posed linear inverse problem and by looking at the statistical properties of $x_{LS}$.

Given a matrix $A \in \mathbb{R}^{m \times n}$ of rank $r$, the singular value decomposition (SVD) of $A$ is

$$A = U \Sigma V^T$$

where $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal and

$$\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r, 0, \ldots, 0) \in \mathbb{R}^{m \times n}$$

with $\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_r > 0$ the nonzero singular values of $A$.

The columns of $U = [u_1, \ldots, u_m]$ and $V = [v_1, \ldots, v_n]$ are the left and right singular vectors of $A$, respectively.

The outer product form of the SVD is

$$A = \sum_{i=1}^{r} u_i \sigma_i v_i^T.$$

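We can then define the pseudo-inverse of $A$ by

$$A^\dagger = V \Sigma^\dagger U^T$$

where

$$\Sigma^\dagger = \mathrm{diag}(\sigma_1^{-1}, \sigma_2^{-1}, \ldots, \sigma_r^{-1}, 0, \ldots, 0) \in \mathbb{R}^{n \times m}.$$

The outer product form of this is

$$A^\dagger = \sum_{i=1}^{r} v_i \sigma_i^{-1} u_i^T.$$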
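As a quick numerical check of the outer product form, the sketch below builds the pseudo-inverse term by term from NumPy's SVD. The random test matrix, its size, and the rank tolerance `1e-12` are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 4
A = rng.standard_normal((m, n))        # arbitrary test matrix
b = rng.standard_normal(m)

# Full SVD: U is m x m, Vt = V^T is n x n, s holds the singular values
# in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=True)
r = int(np.sum(s > 1e-12 * s[0]))      # numerical rank

# Outer product form: A^dagger = sum_i v_i sigma_i^{-1} u_i^T, i = 1..r.
# Row i of Vt is v_i and column i of U is u_i.
A_pinv = sum(np.outer(Vt[i], U[:, i]) / s[i] for i in range(r))

# Applying the pseudo-inverse to the data gives the least squares solution.
x_ls = A_pinv @ b
```

The result can be compared against `np.linalg.pinv(A)` and `np.linalg.lstsq(A, b, rcond=None)[0]`, which should agree to rounding error.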

It turns out the least squares solution $x_{LS}$ can be written as

$$x_{LS} = A^\dagger b = \sum_{i=1}^{r} \frac{u_i^T b}{\sigma_i} v_i = \sum_{i=1}^{r} (v_i^T x) v_i + \sum_{i=1}^{r} \frac{u_i^T \varepsilon}{\sigma_i} v_i.$$

The first sum here is the projection of the true solution $x$ onto the span of $\{v_i\}_{i=1}^{r}$, while the second sum represents the corruption of $x_{LS}$ that occurs due to the presence of the noise $\varepsilon$.

If the matrix describing our model has certain properties (i.e. rapidly decaying singular values and right singular vectors that become highly oscillatory as $i$ increases), then our least squares solution is corrupted.

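$x_{LS}$ is a random vector in the span of $\{v_i\}_{i=1}^{r}$; specifically, we have

$$v_j^T x_{LS} = v_j^T x + \frac{u_j^T \varepsilon}{\sigma_j} \sim N\left(v_j^T x, \frac{\sigma^2}{\sigma_j^2}\right)$$

where $\sigma^2$ is the variance of each entry of the noise vector $\varepsilon$.

This shows the variance of $x_{LS}$ in the direction $v_j$ is $\sigma^2/\sigma_j^2$, which will be large for large values of $j$, where $\sigma_j$ is small.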
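The variance formula can be checked empirically with a small Monte Carlo experiment. This is only a sketch: the 2 x 2 matrix, the noise level `noise_std`, and the number of samples are illustrative assumptions, and the matrix is chosen symmetric so that its left and right singular vectors coincide.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative 2 x 2 problem with singular values 1 and 1e-2.
v1 = np.array([1.0, 1.0]) / np.sqrt(2.0)
v2 = np.array([-1.0, 1.0]) / np.sqrt(2.0)
sing = np.array([1.0, 1e-2])
V = np.column_stack([v1, v2])
A = V @ np.diag(sing) @ V.T            # symmetric, so U = V here
x_true = np.array([1.0, 1.0])
noise_std = 0.05                       # sigma, the noise standard deviation

n_samples = 20000
eps = noise_std * rng.standard_normal((n_samples, 2))
B = A @ x_true + eps                   # each row is one noisy data vector
X_ls = np.linalg.solve(A, B.T).T       # each row is one realization of x_LS
coeffs = X_ls @ V                      # column j holds v_j^T x_LS

empirical_var = coeffs.var(axis=0)
predicted_var = noise_std**2 / sing**2 # sigma^2 / sigma_j^2
print(empirical_var, predicted_var)
```

The empirical variance in the direction of the second singular vector should come out roughly 10^4 times larger than in the first, matching the ratio of the squared singular values.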

Example

Let $A$ be defined via $A = v_1 v_1^T + 10^{-2} v_2 v_2^T$ with $v_1 = [1/\sqrt{2}, 1/\sqrt{2}]^T$, $v_2 = [-1/\sqrt{2}, 1/\sqrt{2}]^T$.

If $x = [1, 1]^T$, then clearly $b = Ax = [1, 1]^T$ and $A^{-1}b = [1, 1]^T$. However, if we add one realization of $\varepsilon$ given by $\varepsilon = [0.026, 0.075]^T$, then $A^{-1}b = [-1.400, 3.501]^T$.
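The numbers in this example can be reproduced directly. The sketch below uses only the quantities stated above; the variable names are my own.

```python
import numpy as np

v1 = np.array([1.0, 1.0]) / np.sqrt(2.0)
v2 = np.array([-1.0, 1.0]) / np.sqrt(2.0)
A = np.outer(v1, v1) + 1e-2 * np.outer(v2, v2)

x = np.array([1.0, 1.0])
eps = np.array([0.026, 0.075])         # the noise realization from the example

b_clean = A @ x
x_clean = np.linalg.solve(A, b_clean)          # inversion of exact data
x_noisy = np.linalg.solve(A, b_clean + eps)    # inversion of noisy data
print(np.round(x_clean, 3), np.round(x_noisy, 3))
```

The noise component along $v_2$ is divided by the singular value $10^{-2}$, which is why a perturbation of size below 0.1 moves the solution by more than 2 in each coordinate.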

Regularization

Now that we know the problem with least squares, we want to alter the method in order to minimize the damage caused by the highly oscillatory right singular vectors.

This is known as regularization, or spectral filtering. The easiest solution to this problem is to simply remove the bothersome singular vectors. This is known as truncated singular value decomposition (TSVD).

Example

Returning to our example above, our poor reconstruction was due to high variance in the direction of v₂ (which has singular value 10⁻²).

If we simply remove this singular vector to get $A_{filt} = \sigma_1 v_1 v_1^T$, then we see that

$$x_{LS}^{filtered} = A_{filt}^\dagger b = \sigma_1^{-1} (v_1^T b) v_1 = [1.0505, 1.0505]^T.$$
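The filtered reconstruction can be reproduced by truncating NumPy's SVD after the first singular value. This is a sketch of the computation above; the truncation index `k` and the variable names are my own.

```python
import numpy as np

v1 = np.array([1.0, 1.0]) / np.sqrt(2.0)
v2 = np.array([-1.0, 1.0]) / np.sqrt(2.0)
A = np.outer(v1, v1) + 1e-2 * np.outer(v2, v2)
b = np.array([1.026, 1.075])           # [1, 1] plus the noise from the example

U, s, Vt = np.linalg.svd(A)
k = 1                                   # keep only the largest singular value
# Truncated pseudo-inverse applied to b: sum over i < k of (u_i^T b / s_i) v_i.
x_filtered = Vt[:k].T @ ((U[:, :k].T @ b) / s[:k])
print(np.round(x_filtered, 4))
```

The sign ambiguity of the computed singular vectors does not matter here, because each term contains the same vector pair twice.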

Regularization

We can generalize this idea by saying that our regularized least squares solution is

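$$x_\nu = V \Phi_\nu \Sigma^\dagger U^T b$$

with $\Phi_\nu$ to be chosen depending on the choice of regularization.

As an example, for TSVD we have

$$\Phi_\nu = \mathrm{diag}(\phi_1^{(\nu)}, \ldots, \phi_r^{(\nu)}, 0, \ldots, 0) \in \mathbb{R}^{n \times n}$$

with

$$\phi_i^{(\nu)} = \begin{cases} 1, & i = 1, \ldots, k \\ 0, & i = k+1, \ldots, r. \end{cases}$$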
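The general filtered solution can be sketched as a small function that takes the filter factors as an argument. The function names `filtered_solution` and `tsvd_filter`, the rank tolerance, and the random test problem are hypothetical choices for illustration.

```python
import numpy as np

def filtered_solution(A, b, phi):
    """Compute x = V Phi Sigma^dagger U^T b for a filter function phi.

    phi maps the array of nonzero singular values to filter factors.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > 1e-12 * s[0]))          # numerical rank
    factors = phi(s[:r])
    coeffs = factors * (U[:, :r].T @ b) / s[:r]
    return Vt[:r].T @ coeffs

def tsvd_filter(k):
    """TSVD filter: keep the first k singular components, drop the rest."""
    def phi(s):
        out = np.zeros_like(s)
        out[:k] = 1.0
        return out
    return phi

rng = np.random.default_rng(5)
A = rng.standard_normal((8, 5))
b = rng.standard_normal(8)
x_full = filtered_solution(A, b, tsvd_filter(5))   # no truncation
x_k2 = filtered_solution(A, b, tsvd_filter(2))     # keep two components
```

Passing the filter as a function keeps the SVD code unchanged when a different regularization is wanted. With all factors equal to one the result reduces to the ordinary least squares solution.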

Other common types of regularization are:

Tikhonov regularization. Here we reframe the least squares problem as

$$x_\nu = \arg\min_x \frac{1}{2}\|Ax - b\|^2 + \frac{\nu}{2}\|x\|^2$$

which ends up giving $\Phi_\nu$ defined by

$$\phi_i^{(\nu)} = \frac{\sigma_i^2}{\sigma_i^2 + \nu}, \quad i = 1, \ldots, r.$$

Total variation regularization. Here we reframe the least squares problem as

$$x_\nu = \arg\min_x \frac{1}{2}\|Ax - b\|^2 + \frac{\nu}{2}\|Lx\|_1$$

where $L$ is a finite difference matrix.
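For the Tikhonov case, the filter factors can be checked against a direct solve of the normal equations $(A^T A + \nu I)x = A^T b$, which are the optimality conditions of the Tikhonov objective. The random test problem and the value of `nu` below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((8, 5))        # full column rank with probability one
b = rng.standard_normal(8)
nu = 0.1                               # regularization parameter (arbitrary)

# Filter factor route: x = sum_i phi_i * (u_i^T b / sigma_i) * v_i.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
phi = s**2 / (s**2 + nu)
x_filter = Vt.T @ (phi * (U.T @ b) / s)

# Direct route: normal equations of the Tikhonov objective.
x_normal = np.linalg.solve(A.T @ A + nu * np.eye(5), A.T @ b)
print(np.max(np.abs(x_filter - x_normal)))
```

Total variation has no such closed-form filter, since the 1-norm penalty is not quadratic, so it is not covered by this sketch.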

Regularization lets us “fix” an ill-behaved matrix by truncating or damping the contributions of the troublesome small singular values.

The only real issue is choosing the regularization parameter $\nu$.

We can reframe this idea of “fixing” a matrix with some knowledge of the solution in terms of the stochastic approach to inverse problems.

Statistical properties of regularized solutions

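We saw that the least squares solution $x_{LS}$ has a high variance in the directions $v_i$ associated with small singular values (large $i$).

The outer product form of our regularized solution is given by

$$x_\nu = \sum_{i=1}^{r} \phi_i^{(\nu)} (v_i^T x) v_i + \sum_{i=1}^{r} \phi_i^{(\nu)} \frac{u_i^T \varepsilon}{\sigma_i} v_i$$

which produces

$$v_i^T x_\nu = \phi_i^{(\nu)} \left( v_i^T x + \frac{u_i^T \varepsilon}{\sigma_i} \right) \sim N\left( \phi_i^{(\nu)} v_i^T x, \left(\phi_i^{(\nu)}\right)^2 \frac{\sigma^2}{\sigma_i^2} \right).$$

Thus we decrease the variance of the solution in the direction of $v_i$ for large $i$, but we have introduced bias.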
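To make the trade-off explicit, the bias, variance and mean squared error in the direction $v_i$ can be read off from the normal distribution of $v_i^T x_\nu$ stated above. This is a short worked derivation under the same Gaussian noise model, not an additional result from the slides.

```latex
\begin{align*}
\text{bias:} \quad
  & E\!\left[v_i^T x_\nu\right] - v_i^T x
    = \left(\phi_i^{(\nu)} - 1\right) v_i^T x, \\
\text{variance:} \quad
  & \mathrm{Var}\!\left(v_i^T x_\nu\right)
    = \left(\phi_i^{(\nu)}\right)^2 \frac{\sigma^2}{\sigma_i^2}, \\
\text{mean squared error:} \quad
  & E\!\left[\left(v_i^T x_\nu - v_i^T x\right)^2\right]
    = \left(1 - \phi_i^{(\nu)}\right)^2 \left(v_i^T x\right)^2
    + \left(\phi_i^{(\nu)}\right)^2 \frac{\sigma^2}{\sigma_i^2}.
\end{align*}
```

With $\phi_i^{(\nu)} = 1$ the bias vanishes and the full variance $\sigma^2/\sigma_i^2$ remains; with $\phi_i^{(\nu)} = 0$ the variance vanishes and the whole component $v_i^T x$ is lost.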

The stochastic approach to inverse problems

Through regularization we are really “encoding” some information we have about the solution $x$ by penalizing the size of either $x$ or $\nabla x$ in whatever norm we choose.

We can reframe this idea by assuming from the outset that $x$ is a random vector with its own prior distribution.

$x$ is described by the prior probability density function (or just prior) $p(x|\delta)$, where $\delta$ is some positive scaling parameter.

Once a prior is chosen, we obtain the posterior density function through Bayes’ law:

$$p(x|b, \lambda, \delta) \propto p(b|x, \lambda)\, p(x|\delta).$$

Choosing the prior is the most important step here, and we typically let $x$ be a Gaussian Markov random field whose precision matrix is specified by our knowledge of the solution.

If the posterior density is not of a well-known form, we must sample from it using methods such as Markov chain Monte Carlo.

Using the stochastic approach, we obtain more information (uncertainties of the random variable), but could possibly increase the expense of our algorithms.
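In the simplest case no Markov chain Monte Carlo is needed. The sketch below assumes a linear model with Gaussian noise of precision `lam` and a Gaussian prior with precision matrix `delta * I`. These are my own simplifying assumptions: the slides do not specify the likelihood, and the identity is only the most basic choice of precision matrix. Under them the posterior is Gaussian with precision $P = \lambda A^T A + \delta I$ and mean $\lambda P^{-1} A^T b$, so it can be sampled directly with a Cholesky factor.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 5))
x_true = rng.standard_normal(5)
lam, delta = 100.0, 1.0                # noise and prior precisions (assumed known)
b = A @ x_true + rng.standard_normal(8) / np.sqrt(lam)

# Posterior precision and mean for the linear Gaussian model.
P = lam * A.T @ A + delta * np.eye(5)
mean = np.linalg.solve(P, lam * A.T @ b)

# Direct sampling: if P = L L^T and z ~ N(0, I), then L^{-T} z ~ N(0, P^{-1}).
L = np.linalg.cholesky(P)
z = rng.standard_normal((5, 5000))
samples = mean[:, None] + np.linalg.solve(L.T, z)   # one sample per column

post_std = samples.std(axis=1)         # componentwise uncertainty estimate
print(mean, post_std)
```

The posterior mean coincides with the Tikhonov solution for $\nu = \delta/\lambda$, which is the sense in which regularization encodes prior knowledge. The sample standard deviations are the extra uncertainty information that the deterministic approach does not provide.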

Nonlinear inverse problems

Linear inverse problems are all pretty much the same; this is absolutely not true for nonlinear inverse problems.

We need very specially tailored techniques and algorithms in order to solve nonlinear inverse problems (unless we do least squares, which is very costly).

What you’ll need to know to study nonlinear inverse problems:

PDEs

Analysis (specifically functional analysis)

Numerical analysis

Probability and statistics

Knowledge of the physics underlying the situation

Thanks for listening!

References

Jennifer Mueller and Samuli Siltanen, Linear and Nonlinear Inverse Problems with Practical Applications. SIAM. 2012.

Johnathan Bardsley, Computational Uncertainty Quantification for Inverse Problems. SIAM. 2018.