Inverse Problems for Dummies
Scott Ziegler (a very stable genius)
Colorado State University
September 13, 2018
What is an inverse problem?
There are many physical situations in which we acquire data from some phenomenon or object which we cannot see.
Geology, medicine, radar, astronomy, etc.
The task of using the mathematical model of the situation to simulate data is called a forward problem.
The task of using the gathered data to recreate the mathematical model of the situation is called an inverse problem.
In practice, we typically look for a specific parameter of the mathematical model.
Some simple examples
Deblurring
X-ray imaging
Classifying inverse problems
Inverse problems can be broadly summarized by the following equation:
$$b = \mathcal{A}(x) + \varepsilon$$
where $b$ represents the observation (or result of the forward problem), $x$ represents the model parameters, $\mathcal{A}$ represents the operator governing the model, and $\varepsilon$ is a random noise vector (typically normally distributed).
When the operator $\mathcal{A} = A$ is a linear operator, we call the problem linear. The problem could still be infinite dimensional, but we will typically discretize to get $b \in \mathbb{R}^m$, $x \in \mathbb{R}^n$ and $A \in \mathbb{R}^{m \times n}$.
If the operator $\mathcal{A}$ is nonlinear, we cleverly call the inverse problem nonlinear. In this case we are typically dealing with $\mathcal{A} : H \to H$ where $H$ is a Hilbert space.
Linear inverse problems
We’ll begin (and essentially end) by studying linear inverse problems since they are much easier to work with.
We would like for our inverse problem to be well-posed, which means it satisfies the following three conditions:
Existence: there should be a solution.
Uniqueness: there should be at most one solution.
Stability: the solution must depend continuously on the data.
These conditions are equivalent to saying that our map $A$ (which in the finite dimensional case is just a matrix) should have a continuous inverse.
If an inverse problem is not well-posed, it is ill-posed.
If our inverse problem is well-posed, then our work is essentially done. We simply need to solve a least squares problem.
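$$x_{LS} = \arg\min_x \|Ax - b\|.$$
In practice, nearly every inverse problem of interest is ill-posed (after discretization, this shows up as a severely ill-conditioned matrix $A$), and it turns out that solving a least squares problem for ill-posed inverse problems goes very badly.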
Ill-posedness in convolution
As an example of this, consider the inverse problem of deconvolution.
Given the (possibly noisy) convolution of a function $f$, reconstruct the original function.
We can describe this through the continuous model
$$b = (\psi * f)(x) = \int_{-a}^{a} \psi(x') f(x - x')\,dx'.$$
We can then discretize this problem using some type of quadrature rule and end up with an equation of the form b = Af and attempt to solve the inverse problem using least squares.
Figure. Left: the piecewise continuous function $f(x)$. Right: the function $(\psi * f)(x)$.
Figure. Left: result of a least squares inversion with no additive noise. Right: result of a least squares inversion with data corrupted by 1% white noise.
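The deconvolution experiment can be imitated in a few lines. This is only a sketch under assumptions that are not from the talk: the interval $[0, 1]$, a midpoint quadrature rule on 80 points, a Gaussian kernel $\psi$ of width `gamma`, a made-up piecewise constant signal `f_true`, and noise scaled to 1% of the data norm.

```python
import numpy as np

# Assumed setup (not from the talk): n midpoints on [0, 1] and a Gaussian
# blurring kernel psi of width gamma.
n = 80
h = 1.0 / n
t = (np.arange(n) + 0.5) * h
gamma = 0.025

def psi(s):
    return np.exp(-s**2 / (2 * gamma**2)) / (gamma * np.sqrt(2 * np.pi))

# Midpoint quadrature: (A f)_i = h * sum_j psi(t_i - t_j) f(t_j).
A = h * psi(t[:, None] - t[None, :])

# A piecewise constant "true" signal, chosen only for illustration.
f_true = np.where((t > 0.2) & (t < 0.4), 1.0, 0.0) + np.where((t > 0.6) & (t < 0.7), 0.5, 0.0)

rng = np.random.default_rng(0)
b_clean = A @ f_true
noise = rng.standard_normal(n)
noise *= 0.01 * np.linalg.norm(b_clean) / np.linalg.norm(noise)  # 1% relative noise
b_noisy = b_clean + noise

# Plain least squares inversion, with and without noise in the data.
f_clean = np.linalg.lstsq(A, b_clean, rcond=None)[0]
f_noisy = np.linalg.lstsq(A, b_noisy, rcond=None)[0]

def rel_err(f):
    return np.linalg.norm(f - f_true) / np.linalg.norm(f_true)

print("condition number of A:", np.linalg.cond(A))
print("relative error, clean data:", rel_err(f_clean))
print("relative error, noisy data:", rel_err(f_noisy))
```

With these choices the noise-free inversion recovers `f_true` closely, while the 1% noise is amplified far beyond the size of the signal, which is the behavior described by the figures.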
SVD and ill-posedness
We’ll show what went wrong here by analyzing the singular value decomposition of a matrix describing an arbitrary ill-posed linear inverse problem, and then take a look at the statistical properties of $x_{LS}$.
Given a matrix $A \in \mathbb{R}^{m \times n}$ of rank $r$, the singular value decomposition (SVD) of $A$ is
$$A = U \Sigma V^T$$
where $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal and $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r, 0, \ldots, 0) \in \mathbb{R}^{m \times n}$, with $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$ the nonzero singular values of $A$.
The columns of $U = [u_1, \ldots, u_m]$ and $V = [v_1, \ldots, v_n]$ are the left and right singular vectors of $A$, respectively.
The outer product form of the SVD is
$$A = \sum_{i=1}^{r} u_i \sigma_i v_i^T.$$
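We can then define the pseudo-inverse of $A$ by
$$A^\dagger = V \Sigma^\dagger U^T$$
where $\Sigma^\dagger = \mathrm{diag}(\sigma_1^{-1}, \sigma_2^{-1}, \ldots, \sigma_r^{-1}, 0, \ldots, 0) \in \mathbb{R}^{n \times m}$.
The outer product form of this is
$$A^\dagger = \sum_{i=1}^{r} v_i \sigma_i^{-1} u_i^T.$$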
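As a quick numerical check of the two outer product forms, here is a small sketch using NumPy. The matrix sizes, the rank, and the rank tolerance `1e-12` are arbitrary choices made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 6, 4, 3
# Build an m x n matrix of rank r (sizes chosen arbitrarily for the sketch).
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

U, s, Vt = np.linalg.svd(A)           # U is m x m, Vt is n x n, s has min(m, n) entries
rank = int(np.sum(s > 1e-12 * s[0]))  # numerical rank

# Outer product forms: A = sum_i u_i s_i v_i^T and A^+ = sum_i v_i s_i^{-1} u_i^T.
A_sum = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(rank))
A_pinv = sum(np.outer(Vt[i, :], U[:, i]) / s[i] for i in range(rank))

print("numerical rank:", rank)
print("A reproduced:", np.allclose(A_sum, A))
print("pseudo-inverse matches NumPy:", np.allclose(A_pinv, np.linalg.pinv(A, rcond=1e-12)))
```

Only the first `rank` terms are summed, which mirrors the sums running from $i = 1$ to $r$.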
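It turns out the least squares solution $x_{LS}$ can be written as
$$x_{LS} = A^\dagger b = \sum_{i=1}^{r} \frac{u_i^T b}{\sigma_i} v_i = \sum_{i=1}^{r} (v_i^T x) v_i + \sum_{i=1}^{r} \frac{u_i^T \varepsilon}{\sigma_i} v_i,$$
where the last equality uses $b = Ax + \varepsilon$ and $u_i^T A = \sigma_i v_i^T$.
The first sum here is the projection of the true solution $x$ onto the span of $\{v_i\}_{i=1}^{r}$, while the second sum represents the corruption of $x_{LS}$ that occurs due to the presence of the noise $\varepsilon$.
If the matrix describing our model has certain properties (i.e. rapidly decaying singular values and increasingly oscillatory right singular vectors as $i$ increases), then the noise coefficients $u_i^T \varepsilon / \sigma_i$ become large for large $i$ and our least squares solution is corrupted.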
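The split of $x_{LS}$ into a signal part and a noise part can be checked numerically. In this sketch the test problem is invented: a square $5 \times 5$ matrix assembled from random orthogonal factors with singular values $10^0, \ldots, 10^{-4}$, and a noise level of $10^{-3}$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
# Assumed test problem: orthogonal factors from QR and singular values 10^0 ... 10^-4.
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 10.0 ** -np.arange(n)
A = U @ np.diag(sigma) @ V.T

x_true = rng.standard_normal(n)
eps = 1e-3 * rng.standard_normal(n)
b = A @ x_true + eps

x_ls = np.linalg.solve(A, b)               # A is square and invertible here
signal_part = V @ (V.T @ x_true)           # sum_i (v_i^T x) v_i
noise_part = V @ ((U.T @ eps) / sigma)     # sum_i (u_i^T eps / sigma_i) v_i

print("decomposition holds:", np.allclose(x_ls, signal_part + noise_part))
print("noise coefficients |u_i^T eps| / sigma_i:", np.abs(U.T @ eps) / sigma)
```

The printed noise coefficients show how the division by $\sigma_i$ inflates the later terms even though every entry of `eps` is small.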
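$x_{LS}$ is a random vector in the span of $\{v_i\}_{i=1}^{r}$. Specifically, we have
$$v_j^T x_{LS} = v_j^T x + \frac{u_j^T \varepsilon}{\sigma_j} \sim N\left(v_j^T x, \frac{\sigma^2}{\sigma_j^2}\right)$$
where $\sigma^2$ is the variance of each entry of the noise vector $\varepsilon$, that is, $\varepsilon \sim N(0, \sigma^2 I)$.
This shows the variance of $x_{LS}$ in the direction $v_j$ is $\sigma^2/\sigma_j^2$, which will be large for large values of $j$ because $\sigma_j$ is then small.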
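The variance formula can be tested with a small Monte Carlo experiment. Everything specific here is an assumption made for the sketch: the $4 \times 4$ problem, the singular values in `sing`, the noise standard deviation `noise_std`, and the number of samples.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sing = np.array([1.0, 0.5, 0.1, 0.02])    # assumed singular values
A = U @ np.diag(sing) @ V.T
x_true = np.ones(n)
noise_std = 0.05                           # assumed noise standard deviation

n_samples = 20000
eps = noise_std * rng.standard_normal((n_samples, n))
b = x_true @ A.T + eps                     # one noisy data vector per row
x_ls = np.linalg.solve(A, b.T).T           # least squares solution for each row
coeffs = x_ls @ V                          # column j holds samples of v_j^T x_LS

empirical_var = coeffs.var(axis=0)
predicted_var = noise_std**2 / sing**2

print("empirical variance:", empirical_var)
print("predicted variance:", predicted_var)
```

The empirical variances should track $\sigma^2/\sigma_j^2$ up to sampling error, growing sharply as $\sigma_j$ shrinks.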
Example
Let $A$ be defined via $A = v_1 v_1^T + 10^{-2} v_2 v_2^T$ with $v_1 = [1/\sqrt{2}, 1/\sqrt{2}]^T$, $v_2 = [-1/\sqrt{2}, 1/\sqrt{2}]^T$.
If $x = [1, 1]^T$, then clearly $b = Ax = [1, 1]^T$ and $A^{-1} b = [1, 1]^T$. However, if we add one realization of the noise given by $\varepsilon = [0.026, 0.075]^T$, then $A^{-1} b = [-1.400, 3.501]^T$.
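This example is small enough to reproduce directly. The sketch below uses the rounded noise values quoted above, so the result agrees with the stated reconstruction only to about three decimal places.

```python
import numpy as np

v1 = np.array([1.0, 1.0]) / np.sqrt(2)
v2 = np.array([-1.0, 1.0]) / np.sqrt(2)
A = np.outer(v1, v1) + 1e-2 * np.outer(v2, v2)

x = np.array([1.0, 1.0])
eps = np.array([0.026, 0.075])   # the noise realization quoted above (rounded)

x_clean = np.linalg.solve(A, A @ x)          # inversion of noise-free data
x_noisy = np.linalg.solve(A, A @ x + eps)    # inversion of noisy data

print(x_clean)
print(x_noisy)
```

A perturbation of size below 0.1 in the data moves the reconstruction by more than 2 in each component, because the component of the noise along $v_2$ is multiplied by $1/\sigma_2 = 100$.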
Regularization
Now that we know the problem with least squares, we want to alter the method in order to minimize the damage caused by the highly oscillatory right singular vectors.
This is known as regularization, or spectral filtering.
The easiest solution to this problem is to simply remove the bothersome singular vectors. This is known as truncated singular value decomposition (TSVD).
Example
Returning to our example above, our poor reconstruction was due to high variance in the direction of $v_2$ (which has singular value $10^{-2}$).
If we simply remove this singular vector to get $A_{\text{filt}} = \sigma_1 v_1 v_1^T$, then we see that
$$x_{LS}^{\text{filtered}} = A_{\text{filt}}^\dagger b = \sigma_1^{-1} (v_1^T b) v_1 = [1.0505, 1.0505]^T.$$
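The filtered reconstruction can be computed the same way, here as a sketch that takes the SVD from NumPy and keeps only the first term of the outer product sum. The variable `k` (the number of retained singular values) is a name introduced for the illustration.

```python
import numpy as np

v1 = np.array([1.0, 1.0]) / np.sqrt(2)
v2 = np.array([-1.0, 1.0]) / np.sqrt(2)
A = np.outer(v1, v1) + 1e-2 * np.outer(v2, v2)
b = A @ np.array([1.0, 1.0]) + np.array([0.026, 0.075])   # noisy data from the example

U, s, Vt = np.linalg.svd(A)
k = 1  # keep only the largest singular value
x_tsvd = sum((U[:, i] @ b) / s[i] * Vt[i, :] for i in range(k))

print(x_tsvd)
```

Dropping the $v_2$ term gives up any information about $x$ in that direction, but here it lands much closer to $[1, 1]^T$ than the unfiltered inverse does.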
Regularization
We can generalize this idea by saying that our regularized least squares solution is
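$$x_\nu = V \Phi_\nu \Sigma^\dagger U^T b$$
with $\Phi_\nu$ to be chosen depending on the choice of regularization.
As an example, for TSVD we have $\Phi_\nu = \mathrm{diag}(\phi_1^{(\nu)}, \ldots, \phi_r^{(\nu)}, 0, \ldots, 0) \in \mathbb{R}^{n \times n}$ with
$$\phi_i^{(\nu)} = \begin{cases} 1, & i = 1, \ldots, k, \\ 0, & i = k + 1, \ldots, r. \end{cases}$$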
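One way to organize this in code is a single routine that takes the filter factors as an argument. The helper names `filtered_solution` and `tsvd_filter` are hypothetical, and the diagonal test matrix is chosen only so the answer can be read off by hand.

```python
import numpy as np

def filtered_solution(A, b, phi):
    """Spectral filtering: return V diag(phi) Sigma^+ U^T b.

    phi holds one filter factor per singular value of A (hypothetical helper).
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    keep = s > 0                      # Sigma^+ maps zero singular values to zero
    coeffs = np.zeros_like(s)
    coeffs[keep] = phi[keep] * (U[:, keep].T @ b) / s[keep]
    return Vt.T @ coeffs

def tsvd_filter(num_values, k):
    """TSVD filter factors: 1 for the first k singular values, 0 for the rest."""
    phi = np.zeros(num_values)
    phi[:k] = 1.0
    return phi

# Toy diagonal problem, chosen so the answer can be read off by hand.
A = np.diag([1.0, 0.1, 1e-6])
b = np.array([1.0, 0.1, 1e-3])
x_full = filtered_solution(A, b, np.ones(3))        # no filtering: plain least squares
x_k2 = filtered_solution(A, b, tsvd_filter(3, 2))   # TSVD with k = 2

print(x_full)
print(x_k2)
```

Passing all ones for `phi` recovers $x_{LS}$, so other regularization methods only need to supply a different vector of filter factors.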
Other common types of regularization are:
Tikhonov regularization. Here we reframe the least squares problem as
$$x_\nu = \arg\min_x \frac{1}{2}\|Ax - b\|^2 + \frac{\nu}{2}\|x\|^2$$
which ends up giving $\Phi_\nu$ defined by $\phi_i^{(\nu)} = \frac{\sigma_i^2}{\sigma_i^2 + \nu}$, $i = 1, \ldots, r$.
Total variation regularization. Here we reframe the least squares problem as
$$x_\nu = \arg\min_x \frac{1}{2}\|Ax - b\|^2 + \frac{\nu}{2}\|Lx\|$$
where $L$ is a finite difference matrix.
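The Tikhonov filter factors can be checked against a direct solve. Setting the gradient of the Tikhonov objective to zero gives the regularized normal equations $(A^T A + \nu I) x = A^T b$, and the sketch below compares that solution with the filtered SVD form. The random test problem and the value `nu = 0.3` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 5))
b = rng.standard_normal(8)
nu = 0.3

# Direct solution of the regularized normal equations (A^T A + nu I) x = A^T b.
x_normal = np.linalg.solve(A.T @ A + nu * np.eye(5), A.T @ b)

# Same solution through the SVD with filter factors phi_i = s_i^2 / (s_i^2 + nu).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
phi = s**2 / (s**2 + nu)
x_filter = Vt.T @ (phi * (U.T @ b) / s)

print("filter factors:", phi)
print("solutions agree:", np.allclose(x_normal, x_filter))
```

Unlike TSVD, the factors decay smoothly: they are close to 1 when $\sigma_i^2 \gg \nu$ and close to 0 when $\sigma_i^2 \ll \nu$.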
Regularization lets us "fix" an ill-behaved matrix by truncating or damping the troublesome small singular values.
The only real issue is choosing the regularization parameter $\nu$.
We can reframe this idea of "fixing" a matrix with some knowledge of the solution in terms of the stochastic approach to inverse problems.
Statistical properties of regularized solutions
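We saw that the least squares solution $x_{LS}$ has high variance in the directions $v_i$ associated with small singular values.
The outer product form of our regularized solution is given by
$$x_\nu = \sum_{i=1}^{r} \phi_i^{(\nu)} (v_i^T x) v_i + \sum_{i=1}^{r} \phi_i^{(\nu)} \frac{u_i^T \varepsilon}{\sigma_i} v_i$$
which produces
$$v_i^T x_\nu = \phi_i^{(\nu)} \left( v_i^T x + \frac{u_i^T \varepsilon}{\sigma_i} \right) \sim N\left( \phi_i^{(\nu)} v_i^T x, \left(\phi_i^{(\nu)}\right)^2 \frac{\sigma^2}{\sigma_i^2} \right).$$
Thus we decrease the variance of the solution in the direction of $v_i$ for large $i$, but we have introduced bias: the mean is $\phi_i^{(\nu)} v_i^T x$ rather than $v_i^T x$ whenever $\phi_i^{(\nu)} \neq 1$.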
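The trade-off can be written out explicitly. Assuming, as above, that the noise has independent entries with variance $\sigma^2$, the bias, variance, and mean squared error of the regularized solution in the direction $v_i$ are:

```latex
\begin{align*}
\mathbb{E}\left[v_i^T x_\nu\right] - v_i^T x
  &= \left(\phi_i^{(\nu)} - 1\right) v_i^T x, \\
\operatorname{Var}\left(v_i^T x_\nu\right)
  &= \left(\phi_i^{(\nu)}\right)^2 \frac{\sigma^2}{\sigma_i^2}, \\
\mathbb{E}\left[\left(v_i^T x_\nu - v_i^T x\right)^2\right]
  &= \left(\phi_i^{(\nu)} - 1\right)^2 \left(v_i^T x\right)^2
   + \left(\phi_i^{(\nu)}\right)^2 \frac{\sigma^2}{\sigma_i^2}.
\end{align*}
```

For TSVD with $i > k$ the filter factor is zero, so the variance vanishes and the whole error is the squared bias $(v_i^T x)^2$. Filtering a direction pays off when that is smaller than the unfiltered variance $\sigma^2/\sigma_i^2$.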
The stochastic approach to inverse problems
Through regularization we are really "encoding" some information we have about the solution $x$ by penalizing the size of either $x$ or $\nabla x$ in whatever norm we choose.
We can reframe this idea by assuming from the outset that $x$ is a random vector with its own prior distribution.
$x$ is described by the prior probability density function (or just prior) $p(x|\delta)$, where $\delta$ is some positive scaling parameter.
Once a prior is chosen, we obtain the posterior density function through Bayes’ law:
$$p(x|b, \lambda, \delta) \propto p(b|x, \lambda)\, p(x|\delta),$$
where $\lambda$ is a parameter of the noise model appearing in the likelihood $p(b|x, \lambda)$.
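To see how this connects back to regularization, here is one worked case. The specific distributions are assumptions made for the illustration and are not stated in the talk: Gaussian noise $\varepsilon \sim N(0, \lambda^{-1} I)$, so that $\lambda$ is the noise precision, and a Gaussian prior $x \sim N(0, \delta^{-1} I)$.

```latex
\begin{align*}
p(b \mid x, \lambda) &\propto \exp\left(-\frac{\lambda}{2}\|Ax - b\|^2\right), \qquad
p(x \mid \delta) \propto \exp\left(-\frac{\delta}{2}\|x\|^2\right), \\
-\log p(x \mid b, \lambda, \delta) &= \frac{\lambda}{2}\|Ax - b\|^2 + \frac{\delta}{2}\|x\|^2 + \text{const}, \\
x_{\text{MAP}} &= \arg\min_x \frac{1}{2}\|Ax - b\|^2 + \frac{\delta}{2\lambda}\|x\|^2 .
\end{align*}
```

Under these assumptions the maximizer of the posterior is exactly the Tikhonov solution with $\nu = \delta/\lambda$, so the regularization parameter becomes a ratio of prior precision to noise precision.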
Choosing the prior is the most important step here, and we typically let $x$ be a Gaussian Markov random field whose precision matrix is specified by our knowledge of the solution.
If the posterior density is not of a known form that we can sample from directly, we must sample from it using methods such as Markov chain Monte Carlo (MCMC).
Using the stochastic approach, we obtain more information (uncertainties in the solution), but possibly at greater computational expense.
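In the simplest case no MCMC is needed. The sketch below assumes a linear model with Gaussian noise of precision `lam` and a Gaussian prior with precision matrix `delta * I`; these choices, the problem sizes, and the parameter values are all illustrative. The posterior is then Gaussian with precision $P = \lambda A^T A + \delta I$ and can be sampled directly through a Cholesky factor. This is direct sampling, not MCMC, and it only shows the kind of uncertainty information the stochastic approach provides.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 20, 3
A = rng.standard_normal((m, n))
x_true = np.array([1.0, -2.0, 0.5])
lam, delta = 100.0, 1.0    # assumed noise precision and prior precision
b = A @ x_true + rng.standard_normal(m) / np.sqrt(lam)

P = lam * A.T @ A + delta * np.eye(n)       # posterior precision matrix
mean = np.linalg.solve(P, lam * A.T @ b)    # posterior mean (also the MAP estimate)

L = np.linalg.cholesky(P)                   # P = L L^T
z = rng.standard_normal((n, 5000))
samples = mean[:, None] + np.linalg.solve(L.T, z)   # columns ~ N(mean, P^{-1})

post_std = samples.std(axis=1)              # sampled uncertainty in each component
print("posterior mean:", mean)
print("posterior standard deviations:", post_std)
```

Solving with $L^T$ turns standard normal draws into draws with covariance $P^{-1}$. For non-Gaussian posteriors this shortcut is unavailable, which is where MCMC comes in.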
Nonlinear inverse problems
Linear inverse problems are all pretty much the same; this is absolutely not true for nonlinear inverse problems.
We need very specially tailored techniques and algorithms in order to solve nonlinear inverse problems (unless we do least squares, which is very costly).
What you’ll need to know to study nonlinear inverse problems:
PDEs
Analysis (specifically functional analysis)
Numerical analysis
Probability and statistics
Knowledge of the physics underlying the situation
Thanks for listening!
References
Jennifer Mueller and Samuli Siltanen, Linear and Nonlinear Inverse Problems with Practical Applications. SIAM. 2012.
Johnathan Bardsley, Computational Uncertainty Quantification for Inverse Problems. SIAM. 2018.