arxiv: v1 [math.na] 18 Aug 2021

(1)

Alternating Minimization for Computed Tomography with Unknown Geometry Parameters.

Ana Castillo Proximity Learning [email protected]

Mai Phuong Pham Huynh Emory University [email protected]

Manuel Santana Utah State University [email protected]

Advisor:

James Nagy [email protected] Emory University

Abstract

Due to COVID-19 pandemic, there is an increasing demand for portable CT machines worldwide in order to diagnose patients in a variety of settings [16]. This has lead to a need for CT image reconstruction algorithms which can produce high quality images in the case when multiple types of geometry parameters have been perturbed. In this paper we present an alternating descent algorithm to address this issue, where one step minimizes a regularized linear least squares problem, and the other minimizes a bounded non-linear least square problem. Additionally, we survey existing methods to accelerate the convergence algorithm and discuss implementation details through the use of MATLAB packages such as IRtools and imfil. Finally, numerical experiments are conducted to show the effectiveness of our algorithm.

1 Introduction

In medical imaging, computed tomography (CT) techniques are becoming increasingly popular for their ability to produce high quality images of the human body. These help doctors diagnose several types of cancers and most recently handle COVID-19 cases. Computerized tomography (CT) scans have allowed radiologists to pinpoint infection in patients with COVID-19 and rule out any diseases caused in patients recovered from COVID-19. A CT scanner is a device that is composed of a scanning gantry, x-ray generator, computer system, console panel and a physician’s viewing console. The scanning gantry is the part that produces and detects x-rays. In a typical CT scan, a patient will lay on a bed that will move through the gantry. An X-ray tube rotates around the patient and shoots X-ray beams through the human body at different angles. These X-ray measurements are then processed on a computer using mathematical algorithms to create tomographic (cross- sectional) images of the tissues inside the body. Limitations arise when using CT scanners for these medical procedures since these devices require extensive care, such as regular maintenance. Plus, transporting these to remote locations is not an easy task. Point-of-care imaging addresses these challenges by allowing radiologists to add portable CT scanners to their departments to increase patient satisfaction and improve medical outcomes.

However, the parameters that are associated with the geometry of these devices cannot always be precisely

arXiv:2109.01481v1 [math.NA] 18 Aug 2021

(2)

calibrated in point-of-care situations. These parameters may somewhat change over time, when the scanner is adjusted during the image acquisition process or in transportation.

Figure 1: Object-Source-Detector

Figure 1 shows how an x-ray source rotates around an object during a typical CT acquisition process. The location of the source is measured by a view angle, or location of the source on a circle around the object, and distance from the source to the center of the object. If the location of the x-ray source has been perturbed the reconstructed image will be of very poor quality. The case where only the view angles have been perturbed is an active area of research [13] [14]. In [6] Ding devises the numerical scheme we study, and uses it to study both the unknown angles case as well as the unknown distance from the source to the object. The purpose of this paper is to study reconstruction methods in the case when both view angles and distance from the source to the object have been perturbed. Ding performed one small experiment of this type, and we build upon his work by testing larger images, larger perturbations, more acceleration techniques, more linear solvers, parallelizing the code, and adding to the IRtools [8] package to include this algorithm.

The rest of the paper will proceed as follows. In section 2 we give a brief background to the CT problem, present the algorithm, and discuss some theory behind solving the algorithm. Next in section 3 we provide an alternate form of the algorithm as a fixed point iteration and discuss acceleration techniques to improve convergence. Section 4 outlines considerations for implementation are given, as well as an example of using the MATLAB package that accompanies this paper. Numerical experiments and results are outlined in section 5.

Finally future directions are presented at the end.

2 Block Coordinate Descent

In this section, we discuss the mathematical concepts and modeling of the problem as well as different techniques involved to effectively solve the inverse, ill-posed problem.

(3)

2.1 Problem Set-Up

To setup the appropriate linear least square problem, we first need to establish a few notations.

R: distance from the center of the object to the x-ray source θ: x-ray source angle of rotation

p =





 R1

... R_m

θ₁ ... θ_m







b: vector of measured data from the machine’s detector x: vector of the reconstructed image

Using Beer’s Law [7], we obtain b and A(p), a matrix created as a function of p. This allows us to solve for the image vector x in the form of the linear algebra problem A(p)x = b. Here b is called the sinogram, and it is a vector containing data from the machine’s detector, and A(p) is a large, sparse, and ill-conditioned matrix.

We now examine closely how matrix A(p) and vector p are generated. In order to achieve these, we first study Beer’s Law. Denote

• d is the length that the X-ray travels through the object

• µ is the material dependent attenuation coefficient (loss of energy coefficient, depending on object’s material). Here, µ is the unknown, and represents the image.

• I0 is the energy entering the object

• I is the energy received at the detector. We would expect I0 > I as there will be a loss amount that depends on d and µ

Figure 2: One X-ray and one-material object illustration

Beer’s Law states that

I = I0e^−µd. (2.1.1)

From equation 2.1.1, we can deduce that

− log I I0

= µd.

(4)

Hence, the vector b mentioned above can be defined as

b =







− log I1

I0₁

...

− log I_i I0_i

...

− log I_m I0_m







each row of which contains the information about I and I₀ received from exactly one X-ray. In other words, the row i-th contains the information obtained from the i-th X-ray, corresponding to its Ii and I0_i data. Here, m is the number of X-rays.

In reality, an object is made of various different materials, µ, and each X-ray passes through the materials in different lengths, di, as illustrated in the following picture.

Figure 3: One X-ray with multiple-material object illustration

Consider that each object can be divided into multiple pixels. The pixels can be divided by any shapes;

however, for simplicity, the picture below divides the object into multiple square pixels. Additionally, though there are parallel and fan beam X-ray sources, throughout this project, we consider only the fan-beam X-rays sources.

Figure 4: Fan-beam X-rays

(5)

Now, Beer’s Law can be stated more specifically as

Ii= I0_ie⁻^Pⁿ^j=1^µ^j^d^ij ⇒ − log Ii

I0_i

=

n

X

j=1

µjdij

⇒ bi=

n

X

j=1

µjdij

where the size of the pixel grid is n × n.

Therefore, we have the Ax = b problem with vector x unknown, matrix A and vector b known. The problem is explicitly stated as followed





 b1

b2

... bm







| {z }

b

=







d11 d12 · · · d1n

d21 d22 · · · d2n

... ... . .. ... dm1 dm2 · · · dmn







| {z }

A





 µ1

µ2

... µn







| {z }

x

(2.1.2)

Here, the matrix A is made of information obtained from the distances that each beam travels through each pixel. That distance depends on the firing angle and distance of the x-ray source to the object, what we have established as R and θ. Hence, matrix A is denoted as A(p).

In a computed tomography problem with known geometry parameters, equation (2.1.2) leads to the need to solve a regularized linear least squares problem due to the ill-conditioning of matrix A. In this paper, the problem we seek to study with unknown geometry parameters can be stated as

x = arg min

x,p

||A(p)x − b||²₂+ α²||x||²₂. (2.1.3)

Here α is a regularization parameter. The solution to this problem can be approximated by using an alternating descent scheme known as block coordinate descent or BCD, which we describe in algorithm 1.

Algorithm 1: Tomography Block Coordinate Descent

1 Input p0∈ R^m

2 Output xk

3 for k = 0, 1, 2, ... do

4 xk+1= arg min ||A(pk)x − b||²₂+ α²||x||²₂

5 pk+1= arg min ||A(p)xk+1− b||²₂

Thus equation (2.1.3) can be approximated by an alternating minimization scheme where one iteration involves solving a large linear least squares problem, and a large non-linear least squares problem. In this section we discuss methods to solve both problems.

Another thing to consider is that in reality, it is impossible to obtain the exact data from the machine due to the variation in machine’s inconsistent quality and patients’ stability. Therefore the measured data contains noise. The problem we seek to solve is then

A(p)x = btrue+ η. (2.1.4)

where btrue is the exact data in the perfect scenario and η is noise attached to the information we obtained from the machine. The problem can now be rewritten as follows

Aˆx = b. (2.1.5)

Here we have assumed A=A(p) is exactly known.

(6)

Assume that matrix A is invertible, from equation 2.1.5, we can solve for ˆx as ˆ

x = A⁻¹b

= A⁻¹(Ax + η)

= x + A⁻¹η

| {z } error .

This shows that the quality of the reconstructed images is effected greatly due to errors caused by A⁻¹η. As previously stated, there are numerous unavoidable reasons leading to a noisy data set from the detector, Hence, regularization is needed as it can help make the error in our result smaller.

2.2 Filter-based Regularization

2.2.1 Singular Value Decomposition (SVD)

Before looking more closely at the idea behind Tikhonov Regularization, it is necessary to get familiar with the Singular Value Decomposition algorithm (SVD). An SVD of a matrix A is defined as

A = U ΣV^T (2.2.1)

in which

• U and V are orthogonal matrices, that is:

U^TU = I and V^TV = I Denote U and V as

U =u1 u2 · · · un V =v₁ v2 · · · vn

where u_j and v_j are the jth columns of U and V respectively.

• Σ is a diagonal matrix of singular values σi’s with 0 ≤ σn≤ · · · ≤ σ1

Σ =





 σ₁

σ₂ . ..

σ_n







⇒ Σ⁻¹= diag 1 σ1

, 1 σ2

, · · · , 1 σn

Assume that A is invertible, it can then be deduced from equation 2.2.1 that A⁻¹= (U ΣV^T)⁻¹

= (V^T)⁻¹Σ⁻¹U⁻¹

= V Σ⁻¹U^T

(7)

If we continue to consider our “naive” inverse problem above, that is ˆx = A⁻¹b, by substituting the SVD form of the inverse of matrix A, A⁻¹, and do some matrix multiplications, we obtain the following

ˆ

x = A⁻¹b = V Σ⁻¹U^Tb

=v₁ v₂ · · · v_n





 1 σ₁

1 σ2

. .. 1 σn











 u^T₁ u^T₂ ... u^T_n





 b

=v1 v2 · · · vn





 1 σ1

1 σ2

. .. 1 σ_n











 u^T₁b u^T₂b ... u^T_nb







=v1 v2 · · · vn





 u^T₁b

σ₁ u^T₂b

σ2

... u^T_nb

σn







=u^T₁b

σ₁ v1+u^T₂b

σ₂ v2+ · · · +u^T_nb σ_n vn

That is ˆx =

n

X

j=1

u^T_jb σj

v_j , which is a linear combination of the columns of V

However, as we mentioned earlier, the data we receive from the machine contains noise, so the ˆx we actually achieve can be rewritten as

ˆ x =

n

X

j=1

u^T_jb σj

v_j

=

n

X

j=1

u^T_jb_true σ_j v_j+

n

X

j=1

u^T_jη σ_j v_j

| {z }

error .

Now, from the equation above, we notice that in order to achieve a good reconstructed image with small error, we would want to consider large singular values to minimize the second term of the equation, which represents the error term. Moreover, we would also want to avoid constructing bad SVD components (i.e dividing by tiny singular values), which means that we want to balance out the magnitude of the singular values that we want to use. That is the main idea and motivation for using Truncated Singular Value Decomposition method (TSVD). This is stated as

xT SV D=

k

X

j=1

u^T_jb σ_j vj

where we will cut off the original SVD from point k + 1 to n. However, one question that remains is where and how we know the point to truncate; in other words, we want to find k. Notice that in this experiment, we are provided the true solution, which is not the case for realistic applications. Hence, it will be quite challenging to know the right spot to truncate. Additionally, the SVD is too computationally expensive to compute for a large matrix, and so alternative methods are needed.

(8)

2.3 Tikhonov Regularization

The TSVD method can be generalized as follows xT SV D =

n

X

i=1

φi

uib σi

vi where φi≈1 for large σi

0 for small σi

In this project, we will use Tikhonov regularization, which can be written as φi = σ_i²

σ²_i + α²

where α is the Tikhonov regularization parameter and σn ≤ · · · ≤ α ≤ · · · ≤ σ1. Now, we need to find the regularization parameter α instead of k. Notice, if we pick the regularization parameter α to be 0, then we will have φi= 1, which means there will be no regularization. If we pick α to be a small number and the singular values σ_i are very small, then the term σ_i²+ α² will dominate; in other words, φ_i≈ 0.

Tikhonov Regularization can also be written in the following form minx {||Ax − b||²₂+ α²||x||²₂}.

Equivalently, it can be rewritten in another form, which is easily verified using the normal equations minx

A αI

x −b

0

2

.

Substituting the SVD form of matrix A, A = U ΣV^T, to the above equation, we can deduce that the solution of the minimization problem is

x_{T SV D}= x_tik=

n

X

i=1

σ²_i σ_i²+ α²

u^T_i b σi

v_i.

Recall that in Section 2.1, we state that x = A⁻¹b. Yet, this will never be the case for our problem as matrix A is ill-posed and the problem is over-determined. However, we could still write it like a Moore–Penrose pseudo inverse form.

Denote A^† to be the pseudo inverse of A, we can now write xtik as x_tik= A^†b = V Σ^†U^Tb where, for Tikhonov regularization,

Σ^†= diag

σi

σ_i²+ α²

.

2.3.1 Regularization parameter with known true solution

Assume that we know the true solution x_true. We want to find the regularization parameter α that can minimize the function

E_k= ||x_tik− x_true||²₂

= ||V^Tx_tik− V^Tx_true||²₂

(norm is unaffected by orthogonal transformations)

= ||V^TV Σ^†U^Tb − V^Txtrue||²₂

= ||Σ^†U^Tb − V^Txtrue||²₂.

Denote ˆb = U^Tb and ˆx_true= V^Tx_true, then the function E_k is rewritten as Ek = ||Σ^†b − Vˆ ^Txˆtrue||²₂

=

k

X

i=1

σ_i² σ²_i + α²

ˆbi− ˆxi

² .

(9)

As our goal is to find α that can minimize function Ek, we notice that now, it is just a one-variable optimization problem. In other words, we want to find α such that it minimizes E_k(α), which in pratice is solved with a function such as the MATLAB function fminbnd, which uses a golden section search and successive parabolic interpolation.

2.3.2 Generalized Cross Validation (GCV)

As mentioned earlier, in reality, we do not have the true solution beforehand to set up a perfect optimization problem as in Section 2.3.1 or to test the accuracy of our model. Hence, we need to use techniques in order to learn information about the noise in the achieved data. There are many techniques that can be chosen such as Discrete Picard Condition, L-Curve, Discrepancy Principle, and GCV. However, in this paper, we will focus only on the GCV.

Before studying the general problem setup of GCV, we need to familiarize with some notations. First notation is a residual, measuring the difference between Ax_tik and b.

||b − Axtik||²₂= ||b − (U ΣV^T)(V Σ^†U^Tb)||²₂

= ||U^Tb − U^TU ΣV^TV Σ^†U^Tb||²₂

(norm is unaffected by orthogonal transformations)

= ||(I − ΣΣ^†)U^Tb||²₂

Second notation is the trace of the matrix I − AA^†. For a reminder, trace of a matrix is defined as the sum of all the elements on the matrix’s main diagonal.

trace(I − AA^†) = trace(U U^T − (U ΣV^T)(V Σ^†U^T))

= trace(U (I − ΣΣ^†)U^T)

= trace(I − ΣΣ^†).

(trace is unffected by orthogonal transformations) The general problem of GCV is

G(·) = n||b − Axtik||²₂

trace(I − AA^†). (2.3.1)

For the case of Tikhonov assuming that our matrix A is n × n we have the residual term as follows.

||b − Axtik||²₂= ||(I − ΣΣ^†)U^Tb||²₂

=

n

X

i=1

α²u^T_ib σ²_i + α²

,

And

trace(I − AA^†) = trace(I − ΣΣ^†)

=

n

X

i=1

α² σ_i²+ α².

Hence, our problem can now be finalized as finding the Tikhonov regularization parameter α to minimize

G(α) = n

n

X

i=1

α²u^T_ib σ_i²+ α²

n

X

i=1

α² σ²_i + α²

= n

n

X

i=1

u^T_ib σ_i²+ α²

n

X

i=1

1 σ_i²+ α²

.

(10)

In other words, with σn ≤ · · · ≤ α ≤ · · · ≤ σ1,

α = arg min

α

G(α) = arg min

α

n

X

i=1

u^T_i b σ_i²+ α²

n

X

i=1

1 σ²_i + α²

.

As G(α) is a continuous, one unknown variable function, we can easily solve this optimization problem using MATLAB’s built-in function fminbnd.

3 Acceleration Techniques

In this section we discuss techniques to accelerate the convergence of BCD algorithm. The key is to recast problem of estimating the geometry parameters fixed point iteration technique as follows

Algorithm 2: Fixed Point Tomography Reconstruction

1 Input x0∈ Rⁿ

2 Output x_k

3 for k = 0, 1, 2, ... do

4 p_k+1= arg min ||A(p)x_k+1− b||²₂

5 x_k+1= arg min ||A(p_k)x − b||²₂+ α²||x||²₂

Then we can define g(x_k) = x_k+1, and algorithm 1 becomes that of finding a fixed point of g. This is useful as acceleration techniques for fixed point problems have been widely studied [17]. It should be noted that the problem can also be stated as g(ωk) = ωk+1 where ωk := (xk, pk), but numerically this performed worse as shown in subsection 5.3.

In our numerical experiments we used three fixed point acceleration schemes, the remainder of the section will be devoted to discussing them. For ease we will denote g(x) as the function we are trying to find the fixed point of for the rest of this section if the function is a single variable, and G(x) if it is a function of several variables.

We motivate the first two acceleration methods starting at the single variable fixed point problem g(xk) = x_k+1. To begin, we let ∆ denote the difference operator, that is let ∆x_k := g(x_k) − x_k, ∆g(x_k) := g(g(x_k)) − g(xk), and ∆²xk := ∆g(xk) − ∆xk. Aitken’s ∆² method [1] along with its recursive application, the second order Steffensen method remains a popular choice for solving such problems [15]. This method is derived from approximating the multiplying constant for a linearly converging sequence, and can be stated as

xk+1= xk− (g(x_n) − x_n)² g(g(xk)) − 2g(xk) + xk

= xk−(∆x_k)²

∆²xk

. Or equivalently

xk+1= g(g(xk)) −(∆g(x_k))²

∆²xk

. (3.0.1)

Vector generalizations of this algorithm have been extensively studied, see [17] for several references. Most come from defining the inverse of a vector x as

x⁻¹ := x 1

||x||²₂.

Application of this definition to (3.0.1) leads to the Irons-Tuck method [9]

x_k+1= G(G(x_k)) −∆G(xk) · ∆²xk

||∆²x_k||² ∆G(x_k).

Where · denotes the dot product and the ∆ operator is defined as in the scalar case. Numerically the Irons- Tuck method has been shown to out perform many other vector generalizations of the ∆² process [12], but is

(11)

computationally expensive when G(xk) is expensive to evaluate. One alternative is to use the crossed secant method

xk+1= G(xk) −(G(xk) − G(xk−1) · (∆xk− ∆xk−1)

||(∆xk− ∆xk−1)||²₂ ∆xk.

It has the advantage of only needing one evaluation of G per iteration, and in many numerical tests performs similar to the Irons-Tuck method [17]. In fact if each iteration the crossed secant method is alternated with a standard fixed point iteration the resulting method is the Irons-Tuck method.

Another alternative is Anderson acceleration sometimes called Anderson mixing. The general idea is to use information from a predetermined number of residuals. Here we only state a more easily implementable version of the algorithm. For a more general algorithm see [20]. Let r be the number of residuals to use, and denote

X_k = (∆x_k−r, · · · , ∆x_k−1).

Then we state the algorithm

Algorithm 3: Anderson Acceleration

1 Input x₀∈ Rⁿ, r ≥ 1

2 Output x_k

3 Compute G(x₀) = x₀ for k = 1, 2, ... do

4 rk = min r, k

5 γ^(k)= arg min ||∆x_k− X_kγ||₂

6 xk+1= G(xk) −Pr_k

i=1γ_i^(k)[(G(xk−r_k+i+1) − G(xk−r_k+i)]

Implementation details can be found in [19]. It will suffice us to say that the linear least squares problem can be efficiently solved by updating a QR factorization of X after every iteration. Additionally we implemented a hyper parameter droptol which removes a column of X if the condition number of R gets above droptol. In section 5.3 we compare the use of these three algorithms in our implementation.

4 Implementation

In this section we discuss practical considerations for the implementation of algorithm 1 and show an example of using the accompanying code.

Recall that each time the x-ray source fires it is associated to one angle and one R parameter. Therefore A is created as

A(p) =







A(θ₁, R₁) ... A(θn, Rn)





.

The non-linear least squares problem can then be solved by solving subproblems of the form min ||A(θi, Ri)xk− bi||²₂

where each biis the corresponding part of the b vector. This is advantageous as it allows for easy parallelization of algorithm 1 leading to significant speed up, and a comparison is made in subsection 5.2.

To simulate the problem we will use the IRTools package [8]. Additionally IRTools provides several state of the art regularized linear least squares solvers which we will use in our implementation of algorithm 1. We will compare three linear least squares solvers available in the IRtools package which use different forms of regularization. The first is the hybrid-LSQR algorithm [10], which regularizes by added a two-norm penalty term. The next is the iteratively re-weighted norm approach or IRN [18], which use a one-norm regularization.

Finally the fast iterative shrinkage threshold algorithm or FISTA [2] is used which has no regularization term, but constrains the solution vector x such that x ≥ 0. A comparison of these methods is given in subsection 5.5.

(12)

To approximate the solution of the non-linear least squares problem we will compare two methods. The first is a Trust-Region-Reflective algorithm studied in [4],[5] and implemented in the MATLAB function lsqnonlin.

The second is the method of implicit filtering [11] which uses a projected quasi-Newton iteration using difference gradients. Both methods are for bounded problems with unknown derivative information, making them suitable for our problem, as the geometry parameters can be realistically bounded. These two methods will be compared in section 5.1.

Our implementation of algorithm 1 was built on top of the MATLAB IRtools package and naming conventions were chosen to mirror it. In the base IRtools package the function PRset is used to set up options for computed tomography problems as follows

options = PRset(options, ‘field name1’,field value1, ‘field name2’,field value2).

We have updated PRset to accept values relating to BCD, such as an initial guess for the parameters. When a field name is not passed in default values are selected. Then to simulate a CT problem with unknown geometry parameters PRtomo var is used with the image size being n × n

[b,ProbInfo] = PRtomo var(n, options).

Above b is the right hand side vector with noise added, and ProbInfo contains the initial guess for the parameters. Thus, PRtomo var generates all the data necessary to simulate the inverse problem. Next the IRset function is used to set up hyper-parameters for the solvers in IRtools

options = IRset(options, ‘field name1’,field value1, ‘field name2’,field value2).

Again IRset was updated to include parameters for BCD, such as which acceleration technique to use. Finally a new function IRbcd computes the BCD

[x,iterInfo] = IRbcd(b,iterOptions, probInfo).

In the output x is the image vector after BCD terminates, and iterInfo contains information about errors at each iteration. Now we proceed to demonstrate an example of simulating and solving a CT reconstruction experiment with this code. To begin we generate the data

n = 64;

ProbOptions = PRset(‘Rpert’,0.25,‘anglespert’,0.25);

[b,probInfo] = PRtomo var(n,ProbOptions);

This generates a test problem where perturbations are samplings of a uniform distribution on the interval [−0.125, 0.125] are added to the R values and the angles. Henceforth We will use the abbreviation U (a, b) for a uniform distribution. For ease of computation the same perturbation is added to every fourth of the angles and R values. The image for this problem is the Shepp-Logan phantom. Figure 5 (a) represents the true solution, (b) is the solution with noise in the sinogram b, but true parameters, and (c) is the solution with the initial guess parameters.

(13)

(a) True Solution (b) True Parameters Solution (c) Initial Parameters Solution

Figure 5: Example Problem

The following codes sets up the iteration options and runs the problem

iterOptions= IRset(‘nonlinSolver’,‘imfil’,‘accel’,‘anderson’,‘BCDmaxIter’,10,...

‘Rbounds’,0.1250,‘angleBounds’,0.1250);

[x,iterInfo] = IRbcd(b,iterOptions,probInfo);

After solving the problem we see the solution and the relaltive error graph in figure 6. This shows after the BCD algorithm we get a comparable solution to when the true geometry parameters are known.

Image after BCD

Figure 6: Example Problem Solution

The images in all five figures shown are generated with the following MATLAB code PRshowbcd(iterInfo,probInfo)

Now we move to the section highlighting several numerical experiments, all of which can be easily repeated using the code explained in this section.

5 Numerical Results

In this section we highlight several experiments for the tomography problems. In each problem an initial guess for the R Parameters and Angle parameters were given. Then a perturbation generated uniformly was added

(14)

to each angle and R value, and the BCD was run to produce a better image. For all tests Gaussian noise was added to the sinogram b so that the new vector b_noise= b + η, where ||η||₂/||b||₂is equal to a specified noise level, in this case 0.01. The initial guess for the view angles was always 0 : 2 : 358, and the initial guess for the R parameters was 2. Unless otherwise stated a random perturbation was added to each view angle and corresponding R parameter. In the graphs below the angle and R parameter error represent the relative error between the true perturbations and the current iterates approximation of those perturbations. In each case an initial guess of 0 was given for all perturbations. The images used were the Shepp-Logan phantom, and the image spine.tiff from the MATLAB image-processing ToolBox, with each image size being 256 × 256 unless otherwise stated. Finally all tests were run on a Microway system that has four Intel Xeon E5-4627 CPUs with 40 cores and 1 TB of memory running Ubuntu Linux.

5.1 Imfil and lsqnonlin Test

We begin by comparing the MATLAB optimization toolbox function, lsqnonlin to imfil. For this comparison, we used the Shepp Logan image, 15 iterations, and for the linear solver IRhybrid lsqr. Table 1 summarizes the perturbation range used for R and θ and the x-Norm errors when using imfil and lsqnonlin.

R Pertubation Range Angle Perturbation Range x-Norm Error imfil x-Norm Error lsqnonlin

[-0.00005,0.00005] [-0.00005,0.00005] 0.16 0.16

[-0.0005,0.0005] [-0.0005,0.0005] 0.16 0.16

[-0.005, 0.005] [-0.005, 0.005] 0.17 0.18

[-0.05, 0.05] [-0.05, 0.05] 0.23 0.23

[-0.5, 0.5] [-0.75, 0.75] 0.50 0.52

[-0.55, 0.55] [-0.8,0.8] 0.52 0.59

[-0.6, 0.6] [-0.85,0.85] 0.55 0.66

[-1, 1] [-1.5,1.5] 0.67 0.76

Table 1: x-Norm Errors imfil and lsqnonlin

As it can be seen in figure 7, for small perturbations, imfil and lsqnonlin produce similar quality reconstructions. The x-Norm Errors for imfil and lsqnonlin are very similar. However, for large perturbations, the x-Norm errors change significantly. In this case, imfil does better than lsqnonlin. Thus for all other tests we will use imfil.

Figure 7: Plot of Table 1

5.2 Parallel Speed Up

In this section we demonstrate the effectiveness of the parallelization discussed in section 4. Table 2 shows a 20 iteration run of the same simulation using different image sizes where the image size is n × n. In all the timing tests the Shepp-Logan phantom was used with perturbations as realizations of the distribution U (−0.25, 0.25)

(15)

added to each angle and R value. The hybrid-LSQR algorithm was used as the linear least squares solver. The aforementioned computer was used with twelve workers.

n 32 64 128 256 512

Serial 257 714 1308 4238 13771 Parallel 35 166 289 1026 3810

Table 2: Run Times in Seconds

From the results in table 2 we see drastic speedup in all cases with the parallelization. For larger problems solving the linear least squares problem appears to take most of the time, since A is a large matrix. For example when n is 256 the size of A is 65160 × 65536 and when n is 512 the size of A is 131400 × 266256. Regardless the parallel implementation made possible by the problem model provides significant speedup.

5.3 Acceleration Test

The next test we ran was to compare the different acceleration techniques. Perturbations were added as realizations of U (−0.5, 0.5). In the legends in figure 8, BCD is for no acceleration, CS for crossed secant, AA for Anderson Acceleration, and IT for the Irons-Tuck method. To give reference to table 3 the tests were run on 12 local workers.

Figure 8: Graphs Comparing Acceleration Techniques

BCD CS AA IT

1004 1042 1095 2375 Table 3: Run Times in Seconds

From these tests we see that the acceleration techniques provided slightly better convergence, with the crossed secant method and Anderson Acceleration performing the best. The Irons-Tuck method converged much better in the angler parameters, but took much longer. This was expected as the function evaluation is quite expensive, and as previously noted Irons-Tuck requires an extra function evaluation at each iteration.

Figure 9 shows the true image, and the image after the parameter optimization. Despite having slightly larger image error, the Irons-Tuck image seems to have the least background noise.

(16)

Figure 9: Spine Images after Parameter Optimization

As mentioned in these tests we solved for the fixed point where the fixed point is the image vector. The table below compares the case when the fixed point vector is defined to be ω := (x, p), with the same parameters as before.

Acceleration Scheme Fixed Point Type Angle Error R Error Image Error Anderson

x 0.557 0.111 0.189

ω 14.642 14.020 0.190

Crossed Secant

x 0.614 0.112 0.189

ω 15.251 14.079 0.1774

Irons-Tuck

x 0.359 0.108 0.199

ω 14.009 14.009 0.195

Table 4: Comparison of Fixed Point Problems

As can be seen from table 4 when the ω fixed point scheme is used the image error remains approximately the same, while the error in the angle and R parameters grows large. Thus in our scheme we choose to only look for a fixed point of x. The conclusion of this experiment overall is that when an accelerated fixed point scheme on x is used it may not produce faster convergence, but it can produce better convergence in a similar amount of time.

5.4 Stopping Criterion

Throughout the project, we have been generating the results by running all 20 BCD iterations. However, from figure 8, we observe that the relative norm errors of image, R, and the angle tend to start converging at around iteration 7 to 10. That motivates us to choose a logical stopping criterion, which can help reduce the run time as well as computer’s memory space. However, the task is challenging as the stopping criterion depends on many factors such as the size of the problem, the number of times angles and R are changed, and the techniques we used to solve the optimization problem.

(17)

Different from other experiments’ perturbations used in this section, here, perturbations are sampled from

∼ U (−0.25, 0.25) with other factors such as R and view angle initial guesses, image size (256 × 256) held the same. Even though we are yet to find a good stopping criterion for this project, we are able to find a relatively good tolerance of 3e-2 for our set-up. The tolerance is measured as followed

||xk− xk−1||2

||xk−1||2

≤ 0.03. (5.4.1)

From this naive approach, we are able to achieve the following results with different acceleration techniques.

(a) Without Stopping Criterion (b) With stopping criterion (c) Relative Error Norm

Figure 10: Stopping Criterion on Standard BCD

We observe that for standard BCD, our naive stopping criterion seems to work quite well and it ceases the code at its 7-th iteration of the BCD loop. The resulted reconstructed image (Figure 10b) is no difference, if not even better, compared to the one without stopping criterion (Figure 10a). Furthermore, in figure 10c, the relative norm error graph shows convergence in all considered error norm, which suggests the success of our chosen stopping criterion.

Figure 11: Stopping Criterion on Anderson Acceleration

The stopping criterion on Anderson Acceleration also shows the relatively similar behavior as on standard BCD with a slightly earlier halt at the 6-th iteration.

(18)

Figure 12: Stopping Criterion on the Crossed Secant Method

Similar to the two previous techniques, stopping criterion on Iron-Tucks, shown in figure 12, halts the code at the 6th iteration.

Figure 13: Stopping Criterion on the Crossed Secant Method

Different from the stopping criterion on standard BCD, Anderson, and Irons-Tuck, our “naive” approach does not seem to behave well on the crossed secant method. Here, with the tolerance shown in equation 5.4.1, the code runs all the designated iterations of 20. However, in figure 13c, we observe that the relative norm error started to converge at around iteration 8 to 10, which suggested that we should implement a larger stopping tolerance for this algorithm.

In conclusion, it is noteworthy that standard BCD, Anderson, and crossed secant seem to have a relatively similar stopping criterion while crossed secant may need a larger one. Additionally, as shown in figure 10(c) to 13(c), even though the relative error norm of x and R have already converged, the corresponding norm of p and the angles θ show a decreasing trend. However, our approach to this problem is naive as we have only experimented with different stopping criterion but have yet accounted the relationship between stopping criterion and problem size, number of times the parameters of R and angles changed, etc. Hence, for the next step, it is crucial to examine more closely the behavior of the stopping criterion in order to come up with better heuristics for choosing a stopping criterion based on the problem.

5.5 Comparison of Linear Least Squares Solvers

In this section we compare the performance of three linear least squares solvers available in the IRtools package using hybrid LSQR, FISTA, IRN. We did two tests using the Shepp-Logan phantom. Perturbations were chosen from realizations of U (−0.5, 0.5).

(19)

Figure 14: Graphs Comparing Linear Least Squares Solvers

In figure 14 we see that IRN and Fista perform similarly, and drastically outperform the hyrbid-LSQR method. Interestingly all three methods perform poorly on the angle parameter estimation. Despite having similar image error norms, figure 15 shows that visually FISTA appears better than IRN.

(a) True Solution (b) LSQR (c) IRN (d) FISTA

Figure 15: Images for Linear Least Squares Comparison

The second test comparing the linear least squares solvers uses the same parameters as the test before it, except the perturbations are samples of U (−1, 1). This is a significant amount of perturbation in this case as it is up to half the original value of R and half the distance between each angle. Figure 16 shows the relative errors.

Figure 16: Graphs Comparing Linear Least Squares Solvers For Second Test Problem

Interestingly with a high perturbation level FISTA does the worst, while IRN continues to significantly outperform the LSQR algorithm. Unsurprisingly all three algorithms maintain a high relative error for the angles when larger perturbations are added. Figure 17 shows the reconstructed image, and while no image is a

(20)

high quality reconstruction, it is clear IRN is the cleanest. The two tests show though that there is certainly an image and perturbation level dependence as to which method has the best reconstructions.

(a) True Solution (b) LSQR (c) IRN (d) FISTA

Figure 17: Images for Linear Least Squares Comparison for the Second Test Problem

6 Conclusion and Outlook

We have devised an algorithm to effectively estimate the unknown geometry parameters of an uncalibrated portable CT machine. We have surveyed and tested up-to-date methods to solve Tikhonov regularization and non-linear least squares problems, and demonstrated their effectiveness in solving the computed tomography problem with unknown geometry parameters.

In the future, we hope to investigate more about why the acceleration techniques work better only with the image vector. Additionally, we hope to work on better stopping criterion. Specifically, we want to find a good relationship between the stopping criterion and other factors (image size, number of time R and the view angles changed, etc.) so that the algorithm is more flexible to different optimization-solver and acceleration techniques. Finally, we will look into other acceleration techniques for the optimization problem.

We also hope to apply our algorithm to other medical imaging applications where geometry parameters may be unknown such as bedside tomosynthesis [3].

7 Acknowledgements

We would like to express our deepest appreciation to our mentor, Dr. James Nagy, for working with each one of us. Without his great support and guidance, this research project would not have been possible. Additionally, we would like to thank the faculty at the Department of Mathematics, Emory University for dedicating their time to answering our questions, facilitating seminars and discussions related to this project. Lastly, we would like to acknowledge the National Science Foundation for giving us this opportunity to be part of NSF 2021 REU/RET Computational Mathematics for Data Science. This work was funded under NSF grant number DMS-2051019.

References

[1] A. Aitken. On Bernoulli’s numerical solution of algebraic equations. Proceedings of the Royal Society of Edinburgh, 49:289–305, 1926.

[2] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems.

SIAM Journal of Imaging Sciences, 2(1):183–202, 2009.

(21)

[3] J. Cant, A. Snoeckx, G. Behiels, P. M. Parizel, and J. Sijbers. Can portable tomosynthesis improve the diagnostic value of bedside chest X-ray in the intensive care unit? A proof of concept study. European Radiology Experimental, 2017.

[4] T. F. Coleman and Y. Li. On the convergence of reflective Newton methods for large-scale nonlinear minimization subject to bounds. Mathematical Programming, 67(2):189–224, 1994.

[5] T. F. Coleman and Y. Li. An interior trust region approach for nonlinear minimization subject to bounds.

SIAM Journal on Optimization, 6(2):418–445, 1996.

[6] P. Ding. Accelerated alternating minimization for x-ray 2 tomographic reconstruction. arXiv:2108.01017.

[7] C. L. Epstein. Introduction to the Mathematics of Medical Imaging, Second Edition. SIAM, Philadelphia, PA, 2007.

[8] S. Gazzola, P. C. Hansen, and J. G. Nagy. IR Tools: A MATLAB package of iterative regularization methods and large-scale test problems. Numerical Algorithms, 81:773–811, 2019.

[9] B. M. Irons and R. C. Tuck. A version of the Aitken accelerator for computer iteration. International Journal for Numerical Methods in Engineering, 1:275–277, 1969.

[10] D. M. O’Leary J. Chung, J. G. Nagy. A weighted-GCV method for Lanczos-hybrid regularization. Electronic Transmissions on Numerical Analysis, 28:149–167, 2008.

[11] C. T. Kelley. Implicit Filtering. SIAM, Philadelphia, PA, 2011.

[12] A. J. Macleod. Acceleration of vector sequences by multi-dimensional δ² methods. Communications in Applied Numerical Methods, 2(4), 1986.

[13] Y. Dong N. A. B. Riis and P. C. Hansen. Computed tomography reconstruction with uncertain view angles by iteratively updated model discrepancy. Journal of Mathematical Imaging and Vision, 63:133–143, 2021.

[14] Y.Doung N. A. B. Riis. A new iterative method for CT reconstruction with uncertain view angles. In Scale Space and Variational Methods in Computer Vision, pages 156–167, Cham, 2019. Springer International Publishing.

[15] Y. Nievergelt. Aitken’s and Steffensen’s accelerations in several variables. Numerische Mathematik, 59:295–

310, 1991.

[16] FE Online. Big demand for imaging equipment! From Portable X-ray to CT scanners, AI devices can help track Coronavirus, Jul 2020.

[17] I. Rami`ere and T. Helfer. Iterative residual-based vector methods to accelerate fixed point iterations.

Computers & Mathematics with Applications, 70(9):2210–2226, 2015.

[18] P. Rodr´ıguez and B. Wohlberg. An efficient algorithm for sparse representations with `^pdata fidelity term.

In Proceedings of 4th IEEE Andean Technical Conference (ANDESCON), Cusco, Per´u, October 2008.

[19] H. F. Walker. Anderson acceleration, algorithms and implementations. https://users.wpi.edu/~walker/

Papers/anderson_accn_algs_imps.pdf, 2011. Accessed 07/14/2021.

[20] H. F. Walker and P. Ni. Anderson acceleration for fixed point iterations. SIAM Journal of Numerical Analysis, 49(4):1715–1735, 2011.