
Perspectives to develop MBIR methods for SAFRAN

Because SAFRAN's aeronautical parts contain highly scattering and absorbing materials such as metal, the projections acquired at SAFRAN are very noisy. Maximum-likelihood techniques, which fit the data too closely, therefore give unsatisfactory reconstructions. In addition, the industrial parts often present strong asymmetries, so the noise is not uniformly distributed over the projection angles. These asymmetries call for weighting the data differently according to the projection direction, a task that filtered backprojection methods handle poorly.

Since SAFRAN designs and manufactures its own parts, it can enhance reconstruction quality for NDT by exploiting its knowledge of their structure and shape. Developing MBIR methods to reconstruct SAFRAN parts is a natural way to do so since, as seen in this chapter, these methods can incorporate prior information on the inspected volume as well as weights for the projections.

MBIR methods have hyperparameters θ which can be difficult to set in practice. This is for instance the case of the tradeoff parameter λ in sections 2.5 and 2.6. This parameter often needs to be tuned by repeated experiments, which can take a very long time for 3D NDT. In addition, the tuning procedure must be repeated each time the acquisition protocol changes. As a consequence, the problem of finding optimal hyperparameters may arise repeatedly and hinder the industrialization of the proposed MBIR methods. For this reason, in this thesis, we aim to propose MBIR methods which, starting from a prior model M, estimate both the optimal volume f and the hyperparameters θ by joint maximization a posteriori (JMAP):

\[
(\hat{f}, \hat{\theta}) = \arg\max_{f, \theta} \; p(f, \theta \mid g; M), \tag{2.85}
\]

where the joint posterior of the unknowns $\psi = (f, \theta)$ is given by Bayes' rule:

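\[
p(f, \theta \mid g; M) = \frac{p(g \mid f, \theta; M) \, p(f \mid \theta; M) \, p(\theta \mid M)}{p(g \mid M)}. \tag{2.86}
\]

[Figure: graph with nodes M, θ, f and g linked by arrows, annotated with the 1st, 2nd and 3rd levels]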

Figure 2.6: General hierarchical model to classify reconstruction methods

Figure 2.6 summarizes the general hierarchical model derived from equation (2.86): the data g result from the volume f through the projection operator H, while the volume f is governed by the prior model M and its hyperparameters θ. Depending on which stages of the hierarchical model are considered, reconstruction methods can be divided into three classes:

- first-level methods are the maximum-likelihood techniques presented in sections 2.3 and 2.4, which aim to match theoretical and actual projections: $\hat{f} = \arg\max_{f} p(g \mid f, \theta; M)$,

- second-level methods add prior information but fix the hyperparameters: $\hat{f} = \arg\max_{f} p(f \mid g; \theta, M)$,

- third-level methods optimize both the volume and the hyperparameters, leading to joint maximization a posteriori: $(\hat{f}, \hat{\theta}) = \arg\max_{f, \theta} p(f, \theta \mid g; M)$.
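The link between the three levels can be made explicit by taking the logarithm of Bayes' rule (2.86). The short derivation below is a sketch that uses only quantities already defined in this section; it shows that the evidence $p(g \mid M)$ does not depend on the unknowns and can therefore be dropped from the maximization.

```latex
\begin{align*}
\ln p(f, \theta \mid g; M)
  &= \ln p(g \mid f, \theta; M) + \ln p(f \mid \theta; M)
   + \ln p(\theta \mid M) - \ln p(g \mid M), \\
(\hat{f}, \hat{\theta})
  &= \arg\max_{f, \theta} \left\{ \ln p(g \mid f, \theta; M)
   + \ln p(f \mid \theta; M) + \ln p(\theta \mid M) \right\}.
\end{align*}
```

Fixing θ in this expression and maximizing over f alone gives the second-level estimate, since $\ln p(\theta \mid M)$ is then a constant; keeping only the likelihood term gives the first-level estimate.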

Reconstruction methods at each level have their pros and cons. Compared to first-level methods, second-level and third-level methods are more robust to uncertainties in the projections and enhance reconstruction quality. Third-level methods additionally optimize the hyperparameters jointly with the volume. Nevertheless, optimizing θ rules out improper priors, i.e., priors whose integral over $\mathbb{R}^N$ is not finite. As a result, several improper, strongly sparsity-inducing priors, such as anisotropic and isotropic TV, (2.41) and (2.40), or the Potts model (2.42), cannot be considered in third-level methods: for these priors, the hyperparameters have to be tuned empirically, which may be difficult. Finally, first-level methods such as filtered backprojection do not introduce bias in the estimation and are fast, while MBIR methods are known to be computationally burdensome due to repeated projection and backprojection operations. To alleviate this computational cost, MBIR methods have to be massively parallelizable on GPUs.

Chapter 3

Projection and backprojection operators

As we have seen in chapter 2, the projection and backprojection operators $H$ and $H^T$ are called repeatedly in all MBIR methods. Since they are very computationally intensive, MBIR methods are very slow compared to FBP methods. To alleviate this computational cost, it has become common during the last decade to parallelize existing projection and backprojection algorithms [Sid85, Jos82, DMB04, LFB10, NL15] on GPUs (Graphics Processing Units) [SK10]. One GPU can run thousands of threads simultaneously, which makes GPUs very attractive devices for high-performance computing (HPC). The threads of the GPU execute the same code, called the kernel [SK10]. Hence, parallelizing $H$ (respectively $H^T$) on the GPU consists in transposing the projection (respectively the backprojection) algorithm into kernels. As a result, the projections of thousands of rays (respectively the backprojections in thousands of voxels) can be computed at the same time, leading to a significant acceleration of MBIR methods compared to a CPU implementation [LFDMY17].

When parallelizing a projector/backprojector (P/BP) pair on the GPU, it is necessary to make sure that the final result of the projection of one ray is written by only one thread. Similarly, the backprojection in one voxel has to be written by only one thread. Otherwise, writing conflicts occur between the threads, and the accumulation of the threads' contributions gives an unpredictable result [PSL14]. To avoid this, the accumulation can be done using atomic operations [SK10, Chapter 9], which guarantee that only one thread at a time adds its contribution. Nevertheless, this slows down the projection or backprojection computation.
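The two write patterns can be illustrated without a GPU. The sketch below is a toy example in plain Python, not an actual kernel: the ray-to-voxel table `rays` and the projection values `g` are hypothetical and chosen only for illustration. It lists the voxels that several ray workers would write to in a scatter-style backprojection, and then computes the same backprojection in gather form, where each voxel is written by exactly one worker.

```python
from collections import defaultdict

# Hypothetical toy geometry: for each ray, the voxels it crosses.
rays = {0: [0, 1], 1: [1, 2], 2: [1, 3]}
# Hypothetical projection value carried by each ray.
g = {0: 1.0, 1: 2.0, 2: 4.0}


def scatter_writers(rays):
    """One worker per ray: record which workers would write to each voxel."""
    writers = defaultdict(set)
    for ray, voxels in rays.items():
        for v in voxels:
            writers[v].add(ray)
    return writers


def gather_backprojection(rays, g, n_voxels):
    """One worker per voxel: each output entry is written exactly once."""
    b = [0.0] * n_voxels
    for v in range(n_voxels):
        b[v] = sum(g[ray] for ray, voxels in rays.items() if v in voxels)
    return b


writers = scatter_writers(rays)
# Voxels written by more than one ray worker need atomic accumulation.
conflicts = {v: sorted(w) for v, w in writers.items() if len(w) > 1}
b = gather_backprojection(rays, g, 4)
print(conflicts, b)
```

In the scatter form, voxel 1 receives contributions from three workers and would need atomic additions on a GPU; the gather form avoids this, but every voxel worker has to search for the rays that cross it.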

Efficient models for the projection and backprojection operations are key to the speed and accuracy of MBIR methods. Unfortunately, an efficient projector $H$ does not necessarily imply that the adjoint backprojector $H^T$ is also efficient. For instance, Siddon's projector, presented in appendix B.1, is fast [Sid85, JSDS+98, HLY99], but its adjoint backprojector is not, even when parallelized on the GPU [PSL14, NL15]. Similarly, while the ray-driven (RD) projector, presented in section 3.1.1, is very well suited to massive parallelization, this is not the case for its adjoint backprojector. The same problem occurs for the voxel-driven (VD) backprojector detailed in section 3.1.2, whose adjoint projector is difficult to accelerate on the GPU [DYXW17].

Because it is difficult to obtain an adjoint projector and backprojector which are both fast, accurate and easily parallelizable on GPUs, it has become very common in the CT community to work with unmatched P/BP pairs, i.e., pairs in which the backprojector is not the adjoint of the projector [ZG00]. For instance, in section 3.1, we present the unmatched ray-driven/voxel-driven (RD/VD) pair, in which the projector is ray-driven (RD) and the backprojector is voxel-driven (VD). Thanks to the computational efficiency of both the projector and the backprojector, this pair, implemented on the GPU, performs very fast projection and backprojection operations.

Using an unmatched pair can be valid for very simple reconstruction algorithms, such as gradient descent to solve unregularized least-squares [ZG00]. Nevertheless, it remains a mathematical approximation, since the convergence proofs of reconstruction algorithms are derived for a matched pair of projector and backprojector. As a consequence, an unmatched P/BP pair may lead to suboptimal reconstruction and even hinder the convergence of complex algorithms such as ADMM or PWLS [ASM16]. To ensure the convergence of MBIR methods, computationally efficient matched P/BP pairs have been proposed [DMB04, LFB10]. In section 3.2, we focus on the matched Separable Footprint (SF) pair [LFB10] and present a new GPU implementation. The RD/VD and the SF pairs are validated and compared in section 3.3, both as single modules and within a full iterative reconstruction method, namely PDFW, presented in section 2.6. Perspectives for the new GPU implementation of the SF pair are presented in section 3.4.
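Whether a pair is matched can be checked numerically with the standard dot-product test: a backprojector $B$ is the adjoint of $H$ if and only if $\langle Hx, y \rangle = \langle x, By \rangle$ for all $x$ and $y$. The sketch below is a minimal illustration in plain Python, assuming small dense matrices; the matrices `H` and `B` and the helper `adjoint_gap` are hypothetical and do not correspond to the operators used in this thesis.

```python
import random


def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def adjoint_gap(project, backproject, n_voxels, n_rays, rng):
    """Relative mismatch between <Hx, y> and <x, By> for random x and y."""
    x = [rng.random() for _ in range(n_voxels)]
    y = [rng.random() for _ in range(n_rays)]
    lhs = dot(project(x), y)
    rhs = dot(x, backproject(y))
    return abs(lhs - rhs) / max(abs(lhs), abs(rhs))


rng = random.Random(0)
H = [[1.0, 0.5, 0.0], [0.0, 0.5, 1.0]]    # toy projector: 2 rays, 3 voxels
Ht = transpose(H)                         # matched backprojector
B = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]  # toy unmatched backprojector

matched_gap = adjoint_gap(lambda x: matvec(H, x),
                          lambda y: matvec(Ht, y), 3, 2, rng)
unmatched_gap = adjoint_gap(lambda x: matvec(H, x),
                            lambda y: matvec(B, y), 3, 2, rng)
print(matched_gap, unmatched_gap)
```

For the matched pair the gap is at the level of floating-point rounding, while for the unmatched pair it is clearly nonzero. For real operators, the same test can be run with the projector and backprojector passed in as functions.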

3.1 Unmatched ray-driven/voxel-driven (RD/VD) pair

3.1.1 Ray-driven projector

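For each projection angle $\varphi$ and each cell $(u_e, v_e)$ of the detector, the ray-driven (RD) projector traces a ray connecting the source and the center of the cell, as illustrated in 2D in figure 3.1. In the field-of-view, this ray is regularly sampled with step $\delta$, which is the side length of the voxels. At each sample point $(x_k, y_k, z_k)$, the value $f(x_k, y_k, z_k)$ is calculated by trilinear interpolation. According to the discretization of the integral in the Beer-Lambert law

\[
g(u_e, v_e, \varphi) = \int_{L(u_e, v_e, \varphi)} f(r) \, dl, \tag{3.1}
\]

the result of this interpolation, multiplied by the sampling step $\delta$, is added to the projection:

\[
\begin{aligned}
g(u_e, v_e, \varphi) \mathrel{+}= \delta \times \big[
  & (1 - \epsilon_x)(1 - \epsilon_y)(1 - \epsilon_z) \, f(x_e, y_e, z_e) \\
+ {} & \epsilon_x (1 - \epsilon_y)(1 - \epsilon_z) \, f(x_e + 1, y_e, z_e) \\
+ {} & (1 - \epsilon_x) \epsilon_y (1 - \epsilon_z) \, f(x_e, y_e + 1, z_e) \\
+ {} & \epsilon_x \epsilon_y (1 - \epsilon_z) \, f(x_e + 1, y_e + 1, z_e) \\
+ {} & (1 - \epsilon_x)(1 - \epsilon_y) \epsilon_z \, f(x_e, y_e, z_e + 1) \\
+ {} & \epsilon_x (1 - \epsilon_y) \epsilon_z \, f(x_e + 1, y_e, z_e + 1) \\
+ {} & (1 - \epsilon_x) \epsilon_y \epsilon_z \, f(x_e, y_e + 1, z_e + 1) \\
+ {} & \epsilon_x \epsilon_y \epsilon_z \, f(x_e + 1, y_e + 1, z_e + 1) \big].
\end{aligned} \tag{3.2}
\]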

In the trilinear interpolation, $(\epsilon_x, \epsilon_y, \epsilon_z) \in [0, 1]^3$ are the normalized distances in the x, y and z-directions between the sample point and its nearest neighbour $(x_e, y_e, z_e)$, as shown in figure 3.1. The RD projector is naturally highly parallelizable on the GPU, since each ray can simply be handled by one thread. To accelerate the computations, the volume is copied to the texture memory of the GPU [SK10, Chapter 7]. If the volume were simply stored in the global memory of the GPU, the values of spatially neighbouring voxels would be far from each other: while $f(x_e, y_e, z_e)$ and $f(x_e + 1, y_e, z_e)$ would be placed at neighbouring locations, $f(x_e, y_e, z_e)$ and $f(x_e, y_e + 1, z_e)$ would be $N_x$ elements apart, and $f(x_e, y_e, z_e)$ and $f(x_e, y_e, z_e + 1)$ would be $N_x \times N_y$ elements apart. Since $N_x$ and $N_y$ are very large, performing interpolation (3.2) would generate heavy memory traffic to read the values of each voxel. In contrast, texture memory is cached so that accesses to spatially neighbouring voxels are faster [NVI18, Section 3.2.11.1]. Since the memory accesses needed for interpolation (3.2) are spatially local, using the texture memory reduces memory traffic and speeds up the interpolation. The texture memory has another advantage: trilinear interpolation (3.2) can be performed by the hardware of the GPU [NVI18, Section 3.2.11.1], which further accelerates the calculations and makes the RD projector very fast.
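The accumulation (3.2) along one ray can be sketched on the CPU as follows. This is a minimal illustration in plain Python, not the GPU implementation: it assumes that positions are expressed in voxel-index units (so consecutive sample points are one voxel apart), that voxels outside the grid are zero, and that the volume is stored as a flat array with index `x + nx * (y + ny * z)`, which reproduces the $N_x$ and $N_x \times N_y$ strides discussed above. All function names are hypothetical.

```python
import math


def voxel_value(volume, dims, x, y, z):
    """Read f(x, y, z) from a flat array; voxels outside the grid count as 0."""
    nx, ny, nz = dims
    if 0 <= x < nx and 0 <= y < ny and 0 <= z < nz:
        # Neighbours in y are nx elements apart, neighbours in z nx * ny apart.
        return volume[x + nx * (y + ny * z)]
    return 0.0


def trilinear(volume, dims, x, y, z):
    """Trilinear interpolation of the volume at a sample point, as in (3.2)."""
    xe, ye, ze = math.floor(x), math.floor(y), math.floor(z)
    ex, ey, ez = x - xe, y - ye, z - ze
    value = 0.0
    for dz, wz in ((0, 1.0 - ez), (1, ez)):
        for dy, wy in ((0, 1.0 - ey), (1, ey)):
            for dx, wx in ((0, 1.0 - ex), (1, ex)):
                value += wx * wy * wz * voxel_value(
                    volume, dims, xe + dx, ye + dy, ze + dz)
    return value


def ray_driven_projection(volume, dims, source, cell, delta, n_samples):
    """Accumulate delta * f(sample) along the ray from the source to the cell."""
    length = math.dist(source, cell)
    direction = [(c - s) / length for s, c in zip(source, cell)]
    g = 0.0
    for k in range(n_samples):
        # Sample points are one voxel apart along the ray (index units).
        point = [s + k * d for s, d in zip(source, direction)]
        g += delta * trilinear(volume, dims, *point)
    return g


dims = (4, 4, 4)
volume = [2.0] * (4 * 4 * 4)  # uniform toy volume
g = ray_driven_projection(volume, dims, (-1.0, 1.5, 1.5), (5.0, 1.5, 1.5),
                          delta=0.5, n_samples=7)
print(g)
```

On a GPU, each call to `ray_driven_projection` would be handled by one thread, and the body of `trilinear` would be replaced by a hardware-interpolated texture fetch.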

In contrast, the adjoint RD backprojector is difficult to parallelize, since one thread has to handle one voxel in order to avoid writing conflicts. In this case, the thread computing the backprojection in one voxel has to find the rays to whose projections the voxel has contributed. Then, each of these rays has to be projected to compute the exact contribution of the voxel. Hence, in the RD backprojector, many threads have to compute the projections of the same rays, which is highly redundant and slow. Therefore, the RD backprojector is very inefficient on the GPU. In addition, due to the interpolation step, artifacts are visible in the RD backprojection [DMB02, DMB04]. For these reasons, an unmatched backprojector can be preferred in order to accelerate the calculations [ZG00]. In this thesis, we choose the voxel-driven backprojector described in section 3.1.2.

3.1.2 Voxel-driven backprojector

Figure 3.1: Ray-driven projector

For each voxel $(x_e, y_e, z_e)$, the voxel-driven (VD) backprojector traces a ray connecting the source and the center of the voxel. The projection of this ray is located at a position $(u(\varphi; x_e, y_e, z_e), v(\varphi; x_e, y_e, z_e))$ on the detector, as shown in figure 3.2 in 2D. The value of the projection at this point is calculated by bilinear interpolation. The projections are copied to the texture memory of the GPU for local memory accesses. Similarly to the RD projector, the VD backprojector takes advantage of the texture memory since the bilinear interpolation is done by the hardware. The result of the interpolation is accumulated in the backprojection. This operation is repeated for each projection angle $\varphi$. The voxel-driven backprojection reads

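\[
b(x_e, y_e, z_e) = \sum_{\varphi} g_{\mathrm{interp}}\big(u(\varphi; x_e, y_e, z_e), v(\varphi; x_e, y_e, z_e), \varphi\big), \tag{3.3}
\]

where

\[
\begin{aligned}
g_{\mathrm{interp}}\big(u(\varphi; x_e, y_e, z_e), v(\varphi; x_e, y_e, z_e), \varphi\big) =
  {} & (1 - \epsilon_u)(1 - \epsilon_v) \, g(u_e, v_e, \varphi) \\
+ {} & \epsilon_u (1 - \epsilon_v) \, g(u_e + 1, v_e, \varphi) \\
+ {} & (1 - \epsilon_u) \epsilon_v \, g(u_e, v_e + 1, \varphi) \\
+ {} & \epsilon_u \epsilon_v \, g(u_e + 1, v_e + 1, \varphi).
\end{aligned} \tag{3.4}
\]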

In the bilinear interpolation, $(\epsilon_u, \epsilon_v) \in [0, 1]^2$ are the normalized distances in the u and v-directions between the projection point $(u(\varphi; x_e, y_e, z_e), v(\varphi; x_e, y_e, z_e))$ and its nearest neighbour $(u_e, v_e)$, as shown in figure 3.2. Like the RD projector, the VD backprojector is easy to parallelize massively and is very fast, which is not the case for its adjoint VD projector [DYXW17]. Furthermore, the adjoint VD projector suffers from the same interpolation artifacts as the RD backprojector [DMB04]. Given these considerations of image quality and computational speed, in this work, we use the VD backprojector with the unmatched RD projector described in section 3.1.1.
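Equations (3.3) and (3.4) can be sketched for a single voxel as follows. This is a minimal CPU illustration in plain Python under stated assumptions: detector positions are in cell-index units, cells outside the detector are zero, and the mapping from a voxel to its detector position is passed in as a callback. The callback `toy_detector_coords` is a hypothetical parallel-beam-like geometry used only to make the example runnable; it is not the cone-beam geometry of this thesis.

```python
import math


def bilinear(proj, n_u, n_v, u, v):
    """Bilinear interpolation of one projection at (u, v), as in (3.4)."""
    ue, ve = math.floor(u), math.floor(v)
    eu, ev = u - ue, v - ve

    def g(i, j):
        # Cells outside the detector count as 0.
        return proj[i + n_u * j] if 0 <= i < n_u and 0 <= j < n_v else 0.0

    return ((1 - eu) * (1 - ev) * g(ue, ve)
            + eu * (1 - ev) * g(ue + 1, ve)
            + (1 - eu) * ev * g(ue, ve + 1)
            + eu * ev * g(ue + 1, ve + 1))


def voxel_driven_backprojection(projections, n_u, n_v, angles,
                                detector_coords, voxel):
    """Sum over the angles of the interpolated projections, as in (3.3)."""
    b = 0.0
    for proj, phi in zip(projections, angles):
        u, v = detector_coords(phi, voxel)
        b += bilinear(proj, n_u, n_v, u, v)
    return b


def toy_detector_coords(phi, voxel):
    """Hypothetical toy geometry mapping a voxel to detector coordinates."""
    x, y, z = voxel
    return x * math.cos(phi) + y * math.sin(phi) + 3.5, float(z)


n_u, n_v = 8, 8
angles = [0.0, math.pi / 4, math.pi / 2]
projections = [[1.0] * (n_u * n_v) for _ in angles]  # uniform toy data
b = voxel_driven_backprojection(projections, n_u, n_v, angles,
                                toy_detector_coords, (1, 1, 2))
print(b)
```

On a GPU, each voxel would be handled by one thread running the loop over the angles, with `bilinear` replaced by a hardware-interpolated texture fetch, so each output value is written by exactly one thread.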