
The input image is from a low-resolution (640×480, 0.3 MP) fixed-focus camera returning 8 bits per pixel for each of the three colours. The lighting is not uniform across the image, but the phase correlation technique is not very sensitive to this issue. For display purposes, the aspect ratio of the images has not been respected. Phase correlation can be optimised to minimise the requirement for the use of the FFT, but this is not described here.

4.3.1

General preparation

Each image loaded (figure 4.2a, for example) has its edges removed, as the perspective distortion can be quite significant in certain regions. The extent of the region to remove must be ascertained by the operator by examining test images (such as figure 4.2b) prior to the use of the camera for phase correlation. The image is then converted to greyscale, represented by double precision numbers in the range −0.5 ≤ I ≤ 0.5 (it may be computationally faster to use a different precision, but this is not the most significant concern at this stage). Although optical images used in NDT would have various other image processing operations performed, such as compensation for uneven background illumination [158], this was not found to be necessary here.
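The preparation step can be sketched as follows. This is a minimal illustration, not the implementation used here: the function name `prepare_image`, the uniform `margin` parameter, and the unweighted mean used for the greyscale conversion are all assumptions, since the text does not specify the conversion weights or the crop geometry.

```python
def prepare_image(rgb, margin):
    """Crop `margin` pixels from every edge and convert to greyscale in [-0.5, 0.5].

    `rgb` is a list of rows; each pixel is an (r, g, b) tuple of 8-bit values.
    """
    rows = rgb[margin:len(rgb) - margin]
    grey = []
    for row in rows:
        cropped = row[margin:len(row) - margin]
        # Assumption: unweighted mean of the three channels, scaled from
        # the 8-bit range 0..255 to the range -0.5..0.5.
        grey.append([sum(px) / (3 * 255.0) - 0.5 for px in cropped])
    return grey


black, white = (0, 0, 0), (255, 255, 255)
image = [
    [black, black, black, black],
    [black, white, (51, 102, 153), black],
    [black, black, white, black],
    [black, black, black, black],
]
prepared = prepare_image(image, margin=1)  # 2x2 greyscale result
```

In practice the margin would be chosen by the operator from test images, and could differ for each edge.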

appears to provide acceptable performance, but few other windows have been tested and there may be superior alternatives for this application. The two-dimensional FFT of the image is taken and only the unique part is retained (see section 4.3.4 for details).

The magnitude of the FFT of each image is then transformed from Cartesian coordinates, (x, y), to polar coordinates, (ρ, θ), ensuring that a suitable radial and angle increment is chosen to retain the relevant image information. Transformations from Cartesian to polar coordinates are performed using bilinear interpolation of the input image. In order to retain all image information, the radial increment should be ∆ρ ≤ 1, and the angle increment should be

\[ \Delta\theta \le \tan^{-1}\left(\frac{y_{\max}}{x_{\max}}\right) - \tan^{-1}\left(\frac{y_{\max}-1}{x_{\max}}\right) \]

assuming y_max ≥ x_max, otherwise

\[ \Delta\theta \le \tan^{-1}\left(\frac{y_{\max}}{x_{\max}-1}\right) - \tan^{-1}\left(\frac{y_{\max}}{x_{\max}}\right). \]

This assumes that all parts of the image contain useful information, when in practice much of the higher spatial frequency information is not useful as it is either below the noise floor or absent altogether, in which case ∆θ can be made significantly larger without detrimental loss of information. The (0, θ) component can be discarded as it does not vary and hence will not contain information useful for the phase correlation.
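The bound on the angle increment is straightforward to evaluate numerically. The sketch below assumes that x_max and y_max are the largest coordinates of the spectrum being transformed; the function name `max_angle_increment` and the example dimensions are illustrative only.

```python
import math


def max_angle_increment(x_max, y_max):
    """Largest angle increment (radians) that retains all image information."""
    if y_max >= x_max:
        return math.atan(y_max / x_max) - math.atan((y_max - 1) / x_max)
    return math.atan(y_max / (x_max - 1)) - math.atan(y_max / x_max)


# Illustrative values only: half of an uncropped 640x480 spectrum.
d_theta = max_angle_increment(320, 240)
samples_over_half_turn = math.ceil(math.pi / d_theta)
print(math.degrees(d_theta), samples_over_half_turn)
```

As noted in the text, a larger increment is usually acceptable because the highest spatial frequencies rarely carry useful information.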

The difference between pairs of transformed images will ideally be only along the θ dimension, hence the phase correlation for translational motion purely along one axis (section 4.3.3) can be used to determine the rotation, θ₀.

With the rotation determined, one of the original images (as prepared in section 4.3.1, prior to the windowing and FFT operations done in this section) is then rotated using bilinear interpolation, before being checked for translational movement (section 4.3.3). The translational motion check must be in two dimensions since motion along one axis will have been converted into motion along two axes after rotation. The angle output, θ₀, has a 180° ambiguity, which is handled by rotating the image spectrum by θ₀, using phase correlation to obtain the translation, then repeating the process after rotating the image spectrum by θ₀ + 180°, and using the rotation which produces the largest correlation peak [148].

4.3.3

Phase correlation for translational motion

The images are padded along each dimension from an initial length of p to a length of m = 2p − 1 to ensure linear correlation (figure 4.3b) later in the process. Without zero padding, the correlation is cyclic (figure 4.3a). If the motion is purely along a single dimension and there is known to be no rotation, then only that dimension requires zero padding. Usually the zero padding is rounded up such that the total size along any direction to be processed by an FFT is a power of two (m = 2^n, n ∈ ℤ⁺) since most FFT implementations are fastest for this size.
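The padded length follows directly from the rule above. In this sketch the helper names `padded_length` and `zero_pad` are illustrative, and the example lengths are the uncropped image dimensions rather than the sizes actually processed.

```python
def padded_length(p, power_of_two=True):
    """Length after zero padding: m = 2p - 1, optionally rounded up to 2**n."""
    m = 2 * p - 1
    if not power_of_two:
        return m
    n = 1
    while n < m:
        n *= 2
    return n


def zero_pad(signal, m):
    """Append zeros so that `signal` has length m."""
    return list(signal) + [0.0] * (m - len(signal))


print(padded_length(480))  # 959 rounds up to 1024
print(padded_length(640))  # 1279 rounds up to 2048
```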

The reference image is then windowed for two purposes, split into two windowing operations as described here, but implemented as a single operation. If the maximum expected movement between images is known and is reasonably small relative to the size of the image, a simple rectangular window can be applied that removes parts of the reference image that would not be present in the test image if the maximum movement were to occur. This is combined with a Hann window applied to the remaining image, as otherwise a relatively large peak can occur in the phase correlation output when the edge of the reference image reaches the edge of the test image during phase correlation. The phase correlation operates acceptably without these windowing operations, but performance can be improved under certain conditions by including them.
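A one-dimensional version of the combined window might look like the sketch below. It assumes that movement of up to `max_shift` samples can occur in either direction, so the same number of samples is zeroed at both edges, and that the Hann window is symmetric; the function name `reference_window` is illustrative. A two-dimensional window could be formed as the outer product of two such windows, which is also an assumption rather than something stated in the text.

```python
import math


def reference_window(p, max_shift):
    """Window of length p: zero within max_shift of each edge, Hann elsewhere."""
    inner = p - 2 * max_shift
    if inner < 1:
        raise ValueError("max_shift leaves no samples in the window")
    if inner == 1:
        hann = [1.0]
    else:
        # Symmetric Hann window over the samples that remain.
        hann = [0.5 - 0.5 * math.cos(2 * math.pi * k / (inner - 1))
                for k in range(inner)]
    return [0.0] * max_shift + hann + [0.0] * max_shift


window = reference_window(16, 2)
# Applying the window is an element-wise multiplication with the reference.
reference = [1.0] * 16
windowed = [w * r for w, r in zip(window, reference)]
```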

Figure 4.3: The DFT implicitly assumes the image is periodic. Consider two images, represented by boxes that are dark grey (with solid lines) and light grey (with dashed lines) respectively. Without zero padding, using the DFT-implemented cross-correlation [156] would result in cyclic correlation (a), such that parts of the correlating image that leave one side would reappear on the other. By padding with zeros, linear correlation (b) is achieved, with parts of the correlating image that leave one side only encountering zeros when they reappear on the other side.

The images are then passed through the FFT, which converts pixel position to spatial frequency; this is usually done along both dimensions, but if the motion is purely along a single dimension and there is known to be no rotation, then only that dimension is operated along (this can result in a considerable computational saving). Only the unique part of the FFT is retained (section 4.3.4) before the complex conjugate is taken of the test image FFT and it is multiplied by the reference image FFT.

Phase correlation whitens the signals by normalisation (figure 4.4), which makes it robust to uniform variations in illumination, offsets in average intensity, and fixed gain errors [159]. However, AWGN extends across all spatial frequencies, and since the use of just the phase information tends to enhance high frequency components, the phase-correlation technique is sensitive to such noise [160]. The reason for this is that a typical image will usually have larger amplitude low frequency components than high frequency components. AWGN has equal components across the frequency spectrum, and hence will have a relatively larger effect upon the smaller high frequency components of the image. Since the phase-correlation technique gives equal weight to all frequency components, the smaller high frequency components will be enhanced to the level of the larger low frequency components, and if noise has affected these components, the result will be a distortion of the correlation output and potentially an error in distinguishing the true correlation peak from false peaks due to noise. Since the image (without noise) is always bandlimited, considering phase components above the bandlimit of the image (the maximum spatial frequency at which the image changes) will amplify the impact of random noise. It is reasonable to ignore frequencies outside the bandwidth of the image when constructing the correlation output [160], although this can unnecessarily widen and shorten the correlation peak if actual image frequencies are filtered away. Assuming the user has previously examined similar images, components at spatial frequencies where the signal magnitude is less than twice the noise magnitude at that spatial frequency can be set to zero to minimise the effect of AWGN on the correlation accuracy.
The DC and Nyquist spatial frequencies are always removed as they contain no phase information, since such information at these frequencies manifests as a change in magnitude, not a change in phase. A simple rectangular filter is sufficient since the next step is to remove the magnitude of the cross-correlation by dividing any element with a non-zero magnitude by its magnitude, leaving only the phase information remaining. An IFFT is then performed to give the spatial phase correlation. The peak can be identified by simply searching for the maximum. It is trivial to convert the peak position to a shift. If a possible range of shifts is known, it is sensible to restrict the search for a maximum peak to this range. The size of the correlation peak can be used to determine the quality of the match.

[Figure 4.4, panels (a) and (b). (a) Angle (degrees) against x and y spatial frequency (pixels⁻¹). (b) Magnitude (arbitrary) against x and y (pixels).]

Figure 4.4: The magnitude of the DFT of a single pixel in an image is flat. The phase of the DFT (a) is not flat and is linked to the position of the pixel, here at (4, 2) in a 128×128 image. Consider a rectangle of size 32×16 and bottom left corner at (31, 16) in a 128×128 image. By taking the DFT, normalising such that every frequency component has the same magnitude (whitening the spectrum), and then taking the IDFT, only the phase of the image remains (b). The low frequency components (slowly varying parts of the image) had the largest magnitude, so this process relatively emphasises the higher frequencies (rapidly varying parts of the image). Sharp edges and particularly the corners of the rectangle are emphasised at the expense of the uniform parts of the image. A side effect is that the rectangle gains what are analogous to harmonics.
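The translational steps described above (zero padding, transform, conjugate multiplication, removal of the DC and Nyquist terms, whitening, inverse transform, and peak search) can be sketched in one dimension. This is an illustration under stated assumptions, not the implementation used here: a naive O(m²) DFT stands in for the FFT so that the sketch has no dependencies, windowing and noise filtering are omitted, the full spectrum is kept rather than only its unique part, and the names `dft` and `phase_correlate_1d` are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive DFT; a stand-in for an FFT in this dependency-free sketch."""
    m = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(m):
        total = 0j
        for n, value in enumerate(x):
            total += value * cmath.exp(sign * 2j * math.pi * k * n / m)
        out.append(total / m if inverse else total)
    return out


def phase_correlate_1d(reference, test):
    """Return (shift of test relative to reference, correlation peak height)."""
    p = len(reference)
    m = 1
    while m < 2 * p - 1:          # pad to a power of two of at least 2p - 1
        m *= 2
    ref_f = dft(list(reference) + [0.0] * (m - p))
    test_f = dft(list(test) + [0.0] * (m - p))
    # Reference spectrum multiplied by the conjugate of the test spectrum.
    cross = [r * t.conjugate() for r, t in zip(ref_f, test_f)]
    cross[0] = 0j                 # remove DC
    cross[m // 2] = 0j            # remove Nyquist
    # Whiten: divide every non-zero element by its magnitude.
    whitened = [c / abs(c) if abs(c) > 1e-12 else 0j for c in cross]
    correlation = [c.real for c in dft(whitened, inverse=True)]
    peak = max(range(m), key=lambda n: correlation[n])
    signed_peak = peak if peak < m // 2 else peak - m
    # With this conjugation convention the peak sits at minus the shift.
    return -signed_peak, correlation[peak]


content = [1.0, 3.0, -2.0, 5.0, 4.0, -1.0, 2.0, 6.0]
reference = content + [0.0] * 8
test = [0.0] * 3 + content + [0.0] * 5   # same content moved by 3 samples
shift, quality = phase_correlate_1d(reference, test)
```

The sign of the reported shift depends on which spectrum is conjugated and on the transform convention, so it is worth confirming against a known displacement, as done here.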

There are various ways to extend phase correlation for sub-pixel precision [159, 161], but in this case the conventional FFT up-sampling approach is used. For usage on modern computers and small input images, or if only minimal additional resolution is required above pixel resolution, this method is acceptable. All that is required is zero padding the phase correlation in the spatial frequency domain: each doubling in size, which must be performed along each dimension that was operated on by the FFT, doubles the spatial resolution. The size is doubled, rather than changed by some other arbitrary multiple, to retain the size as a power of two.

For cases where the computational (speed and memory) burden of the conventional FFT up-sampling approach is too great, an alternative method may be necessary, such as the two-step DFT approach [161] using a matrix multiplication implementation of the 2D DFT [162]. There is a significant speed advantage to using the matrix DFT to evaluate the final stage of the phase correlation in an area very local to the expected correlation peak [161]. In the general case, an IFFT without additional zero padding must first be performed to identify the approximate position of the correlation peak, which is still a much faster case than evaluation of the padded IFFT, with significantly less memory required. However, if the shift is known to be limited to a very small range, it could be computationally more efficient to use the matrix DFT to evaluate just this range without first performing an IFFT. In addition, further speed increases can be realised, since the spatial frequency filtering can be incorporated by simply not evaluating terms which would be zero anyway due to the filtering operation. Finally, the DFT matrix approach does not require 2^n samples to run at its fastest (in its standard form), and the outputs do not need to be evenly [...] being of more significance here.

[Figure 4.5, panels (a) and (b). (a) Time ([FFT 2^n + 16 DFT 2^n]/FFT 2^(n+4)) against size (n). (b) Time ([FFT2 2^n + 16 DFT 2^n]/FFT2 2^(n+2)) against size (n).]

Figure 4.5: Consider a complex 1D FFT output of 2^n samples for which the peak position of the IFFT must be found to ×16 the resolution. This can be interpolated using the standard FFT technique of padding to 2^(n+4) samples before applying the IFFT. Alternatively, the IFFT can be calculated for the 2^n case and then an (inverse) DFT matrix can be applied at the peak to interpolate just that area. The latter technique is relatively much faster (a). Similar speed increases can be seen for the 2D case, even for only ×4 the resolution (b), and the DFT matrix requires significantly less memory. If the original 2D input requires 2^(2n) memory elements, after padding for the standard FFT technique it would require 2^(2(n+4)) memory elements, a substantial increase.

Figure 4.5 depicts the simulated speed increases for 1D (a) and 2D (b) operation; these simulations assume a sample size of 2^n, so other sample sizes would favour the matrix DFT approach to a greater extent than shown. Values given are for the time taken for a standard size FFT plus a DFT matrix operation, divided by the time taken for a padded size FFT. Values less than unity indicate a speed benefit. The simulations took place on a desktop machine running MATLAB R2007b (32-bit edition) on Windows 7 (64-bit edition). The PC had an Intel Core2 CPU, model 6420, at 2.13 GHz (only using a single core to run the simulations due to limitations of this version of MATLAB), 2 GB of RAM (all simulation data fit within this, but it limited the possible size for 2D testing), and no other CPU intensive applications running concurrently. The results are averaged over multiple runs, and deviations were small. The DFT matrix is very simple to implement.
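To illustrate the local evaluation idea in one dimension, the sketch below evaluates the inverse DFT only at a handful of fractional positions around a coarse peak, skipping terms that have been filtered to zero. It is a simplified sketch, not the method of [161] or [162]: the function names, the search range of one sample either side of the coarse peak, and the synthetic whitened spectrum used as input are all assumptions.

```python
import cmath
import math


def correlation_at(spectrum, x):
    """Inverse DFT of `spectrum` at one, possibly fractional, position x."""
    m = len(spectrum)
    total = 0j
    for k, c in enumerate(spectrum):
        if c == 0:
            continue                      # filtered terms are never evaluated
        f = k if k < m // 2 else k - m    # signed spatial frequency index
        total += c * cmath.exp(2j * math.pi * f * x / m)
    return total.real / m


def refine_peak(spectrum, coarse_peak, upsample=16):
    """Search one sample either side of coarse_peak in steps of 1/upsample."""
    candidates = [coarse_peak + j / upsample
                  for j in range(-upsample, upsample + 1)]
    return max(candidates, key=lambda x: correlation_at(spectrum, x))


# Synthetic whitened spectrum for a shift of 2.25 samples (DC, Nyquist removed).
m, true_shift = 32, 2.25
spectrum = []
for k in range(m):
    f = k if k < m // 2 else k - m
    keep = k not in (0, m // 2)
    spectrum.append(cmath.exp(2j * math.pi * f * true_shift / m) if keep else 0j)

# Coarse peak on the integer grid (an unpadded IFFT would normally give this).
coarse = max(range(m), key=lambda n: correlation_at(spectrum, n))
fine = refine_peak(spectrum, coarse, upsample=16)
estimated_shift = -(fine if fine < m / 2 else fine - m)
```

Only 2 × 16 + 1 positions are evaluated in the refinement step, compared with the 16m outputs that a padded IFFT would produce.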

4.3.4

Unique part of the FFT

In the 1D case the unique part of the FFT is the first m/2 elements; the (m/2 + 1)th element is the Nyquist frequency, and is unique, but is discarded during the phase correlation as it contains no phase information. The 2D case is unique for the first m/2 elements along the first FFT dimension and all the elements along the second FFT dimension, since the second FFT dimension has the FFT operating on complex data rather than the real data of the first dimension. Only retaining unique parts means that computationally expensive operations do not need to be replicated for duplicate data. The IFFT must treat the data as if it were symmetric and of its size prior to discarding information that was not unique, and many FFT implementations have this feature to deal with cases when the data is not exactly conjugate symmetric due to round-off error, a feature that can be abused here to save significant computation.
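The 1D redundancy can be checked directly: for real input of even length m, element m − k of the DFT is the complex conjugate of element k, so the first m/2 elements and the Nyquist element determine the rest. The sketch below uses zero-based indexing, so the Nyquist element is at index m/2; the helper names are illustrative and a naive DFT stands in for the FFT.

```python
import cmath
import math


def dft(x):
    """Naive forward DFT; a stand-in for an FFT."""
    m = len(x)
    return [sum(v * cmath.exp(-2j * math.pi * k * n / m) for n, v in enumerate(x))
            for k in range(m)]


def unique_part(spectrum):
    """Split the DFT of real, even-length data into unique half and Nyquist."""
    m = len(spectrum)
    return spectrum[:m // 2], spectrum[m // 2]


def rebuild(unique, nyquist):
    """Recreate the full spectrum using X[m - k] = conj(X[k])."""
    mirrored = [c.conjugate() for c in reversed(unique[1:])]
    return list(unique) + [nyquist] + mirrored


signal = [0.5, -1.0, 2.0, 0.25, -0.75, 1.5, 0.0, -2.0]
full = dft(signal)
unique, nyquist = unique_part(full)
rebuilt = rebuild(unique, nyquist)
```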

[Figure 4.6, panels (a) and (b). (a) Labels: φ, ∆x, ∆y, ∆r, θ. (b) Labels: φ − θ, ∆x′, ∆y′, ∆r.]

Figure 4.6: The reference image has been rotated anticlockwise by θ and translated by (∆x, ∆y) to form the test image (a), the angle between the image centres is φ, and the correct rotation and translation is reported by the phase correlation. If the reference and test images were switched, after the phase correlation rotates the new reference image (b), the angle between the centres is φ − θ and the translation is reported as (∆x′, ∆y′). In practical usage, both cases are equally valid and indistinguishable.

4.3.5

Rotation and translation operation order

When producing test images to check the performance of the phase correlation algorithm, the order by which the reference image was rotated and translated to form the test image can alter the reported translation. Starting with a reference image that is at a rotation of zero degrees and a translation of zero pixels (origin O), the phase correlation technique, as previously described, first adjusts for a rotation by rotating the reference image, and then detects the translation. If the reference image was first rotated and then translated, the reported translation of the test image will match the actual translation performed. If the reference image was translated and then rotated, the reported translation will have the same Pythagorean magnitude, ∆r, but the components will differ, since the translation components are also rotated. Consider this second case, and assume a rotation angle of θ and a translation of (∆x, ∆y). Initially the transformed image centre is at an angle of φ with respect to the reference image centre (figure 4.6a). After the phase correlation removes the rotation of θ, the transformed image centre is at an angle of φ − θ (figure 4.6b). The rotated translation components are then (∆x′, ∆y′).

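\begin{align}
\Delta r &= \sqrt{(\Delta x)^2 + (\Delta y)^2} \tag{4.14} \\
\varphi &= \tan^{-1}(\Delta y / \Delta x) \tag{4.15} \\
\Delta x' &= \Delta r \cos(\varphi - \theta) \tag{4.16} \\
\Delta y' &= \Delta r \sin(\varphi - \theta) \tag{4.17}
\end{align}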
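Equations (4.14) to (4.17) can be evaluated as in the sketch below, which may be useful when generating test images. The function name `reported_translation` is illustrative, θ is taken in radians, and `atan2` is used in place of the plain inverse tangent of (4.15) so that all four quadrants and ∆x = 0 are handled; that substitution is an assumption about intent rather than something stated in the text.

```python
import math


def reported_translation(dx, dy, theta):
    """Translation components reported when translation precedes rotation."""
    dr = math.hypot(dx, dy)        # (4.14)
    phi = math.atan2(dy, dx)       # (4.15), using atan2 for quadrant safety
    dx_rot = dr * math.cos(phi - theta)   # (4.16)
    dy_rot = dr * math.sin(phi - theta)   # (4.17)
    return dx_rot, dy_rot


# A translation of (3, 4) pixels with a rotation of 90 degrees.
dx_rot, dy_rot = reported_translation(3.0, 4.0, math.radians(90))
```

The magnitude ∆r is unchanged by the operation; only the split between the two components differs.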