The basic concept of correlation is illustrated in Figure 1.4 with the help of a simple character recognition example. In this figure, black pixels take on a value of 1 and white pixels take on a value of 0. Suppose we are trying to locate all occurrences of the reference or target image (C in this example) in the test image (also called the input scene). One way to achieve this is to cross-correlate the target image with the input scene. The target image is placed in the upper left corner of the input scene and pixel-wise multiplication is carried out between the two arrays; all of the values in the resultant product array are summed to produce one correlation output value. This process is repeated while shifting the target image to the right and down by various amounts, thus producing a two-dimensional (2-D) output array called the correlation output. Ideally, this correlation output would have two large values corresponding to the two "C" letters in the input scene and zeros for other letters. Thus, large cross-correlation values indicate the presence and location of the character we are looking for. However, this will not always be achievable because some other letters may have high cross-correlation. For example, letter "C" and letter "O" have large cross-correlation. One of the goals of this book is to develop methods that preserve large cross-correlation with desired targets, while suppressing cross-correlation with undesired images (sometimes called the clutter), and reducing sensitivity to noise and distortions such as rotations, scale changes, etc. The goal of this chapter is to provide the basic ideas underlying the use of correlation as a pattern recognition tool.
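The shift, multiply, and sum procedure described above can be sketched in a few lines of NumPy. The 3 × 3 glyphs for C, O, and L and the function name `correlate2d_valid` are hypothetical and chosen only for illustration; they are not the images of Figure 1.4.

```python
import numpy as np

def correlate2d_valid(scene, target):
    """Slide `target` over `scene`; at each shift, multiply pixel-wise and sum."""
    sh, sw = scene.shape
    th, tw = target.shape
    out = np.zeros((sh - th + 1, sw - tw + 1), dtype=int)
    for i in range(out.shape[0]):          # shifts down
        for j in range(out.shape[1]):      # shifts to the right
            out[i, j] = np.sum(scene[i:i + th, j:j + tw] * target)
    return out

# Hypothetical 3x3 glyphs (1 = black pixel, 0 = white pixel).
C = np.array([[1, 1, 1],
              [1, 0, 0],
              [1, 1, 1]])
O = np.array([[1, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])
L = np.array([[1, 0, 0],
              [1, 0, 0],
              [1, 1, 1]])

# Input scene: C, O, L, C side by side, separated by one blank column.
blank = np.zeros((3, 1), dtype=int)
scene = np.hstack([C, blank, O, blank, L, blank, C])

corr = correlate2d_valid(scene, C)
print(corr)
```

With these toy glyphs, the two C positions each score 7 (the number of black pixels in the target), but so does the O position, because every black pixel of this C is also black in this O; the L position scores 5. This is a small instance of the C/O confusion mentioned above.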
Correlation can be thought of as the output from a matched filter (a linear, shift-invariant (LSI) filter whose impulse response is the reflected version of the reference signal or image), and it can be shown to be "optimal" for detecting known signals corrupted by additive white noise. We will start this chapter by establishing this notion of optimality of correlation. Correlation can be implemented using either optical or digital processing, and this chapter will also discuss some basic correlator implementation methods.
A major milestone in the development of correlation for pattern recognition was the pioneering work by VanderLugt [5], who represented complex-valued matched filters using holograms and thus implemented the correlation operation using coherent optical processors [32]. In particular, that work made possible the use of optical correlators to detect and locate reference images in observed scenes. While optical correlators will be considered in more detail in Chapter 8, we will introduce the VanderLugt correlator briefly in this chapter, mainly to motivate the development of the many variants of the classical matched filter. We will also consider digital methods for computing the correlations. While digital implementation of correlations using discrete Fourier transforms (DFTs) appears rather straightforward, some important issues arise. (The fast Fourier transform (FFT) is literally just an efficient algorithm for computing the DFT, but we will often use FFT and DFT synonymously.) First, the use of DFTs results in circular rather than linear correlations, and care must be taken to make the circular correlation match the desired linear correlation. Another issue is that if a direct (i.e., in time or space domain) digital correlation must be carried out, it is more hardware-efficient (hence faster) if we can reduce the number of bits used to represent the signals or images. We will consider the consequences of using binarized (2 levels, or 1 bit per pixel) or other quantized images (e.g., 4 levels, or 2 bits per pixel) for correlation.
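The circular-versus-linear issue can be made concrete with a short NumPy sketch. It assumes real 1-D sequences and uses zero-padding to length len(r) + len(s) − 1 as one common way to make the DFT-based result match the linear correlation; the sequences and the helper name `linear_xcorr_fft` are hypothetical.

```python
import numpy as np

def linear_xcorr_fft(r, s):
    """Linear cross-correlation c[k] = sum_n r[n + k] * s[n] via zero-padded FFTs."""
    n = len(r) + len(s) - 1               # long enough to avoid wrap-around
    R = np.fft.fft(r, n)                  # fft(x, n) zero-pads x to length n
    S = np.fft.fft(s, n)
    c = np.fft.ifft(R * np.conj(S)).real
    # Negative lags wrap to the end of the array; roll them to the front so the
    # output runs over lags -(len(s) - 1), ..., len(r) - 1.
    return np.roll(c, len(s) - 1)

r = np.array([3.0, 1.0, 2.0, 0.0, 1.0, 4.0])
s = np.array([1.0, 2.0, 3.0])

linear = linear_xcorr_fft(r, s)
direct = np.correlate(r, s, mode="full")  # direct (time-domain) linear correlation

# Without enough padding, the same DFT product yields a circular correlation.
circular = np.fft.ifft(np.fft.fft(r) * np.conj(np.fft.fft(s, len(r)))).real

print(np.allclose(linear, direct))
print(circular[5], direct[len(s) - 1 + 5])  # lag 5: circular vs. linear value
```

In the unpadded case, the value at lag 5 is the sum of the linear correlation values at lags 5 and −1, which is the wrap-around that padding removes.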
We will use 1-D continuous-time (CT) signal notation throughout, making only occasional use of 2-D notation. All results presented using 1-D notation have obvious extensions to 2-D unless specifically indicated otherwise.
Section 5.1 introduces the notion of a matched filter and shows how the matched filter maximizes the output signal-to-noise ratio (SNR). Implementations of correlators are discussed in Section 5.2, and performance metrics to evaluate correlation outputs are presented in Section 5.3. Generalizations of the matched filter are discussed in Section 5.4, and Section 5.5 presents our model for the optical correlation process, including how noise affects the statistics of the measured correlation. This section also unifies several of the filter-optimizing algorithms under the minimum Euclidean distance optimal filter (MEDOF) scheme. Finally, Section 5.6 deals with non-overlapping noise, which arises when the object obscures the background. More advanced correlation filter concepts are discussed in Chapter 6.
5.1 Matched filter
The popularity of correlation methods for pattern recognition owes much to the role that matched filters play in detecting signals in received radar returns corrupted by additive noise. This section will be devoted to reviewing the basic theory of matched filters. Consider the example where a known signal is transmitted and the received signal is examined to answer three questions:
1. Is there a target in the path of the transmitted energy?
2. If there is a target, what is its range from the transmitter?
3. If there is a target, what is its velocity?
In this section, we will focus on the first question, detecting the presence of a target. Once a target is detected, we can use the matched filter output to estimate the relative time shift between the transmitted and the received signals. Provided the speed of signal propagation is constant and known, this time delay yields the range of the target. Even without directional antennas, by using at least three transmitted signals whose transmission locations are known, we can estimate the range of the target to three known positions and thus triangulate the position of the target. If the target moves with a velocity component towards a receiver, it will introduce a Doppler (frequency) shift in the received signal. We can estimate this frequency shift and hence the target velocity.
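As a rough illustration of how delay and Doppler estimates turn into range and velocity, the sketch below uses the standard two-way relations R = c·τ/2 and v = c·f_d/(2·f_c). These formulas are not stated in the text; they assume a co-located transmitter and receiver (monostatic geometry), free-space propagation at the speed of light, and target speeds much smaller than c. The function names and numerical values are hypothetical.

```python
C_LIGHT = 299_792_458.0  # assumed propagation speed in m/s (free space)

def range_from_delay(delay_s, speed=C_LIGHT):
    """Two-way geometry assumed: the signal travels to the target and back."""
    return speed * delay_s / 2.0

def radial_velocity_from_doppler(doppler_hz, carrier_hz, speed=C_LIGHT):
    """Two-way Doppler relation, valid for target speeds much smaller than `speed`."""
    return speed * doppler_hz / (2.0 * carrier_hz)

# Hypothetical measurements: 100 microsecond delay, 2 kHz Doppler at a 10 GHz carrier.
range_m = range_from_delay(100e-6)
velocity_mps = radial_velocity_from_doppler(2_000.0, 10e9)
print(f"range = {range_m:.1f} m, radial velocity = {velocity_mps:.2f} m/s")
```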
5.1.1 Known signal in additive noise
Let s(t) denote the transmitted signal and r(t) denote the received signal containing effects such as attenuation, time delay, Doppler shift, and noise. In this simple model, we will consider the effects of additive noise only. As mentioned earlier, time delay and frequency shifts can be estimated from the received signal. Attenuation causes a decrease in the SNR, which degrades the detection performance. However, attenuation does not change the optimality of the maximal SNR filter we derive in this section.
For this additive noise model, the detection problem simplifies to that of choosing between the following two hypotheses:
$$
\begin{aligned}
H_0 &:\; r(t) = n(t) \\
H_1 &:\; r(t) = s(t) + n(t)
\end{aligned}
\tag{5.1}
$$
where n(t) denotes the noise. This noise is modeled as a wide sense stationary (WSS) random process with zero mean and power spectral density (PSD) $P_n(f)$. Note that we have not yet assumed anything about the noise probability density function (PDF). Our task is to select between the two hypotheses using r(t) and our knowledge of s(t) and $P_n(f)$.
5.1.2 Maximal SNR filter
The basic approach used for this binary signal detection problem is the linear filter paradigm shown in Figure 5.1. The received signal r(t) is passed through an LSI system with impulse response h(t) (or equivalently, frequency response H(f), the FT of h(t)). The output signal y(t) is searched for its maximal value $y_{\max}$, and this maximum is compared to a pre-selected threshold T. If $y_{\max}$ equals or exceeds T, then the received signal is declared to contain the transmitted signal (i.e., $H_1$ is selected), whereas if $y_{\max}$ is less than T, then the received signal is declared to contain only noise (i.e., $H_0$ is selected). In fact, the position of this maximal value yields the relative time shift between the transmitted and received signals and thus the target range. If the threshold T is low, then the probability of a miss will be small (few $H_1$ cases will be missed), but the probability of a false alarm (an $H_0$ case being mis-detected as $H_1$) will be large. If T is large, the converse will occur.
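The threshold trade-off can be illustrated with a small Monte Carlo sketch. It makes several assumptions that the text does not require at this point: discrete-time signals, white Gaussian noise, and a filter equal to the time-reversed signal. The signal, noise level, and thresholds are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

s = np.array([1.0, -1.0, 1.0, 1.0, -1.0, 1.0, -1.0, -1.0])  # hypothetical known signal
h = s[::-1]        # illustrative filter choice (time-reversed signal)
sigma = 1.5        # assumed noise standard deviation
trials = 5000

def peak_output(r):
    """Pass r through the LSI filter h and return the maximal output value."""
    return np.convolve(r, h, mode="full").max()

noise0 = rng.normal(0.0, sigma, size=(trials, s.size))
noise1 = rng.normal(0.0, sigma, size=(trials, s.size))
peaks_h0 = np.array([peak_output(n) for n in noise0])      # H0: noise only
peaks_h1 = np.array([peak_output(s + n) for n in noise1])  # H1: signal plus noise

def error_rates(T):
    """Empirical false-alarm and miss rates for threshold T."""
    p_false_alarm = np.mean(peaks_h0 >= T)  # H0 trials declared as H1
    p_miss = np.mean(peaks_h1 < T)          # H1 trials declared as H0
    return p_false_alarm, p_miss

for T in (2.0, 6.0, 10.0):
    pfa, pmiss = error_rates(T)
    print(f"T = {T:4.1f}   P(false alarm) = {pfa:.3f}   P(miss) = {pmiss:.3f}")
```

Raising T can only lower the empirical false-alarm rate and raise the miss rate, since the set of trials whose peak reaches the threshold shrinks as T grows.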
Signal-to-noise ratio. In this approach, the most important step is the design of the filter H(f). A good filter should make the average $y_{\max}$ large (under hypothesis $H_1$) and make the average noise-induced variance as small as possible. Thus, it is desirable that the filter H(f) maximize the SNR defined as follows:
$$
\mathrm{SNR} = \frac{\left| E\{ y_{\max} \mid H_1 \} \right|^2}{\operatorname{var}\{ y_{\max} \}}
\tag{5.2}
$$

where E{·} denotes expectation and "var" denotes the variance. Since the variance arises strictly as a result of noise in this additive noise model, and since the noise process has the same characteristics under both hypotheses, the output noise variance is the same for both hypotheses.
Optical and digital correlators have different expressions for SNR, owing to the processors' different properties. The digital processor's output can be exactly linear with the input; however, the optical processor's cannot. The sensed output in optical correlators is the electromagnetic intensity (the squared magnitude of the field), so a different expression for SNR results. Having paid this polite nod, we will concentrate on the strictly linear form in this chapter and clarify the optical situation in Section 5.5 and Chapter 8.

[Figure 5.1: block diagram. Input r(t) = s(t) + n(t) → LTI filter H(f) → sample at peak → compare to threshold → decision.]
Since the mean of the noise is assumed to be zero, $E\{y_{\max} \mid H_1\}$ is the maximal value of the filter output when s(t) is the input signal. For the purposes of determining the optimal H(f), we can assume without loss of generality that the output y(t) has its maximal value at the origin. If a given filter's output peak happens to occur somewhere else, it can be brought to the origin by simply multiplying H(f) by an appropriate linear-phase function in frequency. This multiplication of H(f) by a phase function will not affect the filter's noise response since the output noise PSD and the variance it induces are affected only by the magnitude of the filter frequency response, and not its phase. Thus, the numerator of Eq. (5.2) can be simplified as follows:
$$
\left| E\{ y_{\max} \mid H_1 \} \right|^2
= \left| E\{ y(0) \mid H_1 \} \right|^2
= \left| \int s(t)\, h(-t)\, dt \right|^2
= \left| \int S(f)\, H(f)\, df \right|^2
\tag{5.3}
$$
where we assume that the signal s(t) and the impulse response h(t) are real. To express the denominator of Eq. (5.2) similarly in terms of known quantities, we note that the variance depends only on the noise and is thus independent of the signal s(t). Since the input noise n(t) is WSS with PSD $P_n(f)$, the output noise from this LSI system is also WSS with PSD $P_n(f)\,|H(f)|^2$. Since the variance of a zero-mean random process equals the total area under its PSD, we can express the denominator of Eq. (5.2) as follows:
$$
\operatorname{var}\{ y_{\max} \} = \int P_n(f)\, |H(f)|^2\, df
\tag{5.4}
$$
Using Eqs. (5.3) and (5.4) in Eq. (5.2), we obtain the following expression for SNR in terms of S(f), the FT of the known transmitted signal s(t); $P_n(f)$, the PSD of the additive input noise n(t); and H(f), the filter frequency response:

$$
\mathrm{SNR} = \frac{\left| \int S(f)\, H(f)\, df \right|^2}{\int P_n(f)\, |H(f)|^2\, df}
\tag{5.5}
$$

Before determining the H(f) that maximizes the SNR, a few remarks based on Eq. (5.5) are in order. Multiplying the filter H(f) by a complex constant does not affect the SNR since it scales both the numerator and the denominator of Eq. (5.5) identically. Also, the phase of the filter affects the numerator of this SNR expression, but not its denominator. Finally, if there are any frequency regions where the noise PSD is zero and the signal FT is not zero, we can hypothetically achieve infinite SNR simply by setting the filter magnitude to be non-zero in only those frequency regions. In practice, this does not occur and we need a more realistic filter to achieve high SNR.
Signal-to-noise ratio maximization. Our goal is to find the filter H(f) that maximizes the SNR in Eq. (5.5). To obtain this filter in the digital instance, we use the Cauchy–Schwarz inequality (discussed in Chapter 2), rewritten below in terms of two arbitrary functions A(f) and B(f).
$$
\left| \int A(f)\, B(f)\, df \right|^2 \le \int |A(f)|^2\, df \int |B(f)|^2\, df
\tag{5.6}
$$

with equality if and only if $A(f) = \alpha B^*(f)$, where $\alpha$ is a complex constant. We can apply Eq. (5.6) to the numerator of Eq. (5.5) to obtain the following upper bound on the SNR:

$$
\begin{aligned}
\mathrm{SNR}
&= \frac{\left| \int S(f)\, H(f)\, df \right|^2}{\int P_n(f)\, |H(f)|^2\, df}
 = \frac{\left| \int \left[ \dfrac{S(f)}{\sqrt{P_n(f)}} \right] \left[ H(f) \sqrt{P_n(f)} \right] df \right|^2}{\int P_n(f)\, |H(f)|^2\, df} \\
&\le \frac{\left[ \int \dfrac{|S(f)|^2}{P_n(f)}\, df \right] \left[ \int P_n(f)\, |H(f)|^2\, df \right]}{\int P_n(f)\, |H(f)|^2\, df}
 = \int \frac{|S(f)|^2}{P_n(f)}\, df = \mathrm{SNR}_{\max}
\end{aligned}
\tag{5.7}
$$

where we are allowed to take the square root of $P_n(f)$ since it is real and non-negative. Equation (5.7) shows that the SNR achievable with any filter must be less than or equal to $\mathrm{SNR}_{\max}$, which depends only on S(f), the signal FT, and $P_n(f)$, the noise PSD, and not on the filter H(f). The Cauchy–Schwarz inequality in Eq. (5.6) also tells us when the equality holds. Using the equality condition, we see that the maximal SNR is obtained if and only if
$$
\frac{S(f)}{\sqrt{P_n(f)}} = \beta \left[ H(f) \sqrt{P_n(f)} \right]^*
\;\Rightarrow\;
H(f) = \alpha\, \frac{S^*(f)}{P_n(f)}
\tag{5.8}
$$

where $\alpha$ (equal to $1/\beta^*$ in terms of the constant $\beta$ of the equality condition) is any complex constant. Thus the filter in Eq. (5.8) is optimal in the sense that it maximizes the SNR.
We will call a filter phase-canceling if its phase and that of S( f ), the signal FT, sum to a constant. Thus, the maximal-SNR filter is phase-canceling. It is satisfying to see that the optimal filter has a frequency response magnitude that is proportional to the ratio of the signal FT magnitude to the noise PSD.
At those frequencies where the known signal is weak compared to the noise, the optimal filter has low gain and thus the received signal is attenuated. At those frequencies where the signal is strong compared to the noise, the filter gain is high and the received signal is amplified. From Eq. (5.7), we see that the maximal SNR can be increased by amplifying the signal and/or reducing the noise level.
Signal-to-noise ratio maximization using vectors. We used the Cauchy–Schwarz inequality to derive the maximal-SNR filter in the CT domain. We will show that the same result can be obtained in the discrete-time (DT) domain using matrix/vector results presented in Chapter 2.
Towards that end, let us denote the sampled version of the desired filter by column vector $\mathbf{h}$; i.e., $\mathbf{h} = [H(-N\Delta f) \,\ldots\, H(0) \,\ldots\, H(N\Delta f)]^T$, where we sample the CT filter frequency response H(f) at uniform intervals of $\Delta f$ and where we truncate the discretized frequency response to (2N + 1) samples centered at zero frequency. Similarly, $\mathbf{s}$ is a column vector whose (2N + 1) elements are the samples of S(f), the signal FT. The noise PSD $P_n(f)$ is sampled at uniform intervals of $\Delta f$, and the resulting (2N + 1) samples are placed along the diagonal of a (2N + 1) by (2N + 1) diagonal matrix $\mathbf{P}$. Assuming that the sampling interval $\Delta f$ is sufficiently small, the SNR in Eq. (5.5) can be approximated as follows:
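$$
\mathrm{SNR} = \frac{\left| \int S(f)\, H(f)\, df \right|^2}{\int P_n(f)\, |H(f)|^2\, df}
\cong \frac{\left| \Delta f \sum_{k=-N}^{N} S(k\Delta f)\, H(k\Delta f) \right|^2}{\Delta f \sum_{k=-N}^{N} P_n(k\Delta f)\, |H(k\Delta f)|^2}
= \Delta f\, \frac{\left| \mathbf{s}^T \mathbf{h} \right|^2}{\mathbf{h}^+ \mathbf{P} \mathbf{h}}
= \Delta f\, \frac{\mathbf{h}^+ \mathbf{s}^* \mathbf{s}^T \mathbf{h}}{\mathbf{h}^+ \mathbf{P} \mathbf{h}}
\tag{5.9}
$$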
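where superscript + denotes the conjugate transpose. Once again, multiplying vector $\mathbf{h}$ by a complex scalar does not affect the SNR. To find the filter vector $\mathbf{h}$ that maximizes the SNR, we set the gradient of the SNR with respect to $\mathbf{h}$ to zero as follows. (A similar gradient technique will be developed for optimal optical filters.)

$$
\begin{aligned}
\nabla_{\mathbf{h}^+}(\mathrm{SNR})
&= \Delta f\, \nabla_{\mathbf{h}^+} \left[ \frac{\mathbf{h}^+ \mathbf{s}^* \mathbf{s}^T \mathbf{h}}{\mathbf{h}^+ \mathbf{P} \mathbf{h}} \right]
 = \Delta f\, \frac{(\mathbf{h}^+ \mathbf{P} \mathbf{h})\, \mathbf{s}^* \mathbf{s}^T \mathbf{h} - (\mathbf{h}^+ \mathbf{s}^* \mathbf{s}^T \mathbf{h})\, \mathbf{P} \mathbf{h}}{(\mathbf{h}^+ \mathbf{P} \mathbf{h})^2} = \mathbf{0} \\
&\Rightarrow\; \mathbf{P} \mathbf{h} = \frac{(\mathbf{h}^+ \mathbf{P} \mathbf{h})(\mathbf{s}^T \mathbf{h})}{(\mathbf{h}^+ \mathbf{s}^* \mathbf{s}^T \mathbf{h})}\, \mathbf{s}^*
\;\Rightarrow\; \mathbf{h} = \alpha\, \mathbf{P}^{-1} \mathbf{s}^*
\end{aligned}
\tag{5.10}
$$

where

$$
\alpha = \frac{(\mathbf{h}^+ \mathbf{P} \mathbf{h})(\mathbf{s}^T \mathbf{h})}{(\mathbf{h}^+ \mathbf{s}^* \mathbf{s}^T \mathbf{h})}
$$

We see that $\mathbf{h} = \alpha \mathbf{P}^{-1} \mathbf{s}^*$ yields the maximal SNR, and in Eq. (5.9) we further observe that $\alpha$ does not affect the SNR, so in fact $\alpha$ is arbitrary. Since $\mathbf{P}$ is a diagonal matrix, this is the same as the sampled version of the maximal-SNR filter derived in Eq. (5.8).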
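The vector result can be checked numerically. The sketch below builds hypothetical samples of a signal spectrum and a colored-noise PSD (both made up for illustration), evaluates the discrete SNR of Eq. (5.9), and compares the filter h = P⁻¹s* against randomly drawn filters and against the sampled form of SNR_max from Eq. (5.7).

```python
import numpy as np

rng = np.random.default_rng(1)

N = 32
df = 0.1                                  # assumed frequency sampling interval
f = df * np.arange(-N, N + 1)             # 2N + 1 frequency samples

# Hypothetical signal-spectrum samples (complex) and colored-noise PSD samples.
s = np.exp(-f**2) * np.exp(1j * 2 * np.pi * 0.3 * f)
p = 0.5 + 1.0 / (1.0 + f**2)              # real and strictly positive
P = np.diag(p)

def snr(h):
    """Discrete SNR of Eq. (5.9): df * |s^T h|^2 / (h^+ P h)."""
    num = np.abs(s @ h) ** 2              # s @ h is s^T h (no conjugation)
    den = np.real(np.conj(h) @ P @ h)
    return df * num / den

h_opt = np.linalg.solve(P, np.conj(s))    # h = P^{-1} s^*, taking alpha = 1
snr_max = df * np.sum(np.abs(s) ** 2 / p) # sampled form of SNR_max in Eq. (5.7)

random_filters = rng.normal(size=(200, f.size)) + 1j * rng.normal(size=(200, f.size))
best_random = max(snr(h) for h in random_filters)

print(snr(h_opt), snr_max, best_random)
print(snr((2 - 3j) * h_opt))              # a complex scale factor leaves the SNR unchanged
```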
5.1.3 White noise case
An important special case occurs when the input noise is white noise. The PSD for white noise is a constant; i.e., $P_n(f) = N_0/2$, where the denominator 2 is included to indicate that the PSD is a two-sided spectrum. For this special case, the maximal-SNR filter and the resulting maximal SNR simplify as follows:
$$
H(f) = \alpha\, S^*(f)
\tag{5.11}
$$

and

$$
\mathrm{SNR}_{\max} = \int \frac{|S(f)|^2}{N_0/2}\, df = \frac{\int |s(t)|^2\, dt}{N_0/2} = \frac{E_s}{N_0/2}
$$

where $E_s$ denotes the energy in the transmitted signal.
Matched filter. The maximal-SNR filter in Eq. (5.11) is known as the matched filter (MF) since H(f) is proportional to $S^*(f)$ or, equivalently, h(t) is proportional to s(−t). Thus, for the white noise case, the filter that maximizes the output SNR has an impulse response that is the reflected version of the transmitted signal. For time-domain signals, this time reversal may appear to be impractical in that h(t) may be non-zero for negative arguments, so the filter may be non-causal and thus unrealizable. If the signal s(t) is of finite length or can be approximated as of some finite length (as all practical signals can be), one can overcome the non-causality problem by using h(t) = s(T − t), where T represents a sufficiently long delay. For spatial signals, such as images, the impulse response's being non-zero for negative arguments is not an issue since the causality-type concept is not relevant for spatial systems; i.e., there is no fundamental problem in using image pixels to the left or to the right of (or above or below) the current pixels. Some sources refer to the maximal-SNR filter in Eq. (5.8) as the MF even when the noise is not white, but we will reserve the phrase "matched filter" strictly for the filter in Eq. (5.11).
The SNR of the MF is seen to equal $E_s/(N_0/2)$, and thus we can increase the output SNR either by increasing the energy of the transmitted signal or by decreasing the input noise level. Another important observation is that the output SNR of this matched filter is a function of only $E_s$ and $N_0/2$, and does not depend on the shape of the signal. Thus all transmitted signals with the same energy and same noise level result in the same matched filter output SNR, independently of their shape.
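The shape independence can be checked with a discrete-time analogue. In the sketch below, the per-sample noise variance `noise_var` plays the role of N0/2 and the sum of squared samples plays the role of Es; both correspondences, as well as the two test signals, are assumptions made for illustration.

```python
import numpy as np

def matched_filter_snr(s, noise_var):
    """Peak output SNR for h[n] = s[-n] in white noise of variance noise_var."""
    h = s[::-1]
    peak = np.convolve(s, h, mode="full").max()  # noise-free peak output
    out_noise_var = noise_var * np.sum(h ** 2)   # white noise through an LSI filter
    return peak ** 2 / out_noise_var

n = np.arange(64)
pulse = np.ones(64)                                       # rectangular pulse
chirp = np.cos(2 * np.pi * (0.01 * n + 0.002 * n ** 2))   # chirp-like signal

# Scale both signals to the same energy Es.
Es = 10.0
pulse *= np.sqrt(Es / np.sum(pulse ** 2))
chirp *= np.sqrt(Es / np.sum(chirp ** 2))

noise_var = 0.5
snr_pulse = matched_filter_snr(pulse, noise_var)
snr_chirp = matched_filter_snr(chirp, noise_var)
print(snr_pulse, snr_chirp, Es / noise_var)
```

Both signals yield an output SNR of Es/noise_var, even though their shapes (and hence their correlation outputs away from the peak) are very different.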
Cross-correlation. When the MF is used, the output is the cross-correlation of the received signal with the known signal, as shown below.

$$
y(t) = \mathrm{IFT}\{ R(f)\, H(f) \} = \mathrm{IFT}\{ R(f)\, S^*(f) \} = r(t) \otimes s(t) = \int r(p)\, s(p - t)\, dp
\tag{5.12}
$$

where IFT is the inverse Fourier transform and $\otimes$ indicates the cross-correlation operation. Thus cross-correlation provides the maximal-output SNR when the input noise is additive and white.
Suppose the received signal contains the reference signal and no noise. Then the matched filter output is the correlation of s(t) with itself, i.e., the auto-correlation of s(t). We have shown in Chapter 3 that the auto-correlation function (ACF) always peaks at the origin. If the received signal r(t) is $s(t - t_0)$, a shifted version of s(t), then the MF output peaks at $t_0$ (because of the shift-invariance of the matched filter), allowing us to estimate the time delay between the transmitted and the received signal.
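A minimal discrete-time sketch of delay estimation by cross-correlation follows. The reference sequence (a 7-chip Barker code), the delay, and the Gaussian noise level are hypothetical choices; the correlation follows the convention of Eq. (5.12), y[k] = Σ r[n] s[n − k].

```python
import numpy as np

rng = np.random.default_rng(2)

s = np.array([1.0, 1.0, 1.0, -1.0, -1.0, 1.0, -1.0])  # hypothetical reference signal
t0 = 23                                                # true delay in samples
r = np.zeros(64)
r[t0:t0 + s.size] = s                                  # r[n] = s[n - t0]
r += rng.normal(0.0, 0.2, size=r.size)                 # additive white noise (assumed Gaussian)

# In "full" mode, np.correlate returns lags -(len(s) - 1), ..., len(r) - 1.
y = np.correlate(r, s, mode="full")
lags = np.arange(-(s.size - 1), r.size)
t0_hat = lags[np.argmax(y)]                            # location of the correlation peak

print("estimated delay:", t0_hat)
```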
5.1.4 Colored noise
The previous section established the result that the cross-correlator is theoretically optimal (in the sense of maximizing the output SNR) when the input noise is additive and white. But if the noise is colored (i.e., non-white), we will show that the maximal-SNR filter can be viewed as a matched filter operating on the pre-whitened signal. The maximal-SNR filter in Eq. (5.8) can be expressed as a cascade of two filters $H_{\mathrm{pre}}(f)$ and $H_{\mathrm{MF}}(f)$; i.e.,
$$
H(f) = \alpha\, \frac{S^*(f)}{P_n(f)} = H_{\mathrm{pre}}(f) \cdot H_{\mathrm{MF}}(f)
\tag{5.13}
$$

where

$$
H_{\mathrm{pre}}(f) = \frac{1}{\sqrt{P_n(f)}}
\quad \text{and} \quad
H_{\mathrm{MF}}(f) = \alpha\, \frac{S^*(f)}{\sqrt{P_n(f)}}
$$
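The factorization in Eq. (5.13) can be verified numerically. The sketch below works on DFT samples (so the filtering is circular), takes α = 1, and uses a made-up colored-noise PSD and signal; all of these are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

n_fft = 256
f = np.fft.fftfreq(n_fft)                  # normalized frequency samples
p_n = 0.2 + 1.0 / (1.0 + (8.0 * f) ** 2)   # hypothetical colored-noise PSD (> 0)

s = np.zeros(n_fft)
s[:16] = np.hanning(16)                    # hypothetical known signal
S = np.fft.fft(s)

H_pre = 1.0 / np.sqrt(p_n)                 # first stage of the cascade
H_mf = np.conj(S) / np.sqrt(p_n)           # second stage of the cascade
H_direct = np.conj(S) / p_n                # Eq. (5.8) with alpha = 1

# Noise PSD after the first stage: P_n |H_pre|^2 should be flat (equal to 1).
p_out = p_n * np.abs(H_pre) ** 2

# The cascade and the single filter produce the same output for any input.
r = s + rng.normal(size=n_fft)
y_cascade = np.fft.ifft(np.fft.fft(r) * H_pre * H_mf).real
y_direct = np.fft.ifft(np.fft.fft(r) * H_direct).real

print(np.allclose(p_out, 1.0), np.allclose(y_cascade, y_direct))
```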
Figure 5.2 represents this cascade of two filters. When the received signal r(t) is passed through $H_{\mathrm{pre}}(f)$, the filter output is given by $y'(t) = s'(t) + n'(t)$, where $S'(f) = S(f)\, H_{\mathrm{pre}}(f) = S(f)/\sqrt{P_n(f)}$, and where the output noise PSD is $P_{n'}(f) = P_n(f)\, |H_{\mathrm{pre}}(f)|^2 = P_n(f)/P_n(f) = 1$. Thus, the noise coming out of the first filter is white, and that is why the first filter is called the pre-whitener. The input to the second filter is the pre-whitened signal $s'(t)$ corrupted by white noise. The second filter in the cascade is matched to $s'(t)$ and thus $H_{\mathrm{MF}}(f) = \alpha\, S'^*(f) = \alpha\, S^*(f)/\sqrt{P_n(f)}$, where $\alpha$ is an arbitrary constant.
5.2 Correlation implementation
As shown in Section 5.1.3, cross-correlation provides the maximal-output SNR when the input noise is additive and white. In this section, we will consider the implementation of this cross-correlation. Optical implementations will be discussed in more detail in Chapter 7. We will also use 2-D signals in this section to emphasize that the MF concept that we introduced using 1-D signals is applicable to detecting 2-D targets in images.
We mention that the quality of a correlation is considerably dependent on the set of values (called the domain) from which the filter may be drawn. Performing digital computation using the huge dynamic range of perhaps 64-bit complex filter values is, essentially, to have a continuum of complex filter values. For reasons of hardware speed, size, or electrical power draw, though, one might perform computations using fewer bits. In an optical correlator the filter domain is physically restricted to a curvilinear or discrete