3.6 Frequency Selectivity and Multifocus Fusion
3.6.2 Experiments and Discussions
The proposed filter pair, with its sharper transition band and better frequency selectivity, is compared with traditional focus measures from the literature, namely Tenengrad (Tenen), gray level variance (GLV) [20], and sum of modified Laplacian (SML) [63], and with the Daubechies orthogonal filter bank (db4), the Cohen-Daubechies-Feauveau (CDF) 9/7 biorthogonal filter bank used in JPEG2000, and the Mexican hat (Mexh), Morlet, and Meyer wavelets. Several available multifocus image sequence datasets, e.g., Simulated Cone {72 frames, 318×318}, Chess {29, 800×600}, Lab {6, 512×512}, Clock
[Plot panels, axes Ripple (0.00 to 0.10) and SNR (30 to 33): (a) N = 7, m = 4, 2; (b) N = 9, m = 5, 3, 1.]
Figure 3.11 – Trade-off between the SNR and ripples.
{2, 512×512}, FlyEye {37, 325×217}, and Rifle {24, 200×150}, have been used in the experiments. Fig. 3.12 (a1–a2) and (b1–b2) show randomly selected sample frames at different focus settings from the Simulated Cone and Rifle datasets, respectively. Figs. 3.13 and 3.14 then give, for each of these two datasets, the corresponding depth map, the depth map under zero-mean Gaussian noise with variance 0.001, the fused image, and the 3D shape reconstructed from the noisy depth map. For the depth maps, shown in Figs. 3.13 and 3.14 (a1)–(a9) and (b1)–(b9), the horizontal plane axes denote the image size and the vertical axis is the frame index within each dataset, indicating which frame each pixel is taken from. The fused images are given in Figs. 3.13 and 3.14 (c1)–(c9). The proposed filters with better frequency selectivity provide promising results when compared with the other approaches as well as the crude wavelet family. Among the crude wavelets, the Mexican hat gives the worst results. Using the fused image and the depth map, it is straightforward to extract the 3D shape, as shown in Figs. 3.13 and 3.14 (d1)–(d9) for the Simulated Cone and Rifle datasets, respectively.
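The relation between a focus measure, the depth map, and the fused image can be illustrated with a minimal sketch. It is not the filter-bank method evaluated in this section: it uses gray level variance in a small window as the focus measure, and the function names, the window radius, and the toy frames are assumptions made only for illustration.

```python
def gray_level_variance(img, r, c, radius=1):
    """Variance of the gray levels in a window centred at (r, c), clipped at the borders."""
    rows, cols = len(img), len(img[0])
    window = [img[i][j]
              for i in range(max(0, r - radius), min(rows, r + radius + 1))
              for j in range(max(0, c - radius), min(cols, c + radius + 1))]
    mean = sum(window) / len(window)
    return sum((v - mean) ** 2 for v in window) / len(window)


def fuse_stack(stack, radius=1):
    """Return (depth_map, fused_image) for a list of equally sized grayscale frames.

    depth_map[r][c] is the index of the frame with the largest focus measure
    at that pixel; fused_image[r][c] is the pixel value taken from that frame.
    """
    rows, cols = len(stack[0]), len(stack[0][0])
    depth = [[0] * cols for _ in range(rows)]
    fused = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            scores = [gray_level_variance(frame, r, c, radius) for frame in stack]
            best = max(range(len(stack)), key=scores.__getitem__)
            depth[r][c] = best
            fused[r][c] = stack[best][r][c]
    return depth, fused


# Toy stack (hypothetical data): frame 0 is textured on the left,
# frame 1 is textured on the right; flat regions stand for defocus.
frame0 = [[0, 9, 5, 5], [9, 0, 5, 5], [0, 9, 5, 5], [9, 0, 5, 5]]
frame1 = [[5, 5, 0, 9], [5, 5, 9, 0], [5, 5, 0, 9], [5, 5, 9, 0]]
depth_map, fused_image = fuse_stack([frame0, frame1])
```

In this sketch the depth map is simply the per-pixel index of the sharpest frame, which is the quantity plotted on the vertical axis of the depth maps described above.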
Figure 3.12 – (a1) and (a2) are randomly selected samples from the Simulated Cone dataset; (b1) and (b2) are sample images from the Rifle dataset. Note the position of the focal plane in each image.
To assess the fusion performance, statistical measures such as the signal-to-noise ratio (SNR), peak signal-to-noise ratio (PSNR), and mean square error (MSE) may be used if a reference image, i.e., an image fully focused at all depths, is available. In practice, however, this is possible only for simulated and synthetic images such as the Simulated Cone dataset, because the reference image is rarely known for real images. Recent image fusion assessment methods can evaluate a fusion technique without any reference image by assessing the input-output relationship. In [109], a mutual information (MI) based principle is used to evaluate the fusion technique: MI quantifies the information transferred from the source images (input) to the fused image (output). Xydeas and Petrovic [104] proposed a pixel-level fusion assessment technique (Qp), in which visual or perceptual information is directly associated with edge information while region information is ignored. Among the available quality assessment methods, the structural similarity (SSIM) index [101] has been widely used in imaging applications in the literature when reference images are not available. The SSIM index measures three properties of image patches: the similarity of local brightness, contrast, and structure. The SSIM comparisons without noise and in the presence of Gaussian noise are given in Tables 3.7 and 3.8, respectively, and confirm the experimental results and visual improvements obtained. In these tables, db4, pro, and 9/7 stand for the Daubechies, the proposed, and the CDF 9/7 filter banks. For the proposed wavelet based technique, the results are independent of the images and datasets. In contrast, the results obtained by the other approaches may be affected by the nature of the images in each dataset, i.e., a uniform ranking cannot be assigned, as can be seen in Tables 3.7 and 3.8 for Tenen, GLV, and SML.
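The measures named above can be written down compactly. The following is a minimal sketch rather than the evaluation code behind Tables 3.7 and 3.8: the function names are illustrative, an 8-bit peak value of 255 is assumed, and the SSIM shown is a single-window (global) variant with the commonly used constants K1 = 0.01 and K2 = 0.03, whereas SSIM is normally computed over local windows and averaged.

```python
import math


def _flatten(img):
    """Flatten a 2-D list of gray levels into a list of floats."""
    return [float(v) for row in img for v in row]


def mse(ref, img):
    a, b = _flatten(ref), _flatten(img)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(ref, img, peak=255.0):
    err = mse(ref, img)
    return math.inf if err == 0 else 10.0 * math.log10(peak ** 2 / err)


def snr(ref, img):
    err = mse(ref, img)
    a = _flatten(ref)
    power = sum(x * x for x in a) / len(a)
    return math.inf if err == 0 else 10.0 * math.log10(power / err)


def global_ssim(x, y, peak=255.0):
    """Single-window SSIM: luminance, contrast and structure terms combined."""
    a, b = _flatten(x), _flatten(y)
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((v - mu_a) ** 2 for v in a) / n
    var_b = sum((v - mu_b) ** 2 for v in b) / n
    cov = sum((p - mu_a) * (q - mu_b) for p, q in zip(a, b)) / n
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2  # stabilising constants
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))
```

MSE, PSNR, and SNR require the reference image as the first argument, which is why they are restricted to synthetic data such as the Simulated Cone sequence.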
[Figure 3.13 panels. Rows: (a) Depth Map, (b) Depth Map under Gaussian Noise, (c) Fused Image, (d) 3D Reconstruction via Depth Map. Columns 1 to 9: Mexh, Morlet, Meyer, Tenen, GLV, SML, DWT db4, DWT pro, DWT 9/7.]
Figure 3.13 – Cone: (a1–a9) depth map without noise; (b1–b9) depth map under Gaussian noise; (c1–c9) fused image; (d1–d9) reconstructed shape using fused image and depth map under Gaussian noise.
[Figure 3.14 panels. Rows: (a) Depth Map, (b) Depth Map under Gaussian Noise, (c) Fused Image, (d) 3D Reconstruction via Depth Map. Columns 1 to 9: Mexh, Morlet, Meyer, Tenen, GLV, SML, DWT db4, DWT pro, DWT 9/7.]
Figure 3.14 – Rifle. (a1–a9) depth map without noise; (b1–b9) depth map under Gaussian noise; (c1–c9) fused image; (d1–d9) reconstructed shape using fused image and depth map under Gaussian noise.
Table 3.7 – SSIM for different datasets using various methods (No noise).
Type          Sim.Cone  Chess   Lab     Clock   FlyEye  Rifle
Mexh          0.2216    0.6602  0.8961  0.8473  0.5753  0.4654
Morlet        0.2268    0.6643  0.9015  0.8505  0.5811  0.4723
Meyer         0.2391    0.7278  0.9254  0.8823  0.6153  0.4965
Tenen         0.2432    0.6757  0.9212  0.8702  0.5929  0.4875
GLV           0.2391    0.7111  0.9193  0.8617  0.6041  0.4997
SML           0.2400    0.7082  0.9240  0.8797  0.6004  0.4884
DWT db4       0.2468    0.7215  0.9253  0.8835  0.6091  0.4941
DWT pro [4]   0.2498    0.7749  0.9339  0.8812  0.6112  0.5170
DWT 9/7       0.2511    0.7765  0.9369  0.8821  0.6148  0.5204
Table 3.8 – SSIM for different datasets using various methods (σ = 0.001).
Type          Sim.Cone  Chess   Lab     Clock   FlyEye  Rifle
Mexh          0.1998    0.2912  0.7203  0.6572  0.3004  0.2959
Morlet        0.2033    0.2992  0.7281  0.6624  0.3076  0.3016
Meyer         0.2195    0.3341  0.7315  0.6728  0.3203  0.3323
Tenen         0.2335    0.3104  0.7386  0.6815  0.3173  0.3271
GLV           0.2142    0.3156  0.7373  0.6707  0.3253  0.3153
SML           0.2319    0.3083  0.7401  0.6802  0.3214  0.3128
DWT db4       0.2212    0.3382  0.7282  0.6721  0.3157  0.3305
DWT pro [4]   0.2349    0.3447  0.7429  0.6829  0.3309  0.3363
DWT 9/7       0.2387    0.3491  0.7493  0.6882  0.3376  0.3398
3.7 Conclusions and Summary
In this chapter, we have analyzed the effect of the frequency selectivity of the filters associated with a multiresolution transformation on the performance and accuracy of a face recognition system. To carry out this analysis, we first proposed a general method for the design of the wavelets used in biorthogonal filter banks. The method structurally incorporates the desired number of moments and perfect reconstruction. The proposed formulation is, in general, a systematic representation of a parametric polynomial. The rationale is to show that the filter coefficients in the polynomial domain can, in general, be written in terms of the coefficients of the corresponding function in the frequency domain. The proposed technique allows the passband and stopband widths and ripples to be tuned; that is, free parameters can be incorporated to control the transition band, the amplitude of the ripples, and the number of moments. Depending on the application and the required trade-off between the filter pair characteristics, one can select a different number of free parameters and tuning terms to obtain several alternatives for controlling the desired specifications. Based on the properties of the proposed method, and noting that the traditional maximally flat wavelet filters have poor frequency selectivity due to their wide transition band, we can then analyze the effect of the frequency response characteristics of the filters in a multiresolution transformation on the performance and accuracy of a face recognition system.
To this end, a multiresolution based face recognition system can be developed using the filter pairs introduced and designed in this chapter. In other words, the filter pairs of a typical discrete wavelet transform are replaced by the proposed tunable biorthogonal filters. The face images in the databases are then decomposed into frequency subbands, and the high-frequency subbands are thresholded. The reconstruction is performed with the customized tunable filter pairs. In this way, the effect of the frequency selectivity of the subbands of the multiresolution transformation on the performance of a face recognition system can be studied in detail. We found that there is a relation between the sharpness of the frequency response of the filters and the recognition rate of a face recognition system. The amplitude of the ripples was found to be another factor that can influence the recognition accuracy. In general, it is concluded that sharper filters with possibly smaller ripples lead to higher recognition rates. Interestingly, the sharpness of the transition band of the filters is more important than the amplitude of the ripples, although unreasonably large ripples degrade the results because in that case the filters may no longer behave as filters.
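The decompose, threshold, and reconstruct pipeline summarized above can be sketched in one dimension. This is only an illustration under stated assumptions: the orthonormal Haar pair stands in for the tunable biorthogonal filters designed in this chapter, a hard threshold is assumed because the thresholding rule is not specified here, and all function names are hypothetical.

```python
import math

_S = 1.0 / math.sqrt(2.0)  # orthonormal Haar scaling factor


def haar_analysis(signal):
    """One-level Haar decomposition of an even-length signal."""
    approx = [(signal[i] + signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    return approx, detail


def haar_synthesis(approx, detail):
    """Inverse of haar_analysis (perfect reconstruction when detail is untouched)."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * _S, (a - d) * _S])
    return out


def hard_threshold(coeffs, t):
    """Zero every coefficient whose magnitude is below t (assumed rule)."""
    return [c if abs(c) >= t else 0.0 for c in coeffs]


# Small high-frequency variation (4.0 vs 4.2) is removed, the large one is kept.
x = [4.0, 4.2, 6.0, 2.0]
low, high = haar_analysis(x)
y = haar_synthesis(low, hard_threshold(high, 1.0))
```

With a different filter pair only `haar_analysis` and `haar_synthesis` would change; the thresholding step between them stays the same, which is what allows the frequency selectivity of the filters to be studied in isolation.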
Chapter 4
Resonance Based Image Analysis for Illumination Suppression
4.1 Literature Review
The literature on resonance based signal decomposition and the oscillatory behavior of resonance components is not vast, as the idea has only recently been introduced in [79]. Nevertheless, it is expected that, as with widely used transformations such as wavelets, curvelets, and the Fourier transform, resonance based design and analysis will grow rapidly in the near future. The representation of signals via traditional frequency based approaches such as the Fourier transform has been used for many years, and the respective techniques are mostly appropriate for signals with finite duration. In reality, signals are not necessarily well expressed in terms of frequency components, as they may contain oscillatory components that cannot be described using the traditional frequency based approaches. In [79], Selesnick shows that non-stationary signals and pulses with oscillations can be well described by a new nonlinear signal decomposition method based on the concept of resonance rather than frequency. The method in [79] describes how a signal can be considered as a combination of high-resonance and low-resonance components, where the terms high- and low-resonance refer to signal components with sustained and transient oscillations, respectively. Examples of such signals are widely found in medical systems such as electroencephalography, and in speech and voice processing. In [81], using the rational-dilation wavelet transform (RDWT), it is shown that the high-resonance component can be represented by a high Q-factor wavelet transform. Similarly, and as expected, the low-resonance component can be represented by a low Q-factor wavelet transform. Morphological component analysis (MCA) has been shown to be effective for separating the two components simultaneously. In addition, the iterated soft-thresholding approach and split augmented Lagrangian shrinkage have been investigated for the same purpose [81]. Resonance based signal analysis has been studied from a different point of view in [83], in which the rational-dilation wavelet transform is used for the sparse representation of resonance components. It has also been shown that the split augmented Lagrangian shrinkage algorithm can be employed instead of MCA to speed up the optimization for the problem investigated in [83].
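The iterated soft-thresholding approach mentioned above is built around one elementwise operation, the soft-threshold (shrinkage) operator, which shrinks each coefficient toward zero by a fixed amount. A minimal sketch of that operator follows; the function name is illustrative, and the surrounding iteration with the high- and low-Q transforms of [81] is not reproduced here.

```python
import math


def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; magnitudes below t become zero."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


shrunk = soft_threshold([3.0, -0.5, -2.0], 1.0)
```

Applied alternately to the coefficients of the two wavelet transforms, this shrinkage is what promotes a sparse representation of each resonance component.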
As mentioned earlier, resonance based signal representation is a very new topic, and therefore the literature and research based on this concept are relatively scarce. In this chapter we show that by transferring a face image into the signal domain and applying resonance based analysis to its oscillatory and transient components, the so-called high- and low-resonance components, promising results are achieved for the illumination invariant analysis of face images and for the design of an efficient illumination invariant human face recognition system. The idea of designing a resonance based face recognition system in our research is motivated by the possibility of analyzing the nature of unwanted illumination effects in terms of resonance components. The benefit of such an investigation is twofold: the recognition accuracy can be further improved and, more importantly, the number of tuning parameters can be reduced to lower the computational complexity.
Although illumination is mostly considered the low-frequency part of an image, this low-frequency content may be of low- and/or high-resonance nature. In this chapter we first assume that an input image can be considered as a combination of illumination and reflectance. We then decompose the images into low- and high-resonance components simultaneously. Because the energy distributions of the subbands obtained via resonance decomposition differ between a well-illuminated image and an image with strong illumination variations, the energy of the subbands of the two components can be thresholded to deactivate the subbands whose energy distribution is created by unwanted illumination effects. For dimensionality reduction and classification, principal component analysis and the extreme learning machine are used, respectively. Experiments and comparisons illustrate the effectiveness of the proposed resonance-based method in illumination invariant face recognition.
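The idea of deactivating subbands by their energy can be sketched as follows. The decision rule used here, zeroing any subband whose share of the total energy exceeds a chosen fraction, is an assumption made purely for illustration; the actual criterion, the threshold value, and the subbands themselves come from the resonance decomposition described later in this chapter. All names are hypothetical.

```python
def subband_energies(subbands):
    """Energy (sum of squared coefficients) of each subband."""
    return [sum(c * c for c in band) for band in subbands]


def deactivate_subbands(subbands, max_fraction):
    """Zero every subband whose share of the total energy exceeds max_fraction.

    This rule is a placeholder assumption, not the criterion of the proposed method.
    """
    energies = subband_energies(subbands)
    total = sum(energies)
    if total == 0:
        return [list(band) for band in subbands]
    return [[0.0] * len(band) if e / total > max_fraction else list(band)
            for band, e in zip(subbands, energies)]


# Toy coefficients (hypothetical): the first subband dominates the energy.
bands = [[10.0, 10.0], [1.0, -1.0], [0.5, 0.5]]
cleaned = deactivate_subbands(bands, max_fraction=0.9)
```

The remaining subbands would then be passed to reconstruction and on to the dimensionality reduction and classification stages.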
4.2 Motivation and Problem Statement
In Chapter 2 we proposed a new method for the problem of illumination invariant human face recognition. The approach is essentially based on frequency component analysis, in which the frequency subbands of a discriminative multiresolution transformation, the so-called double-density dual-tree complex wavelet transform, are taken into account. In Chapter 3, we went deeper and investigated how the frequency selectivity of a transformation may influence the performance of a recognition system. Accordingly, a general solution was proposed for the design of tunable biorthogonal filter pairs whose characteristics can be controlled and tuned. We then analyzed how the recognition rate is affected by these characteristics. While the approaches in Chapter 2 and Chapter 3 are based on the density of wavelets and the frequency selectivity of the wavelet subbands, both are expressed in terms of the resolution of frequency subbands. In other words, low- and high-frequency components and the overall frequency information are used to design and analyze the recognition system. In this chapter, the concept of resonance is applied to the problem of illumination invariant face recognition [7]. Although frequency based analysis is perhaps the most widely used approach in signal processing, it is mainly suitable for signals that are periodic and possess oscillatory behavior. Many real-world signals, such as speech, biomedical, and communication signals, as well as physiological phenomena like the human visual system, are a mixture of frequency and resonance. Such non-stationary, nonlinear, and oscillatory signals are hard to analyze with linear, stationary, frequency based approaches. In most cases, solutions to these problems are linear approximations in the frequency domain. Recently it has been shown that nonlinear signal analysis based on signal resonance can, to some extent, address the weaknesses of the Fourier and wavelet transforms. Resonance based analysis represents a signal as the sum of high-resonance and low-resonance components rather than the traditional high-frequency and low-frequency subbands. In the next section, the concept of resonance is first briefly reviewed in contrast to frequency based filtering. We then propose, for the first time, a new method for illumination invariant face recognition based on the resonance components of images.