The Scattering Transform - Uses of Complex Wavelets in Deep Convolutional Neural Networks

While we have introduced the scattering transform before, we clarify the format we use for this chapter’s analysis.

We use the DTCWT-based ScatterNet introduced in the previous chapter (Algorithm 3.5) as a front end, with $K = 6$ orientations, $J = 2$ scales and $M = 2$ orders.

Consider a single-channel input signal $x(\mathbf{u})$, $\mathbf{u} \in \mathbb{R}^2$. The zeroth-order scatter coefficient is the lowpass output of a $J$-level filter bank:

$$S_0 x(\mathbf{u}) \triangleq x(\mathbf{u}) * \phi_J(\mathbf{u}) \tag{4.3.1}$$

This is approximately invariant to translations of up to $2^J$ pixels¹. In exchange for gaining invariance, the $S_0$ coefficients have lost information (contained in the rest of the frequency space). The remaining energy of $x$ is contained within the first-order wavelet coefficients:

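$$W_1 x(\lambda_1, \mathbf{u}) \triangleq x * \psi_{\lambda_1} \tag{4.3.2}$$

for $\lambda_1 = (j_1, \theta_1)$, $j_1 \in \{1, 2\}$, $\theta_1 = \frac{\pi + 2k\pi}{12}$ with $k \in \{0, 1, \ldots, 5\}$.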
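As a quick check of the orientation set, the short sketch below evaluates the angles, assuming they are given by $\theta_1 = (\pi + 2k\pi)/12$. Under that assumption the $K = 6$ orientations are spaced 30° apart, starting at 15°.

```python
import math

K = 6  # number of orientations, as in the text

# theta_k = (pi + 2*k*pi) / 12 for k = 0, ..., K-1 (assumed form of the formula)
thetas_deg = [math.degrees((math.pi + 2 * k * math.pi) / 12) for k in range(K)]

print([round(t) for t in thetas_deg])  # [15, 45, 75, 105, 135, 165]
```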

Taking the magnitude of $W_1$ gives us the first-order propagated signals:

$$U_1 x(\lambda_1, \mathbf{u}) \triangleq |x * \psi_{\lambda_1}| = \sqrt{(x * \psi^r_{\lambda_1})^2 + (x * \psi^i_{\lambda_1})^2} \tag{4.3.3}$$

The first-order scattering coefficients make $U_1$ invariant up to the coarsest scale $J$ by averaging it:

$$S_1 x(\lambda_1, \mathbf{u}) \triangleq |x * \psi_{\lambda_1}| * \phi_J \tag{4.3.4}$$

This has $KJ = 6 \times 2 = 12$ output channels for each input channel. Later in this chapter we will want to distinguish between the first and second scale coefficients of the $S_1$ terms, which we will do by moving the $j$ index to a superscript, i.e. $S_1^1$ and $S_1^2$ refer to the sets of 6 $S_1$ terms at the first and second scales.
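The zeroth- and first-order definitions can be sketched numerically as below. This is only an illustration under loud assumptions: the filters are hypothetical Gaussian and Gabor-like stand-ins defined in the frequency domain, not the DTCWT filters of Algorithm 3.5, the constants `0.8` and `0.75 * pi` are arbitrary choices, the convolutions are circular, and no subsampling is performed. The helper names are ours.

```python
import numpy as np

def freq_grid(n):
    """Angular frequency coordinates of an n x n DFT (rows, columns)."""
    w = 2 * np.pi * np.fft.fftfreq(n)
    return np.meshgrid(w, w, indexing="ij")

def lowpass_hat(n, J):
    """Stand-in Gaussian lowpass phi_J in the frequency domain (unit DC gain)."""
    wy, wx = freq_grid(n)
    sigma = 0.8 * 2.0 ** J  # assumed spatial width, proportional to 2^J
    return np.exp(-0.5 * sigma ** 2 * (wx ** 2 + wy ** 2))

def wavelet_hat(n, j, theta):
    """Stand-in one-sided (complex) Gabor-like wavelet at scale j, angle theta."""
    wy, wx = freq_grid(n)
    xi = 0.75 * np.pi / 2.0 ** (j - 1)  # assumed centre frequency, halving per scale
    sigma = 0.8 * 2.0 ** j
    cx, cy = xi * np.cos(theta), xi * np.sin(theta)
    return np.exp(-0.5 * sigma ** 2 * ((wx - cx) ** 2 + (wy - cy) ** 2))

def circ_conv(x, h_hat):
    """Circular convolution of x with a filter given by its DFT h_hat."""
    return np.fft.ifft2(np.fft.fft2(x) * h_hat)

def scatter_orders_0_1(x, J=2, K=6):
    """Return S0 (n, n), U1 (J*K, n, n) and S1 (J*K, n, n) for a square image x."""
    n = x.shape[0]
    phi = lowpass_hat(n, J)
    S0 = circ_conv(x, phi).real                           # (4.3.1)
    U1, S1 = [], []
    for j in range(1, J + 1):
        for k in range(K):
            theta = (np.pi + 2 * k * np.pi) / 12
            W1 = circ_conv(x, wavelet_hat(n, j, theta))   # (4.3.2)
            U = np.abs(W1)                                # (4.3.3)
            U1.append(U)
            S1.append(circ_conv(U, phi).real)             # (4.3.4)
    return S0, np.stack(U1), np.stack(S1)

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32))
S0, U1, S1 = scatter_orders_0_1(x)
print(S0.shape, U1.shape, S1.shape)  # (32, 32) (12, 32, 32) (12, 32, 32)
```

The first six entries of `S1` correspond to $S_1^1$ and the last six to $S_1^2$. A real implementation would also subsample the outputs, which this sketch omits to keep the indexing simple.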

¹ From here on, we drop the

The second-order scattering coefficients are defined only on paths of decreasing frequency (i.e. $j_2 > j_1$, so that the second wavelet is at a coarser scale than the first) [23]:

$$S_2 x(\lambda_1, \lambda_2, \mathbf{u}) \triangleq |U_1 x * \psi_{\lambda_2}| * \phi_J \tag{4.3.5}$$
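To see how many coefficients these definitions produce, the sketch below enumerates the paths for orders 0, 1 and 2. It assumes the scale index $j$ grows as frequency falls, so a decreasing-frequency path has its second wavelet at the coarser scale; for $J = 2$ the resulting count does not depend on which way the index is ordered. The function name and the `(j, k)` tuple encoding are ours.

```python
from itertools import product

def scattering_paths(J=2, K=6):
    """Enumerate (scale, orientation) paths for scattering orders 0, 1 and 2."""
    lambdas = list(product(range(1, J + 1), range(K)))  # all (j, k) pairs
    order0 = [()]                                       # the lowpass-only path
    order1 = [(l1,) for l1 in lambdas]
    # Second order: keep only paths whose second wavelet is at a coarser scale.
    order2 = [(l1, l2) for l1 in lambdas for l2 in lambdas if l2[0] > l1[0]]
    return order0, order1, order2

o0, o1, o2 = scattering_paths()
print(len(o0), len(o1), len(o2), len(o0) + len(o1) + len(o2))  # 1 12 36 49
```

In general this gives $1 + KJ + K^2 J(J-1)/2$ channels per input channel, since each of the $J(J-1)/2$ ordered pairs of scales contributes $K^2$ orientation pairs.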

Previous work shows that for natural images we get diminishing returns after $m = 2$. Our output is then a stack of these 3 outputs:

$$Sx = \{S_0 x, S_1 x, S_2 x\} \tag{4.3.6}$$

with 1 + 12 + 36 = 49 channels per input channel.

4.3.1 Scattering Colour Images

A wavelet transform like the DTCWT accepts single-channel input, yet we often work on RGB images. This leaves us with a choice. We can either:

1. Apply the wavelet transform (and the subsequent scattering operations) on each channel independently. This would triple the output size to 3C.

2. Define a frequency threshold below which we keep colour information, and above which, we combine the three channels.

The second option uses the well-known fact that the human eye is far less sensitive to higher spatial frequencies in colour channels than in luminance channels. This also fits in with the first-layer filters seen in the well-known Convolutional Neural Network, AlexNet. Roughly one half of the filters were low-frequency colour ‘blobs’, while the other half were higher-frequency, greyscale, oriented wavelets.

For this reason, we choose the second option for the architecture described in this chapter. We keep the 3 colour channels in our $S_0$ coefficients but work only on greyscale for higher orders (the $S_0$ coefficients are the lowpass bands of a $J$-scale wavelet transform, so we have effectively chosen a colour cut-off frequency of $2^{-J} \frac{f_s}{2}$).
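As a worked instance of this cut-off, taking $f_s$ to be the sampling frequency (so that $f_s/2$ is the Nyquist frequency), the $J = 2$ used in this chapter keeps colour information only below one quarter of the Nyquist frequency:

```latex
f_c \;=\; 2^{-J}\,\frac{f_s}{2}
\;\;\overset{J=2}{=}\;\; \frac{1}{4}\cdot\frac{f_s}{2}
\;=\; \frac{f_s}{8}
```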

We combine the three channels by modifying our magnitude operation from (3.6.11) to now be:

$$r_s = \sqrt{x_r^2 + y_r^2 + x_g^2 + y_g^2 + x_b^2 + y_b^2 + b^2} - b \tag{4.3.7}$$

where $x_r, x_g, x_b$ are the real parts of the wavelet response for the red, green and blue channels, and $y_r, y_g, y_b$ are the corresponding imaginary parts. This only affects the $S_1$ coefficients; the $S_2$ coefficients are then calculated as per (4.3.5).
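A minimal NumPy sketch of the combined magnitude follows. It assumes the three per-channel wavelet responses are stacked along the first axis of a complex array, and that $b$ is a non-negative constant carried over from the original magnitude operation; its value here is illustrative only. The function name is ours.

```python
import numpy as np

def colour_magnitude(w_rgb, b=0.0):
    """Joint magnitude over colour channels, following (4.3.7).

    w_rgb: complex array of shape (3, ...), one wavelet response per channel.
    b: constant from the magnitude definition (value here is an assumption).
    """
    # Sum of squared real and imaginary parts over the red, green, blue axis.
    sq = np.sum(w_rgb.real ** 2 + w_rgb.imag ** 2, axis=0)
    return np.sqrt(sq + b ** 2) - b

w = np.array([3 + 4j, 0j, 0j])  # response in the red channel only
print(colour_magnitude(w))      # 5.0
```

With $b = 0$ this reduces to the Euclidean norm of the six real and imaginary parts, and for any $b$ a zero response in all three channels maps to zero.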

An alternative to (4.3.7) is to combine the colours into a luminance channel before scattering. However, we choose to use (4.3.7) instead, as it can detect colour edges with constant luminance.
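The difference between the two options can be illustrated with a toy constant-luminance edge. The sketch assumes Rec. 601 luma weights for the luminance channel (the text does not fix a choice) and uses (4.3.7) with $b = 0$; the response value `z` is arbitrary.

```python
import numpy as np

# Hypothetical luminance weights (Rec. 601 luma); the text does not specify them.
LUMA = np.array([0.299, 0.587, 0.114])

z = 1.0 + 1.0j  # an arbitrary complex wavelet response at an edge

# Red rises while green falls so that the weighted sum (luminance) is unchanged.
w_rgb = np.array([LUMA[1] * z, -LUMA[0] * z, 0.0])

# Convolution is linear, so the wavelet response of the luminance image is the
# weighted sum of the per-channel responses.
luminance_response = abs(np.dot(LUMA, w_rgb))

# Joint magnitude over the three channels, i.e. (4.3.7) with b = 0.
joint_response = np.sqrt(np.sum(np.abs(w_rgb) ** 2))

print(luminance_response, joint_response)
```

The luminance-first response vanishes (up to rounding) because the red and green contributions cancel, whereas the joint magnitude stays well above zero.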

With $J = 2$, the resulting scattering output now has 3 + 12 + 36 = 51 channels at 1/16 the


[Figure 4.2 diagram. Two halves: ScatterNet (right) and DeScatterNet (left), connected to the deeper layers above and to the layer-above reconstruction. Labelled signals: order $(m-1)$ propagated signal $U_{m-1}$; order $m$ wavelet signal $W_m$; order $m$ propagated signal $U_m$; order $m$ reconstructed signal $\hat{U}_m$; order $m$ reconstructed wavelet signal $\hat{W}_m$; order $(m-1)$ reconstructed signal $\hat{U}_{m-1}$. The magnitudes and phases of $W_m$ are separated in the ScatterNet and recombined in the DeScatterNet.]

Figure 4.2: The inverse scattering network. It comprises a DeScattering layer (left) attached to a Scattering layer (right). We use the same convention as [26] Figure 1, i.e. the input signal starts in the bottom right-hand corner, passes forwards through the ScatterNet (up the right half), and is then reconstructed in the DeScatterNet (downwards on the left half). The DeScattering layer reconstructs an approximate version of the previous order’s propagated signal. The 2×2 grids shown around the image are either Argand diagrams representing the magnitude and phase of small regions of complex (De)ScatterNet coefficients, or bar charts showing the magnitude of the real (De)ScatterNet coefficients (after applying the modulus nonlinearity). For reconstruction, we need to save the discarded phase information and reintroduce it by multiplying it with the reconstructed magnitudes.
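The phase bookkeeping described in the caption can be sketched as follows. This is a minimal illustration, not the DeScatterNet itself: the function names and the `eps` guard are our assumptions, and the reconstructed magnitudes would in practice come from the layer above rather than being reused directly.

```python
import numpy as np

def split_magnitude_phase(w, eps=1e-12):
    """Forward pass: return |w| and the unit-modulus phase the modulus discards."""
    mag = np.abs(w)
    phase = w / np.maximum(mag, eps)  # eps guards against division by zero
    return mag, phase

def reinsert_phase(mag_hat, phase):
    """Backward pass: turn reconstructed magnitudes into complex coefficients."""
    return mag_hat * phase

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

U, saved_phase = split_magnitude_phase(W)  # U plays the role of U_m
W_hat = reinsert_phase(U, saved_phase)     # exact when the magnitudes are unchanged
print(np.allclose(W_hat, W))               # True
```

When the reconstructed magnitudes differ from `U`, the same multiplication gives coefficients with the original phases and the new magnitudes.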
