2.4 The Madaline

2.4.3 A Madaline for Translation-Invariant Pattern Recognition

Various Madaline structures have been used recently to demonstrate the applicability of this architecture to adaptive pattern recognition having the properties of translation invariance, rotation invariance, and scale invariance. These three properties are essential to any robust system that would be called on to recognize objects in the field of view of optical or infrared sensors, for example. Remember, however, that even humans do not always instantly recognize objects that have been rotated to unfamiliar orientations, or that have been scaled significantly smaller or larger than their everyday size. The point is that there may be alternatives to training in instantaneous recognition at all angles and scale factors. Be that as it may, it is possible to build neural-network devices that exhibit these characteristics to some degree.

Figure 2.20 shows a portion of a network that is used to implement translation-invariant recognition of a pattern. The retina is a 5-by-5-pixel array on which bit-mapped representations of patterns, such as the letters of the alphabet, can be placed. The portion of the network shown is called a slab. Unlike a layer, a slab does not communicate with other slabs in the network, as will be seen shortly. Each Adaline in the slab receives the identical 25 inputs from the retina, and computes a bipolar output in the usual fashion; however, the weights on the 25 Adalines share a unique relationship.

Figure 2.20 This single slab of Adalines will give the same output (either +1 or -1) for a particular pattern on the retina, regardless of the horizontal or vertical alignment of that pattern on the retina. All 25 individual Adalines are connected to a single Adaline that computes the majority function: If most of the inputs are +1, the majority element responds with a +1 output. The network derives its translation-invariance properties from the particular configuration of the weights. See the text for details. [Diagram: the retina feeding the Madaline slab.]

Consider the weights on the top-left Adaline as being arranged in a square matrix duplicating the pixel array on the retina. The Adaline to the immediate right of the top-left unit has the identical set of weight values, but translated one pixel to the right: The rightmost column of weights on the first unit wraps around to the left to become the leftmost column on the second unit. Similarly, the unit below the top-left unit also has the identical weights, but translated one pixel down. The bottom row of weights on the first unit becomes the top row of the unit under it. This translation continues across each row and down each column in a similar manner. Figure 2.21 illustrates some of these weight matrices. Because of this relationship among the weight matrices, a single pattern on the retina will elicit identical responses from the slab, independent of the pattern's translational position on the retina. We encourage you to reflect on this result for a moment (perhaps several moments), to convince yourself of its validity.

Figure 2.21 The weight matrix in the upper left is the key weight matrix. All other weight matrices on the slab are derived from this matrix. The matrix to the right of the key weight matrix represents the matrix on the Adaline directly to the right of the one with the key weight matrix. Notice that the fifth column of the key weight matrix has wrapped around to become the first column, with the other columns shifting one space to the right. The matrix below the key weight matrix is the one on the Adaline directly below the Adaline with the key weight matrix. The matrix diagonal to the key weight matrix represents the matrix on the Adaline at the lower right of the slab. [Panels: key weight matrix (top row, left column); weight matrix for top row, 2nd column; weight matrix for 2nd row, left column; weight matrix for 5th row, 5th column.]
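The weight sharing described above can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function names (shift, adaline_output, slab_outputs) are hypothetical, the key weights are random integers, the pattern is a random bipolar bitmap, and the bias weight of each Adaline is omitted for brevity. It builds each unit's weight matrix as a wraparound shift of the key weight matrix and compares the grid of Adaline outputs before and after the pattern is translated.

```python
import random

N = 5  # the retina is N-by-N pixels

def shift(matrix, down, right):
    """Circularly shift a square matrix `down` rows and `right` columns."""
    n = len(matrix)
    return [[matrix[(r - down) % n][(c - right) % n] for c in range(n)]
            for r in range(n)]

def adaline_output(weights, pattern):
    """Bipolar output: +1 if the weighted sum is non-negative, else -1.
    Assumption: no bias weight is used in this sketch."""
    s = sum(w * x for w_row, x_row in zip(weights, pattern)
            for w, x in zip(w_row, x_row))
    return 1 if s >= 0 else -1

def slab_outputs(key, pattern):
    """Outputs of all N*N Adalines; the unit at slab position (i, j)
    uses the key weight matrix shifted i rows down and j columns right."""
    n = len(key)
    return [[adaline_output(shift(key, i, j), pattern) for j in range(n)]
            for i in range(n)]

rng = random.Random(1)
# Hypothetical data: integer key weights and a random bipolar pattern.
key = [[rng.randint(-5, 5) for _ in range(N)] for _ in range(N)]
pattern = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(N)]

base = slab_outputs(key, pattern)
moved = slab_outputs(key, shift(pattern, 1, 2))

# Translating the pattern (with wraparound) translates the grid of Adaline
# outputs by the same amount, so the slab as a whole sees the same
# collection of +1 and -1 values.
print(moved == shift(base, 1, 2))
```

Integer weights are used here so that the weighted sums are exact; with floating-point weights the same argument holds, but a sum very close to zero could in principle round differently for different summation orders.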

The majority node is a single Adaline that computes a binary output based on the outputs of the majority of the Adalines connecting to it. Because of the translational relationship among the weight vectors, the placement of a particular pattern at any location on the retina will result in the identical output from the majority element (we impose the restriction that patterns that extend beyond the retina boundaries will wrap around to the opposite side, just as the various weight matrices are derived from the key weight matrix). Of course, a pattern different from the first may elicit a different response from the majority element. Because only two responses are possible, the slab can differentiate two classes of input patterns. In terms of hyperspace, a slab is capable of dividing that space into two regions.
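As an illustration of the majority element, here is a small sketch under stated assumptions: the majority Adaline is modeled with unit weights on its 25 inputs and no bias, the key weights are random integers, and the names slab_response and translate are hypothetical. It checks that the single slab output is the same for every wraparound translation of one pattern.

```python
import random

N = 5  # the retina is N-by-N pixels

def slab_response(key, pattern):
    """Majority vote over the N*N Adalines of one slab; returns +1 or -1."""
    n = len(key)
    total = 0
    for i in range(n):        # slab row of the Adaline
        for j in range(n):    # slab column of the Adaline
            # Weights of Adaline (i, j): the key weight matrix shifted
            # i rows down and j columns right, with wraparound.
            s = sum(key[(r - i) % n][(c - j) % n] * pattern[r][c]
                    for r in range(n) for c in range(n))
            total += 1 if s >= 0 else -1
    # Majority element modeled as an Adaline with unit weights and no bias.
    # With 25 bipolar inputs the sum is odd, so it is never zero.
    return 1 if total > 0 else -1

def translate(pattern, down, right):
    """Move a retina pattern with wraparound at the boundaries."""
    n = len(pattern)
    return [[pattern[(r - down) % n][(c - right) % n] for c in range(n)]
            for r in range(n)]

rng = random.Random(2)
# Hypothetical data: integer key weights and a random bipolar pattern.
key = [[rng.randint(-5, 5) for _ in range(N)] for _ in range(N)]
pattern = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(N)]

responses = {slab_response(key, translate(pattern, a, b))
             for a in range(N) for b in range(N)}
print(len(responses))  # number of distinct slab outputs over all 25 translations
```

The set collects the slab output for all 25 translations; a single element in the set means the majority element responded identically every time.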

To overcome the limitation of only two possible classes, the retina can be connected to multiple slabs, each having different key weight matrices (Widrow and Winter's term for the weight matrix on the top-left element of each slab). Given the binary nature of the output of each slab, a system of n slabs could differentiate 2^n different pattern classes. Figure 2.22 shows four such slabs producing a four-dimensional output capable of distinguishing 16 different input-pattern classes with translational invariance.

Let's review the basic operation of the translation-invariance network in terms of a specific example. Consider the letters A through P as the input patterns we would like to identify regardless of their up-down or left-right translation on the 5-by-5-pixel retina. These translated retina patterns are the inputs to the slabs of the network. Each retina pattern results in an output pattern from the invariance network that maps to one of the 16 input classes (in this case, each class represents a letter). By using a lookup table, or other method, we can associate the 16 possible outputs from the invariance network with one of the 16 possible letters that can be identified by the network.

So far, nothing has been said concerning the values of the weights on the Adalines of the various slabs in the system. That is because it is not actually necessary to train those nodes in the usual sense. In fact, each key weight matrix can be chosen at random, provided that each input-pattern class results in a unique output vector from the invariance network. Using the example of the previous paragraph, any translation of one of the letters should result in the same output from the invariance network. Furthermore, any pattern from a different class (i.e., a different letter) must result in a different output vector from the network. This requirement means that, if you pick a random key weight matrix for a particular slab and find that two letters give the same output pattern, you can simply pick a different weight matrix.
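The random selection just described might be sketched as follows. This is one possible reading, not the procedure from the text: key matrices are drawn one slab at a time, and a draw is rejected as soon as it leaves more classes sharing a partial output than the remaining slabs could separate. The 16 patterns are random bipolar bitmaps standing in for the letters A through P (they are not real letter shapes), and all names (slab_response, choose_keys, classify, lookup) are hypothetical.

```python
import random
from collections import Counter

N = 5           # the retina is N-by-N pixels
N_SLABS = 4     # four slabs give 2**4 = 16 possible output vectors
N_CLASSES = 16  # one class per letter, A through P

def slab_response(key, pattern):
    """Majority vote of the Adalines whose weights are circular shifts of key."""
    total = 0
    for i in range(N):
        for j in range(N):
            s = sum(key[(r - i) % N][(c - j) % N] * pattern[r][c]
                    for r in range(N) for c in range(N))
            total += 1 if s >= 0 else -1
    return 1 if total > 0 else -1

def random_key(rng):
    """Assumption: small random integer weights, so sums are exact."""
    return [[rng.randint(-5, 5) for _ in range(N)] for _ in range(N)]

def choose_keys(patterns, rng, max_tries=100000):
    """Pick one key weight matrix per slab, redrawing any matrix that
    would make unique output vectors impossible."""
    keys = []
    codes = [() for _ in patterns]  # output bits collected so far, per class
    for k in range(1, N_SLABS + 1):
        # The remaining slabs can separate at most this many classes
        # that still share the same partial output vector.
        limit = 2 ** (N_SLABS - k)
        for _ in range(max_tries):
            key = random_key(rng)
            trial = [code + (slab_response(key, p),)
                     for code, p in zip(codes, patterns)]
            if max(Counter(trial).values()) <= limit:
                keys.append(key)
                codes = trial
                break
        else:
            raise RuntimeError("no suitable key weight matrix found")
    return keys, codes

rng = random.Random(0)
# Stand-in patterns: random bipolar bitmaps, NOT real letter shapes.
patterns = [[[rng.choice((-1, 1)) for _ in range(N)] for _ in range(N)]
            for _ in range(N_CLASSES)]
letters = "ABCDEFGHIJKLMNOP"

keys, codes = choose_keys(patterns, rng)
lookup = dict(zip(codes, letters))  # output vector -> letter

def classify(pattern):
    """Run all slabs on a retina pattern and look up the associated letter."""
    return lookup[tuple(slab_response(key, pattern) for key in keys)]
```

Once the keys are chosen, classify can be applied to any wraparound translation of a stored pattern and should return the same letter as for the untranslated pattern, since each slab's output does not change under translation. Because 16 classes must fill all 16 possible output vectors, many draws are typically rejected; using more slabs than the minimum would loosen that constraint at the cost of a longer output vector.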

As an alternative to random selection of key weight matrices, it may be possible to optimize selection by employing a training procedure based on the
