Backpropagation
Exercise 3.5: Consider a three-layer BPN with all weights initialized to the same value on every unit. Prove that this network will never be able to learn anything. Interpret this result in terms of the error surface in weight space.
3.4 BPN APPLICATIONS
The BPN is a versatile tool that is readily applied to a number of diverse problems. To a large extent, its versatility is due to the general nature of the network learning process. As we discussed in the previous section, there are only two equations needed to backpropagate error signals within the network; which of the two is used depends on whether the processing unit receiving the error signal contributes directly to the output. Those units that do not connect directly to the output use the same error-propagation mechanism regardless of where they are in the network structure.
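The two error-term equations can be sketched as follows. This is a minimal sketch assuming sigmoid units, so that the derivative of the activation is f'(net) = o(1 − o); the function names `output_delta` and `hidden_delta` are hypothetical and not taken from the text.

```python
def output_delta(target: float, output: float) -> float:
    """Error term for a unit that feeds the output directly.

    delta_o = (y - o) * f'(net), where f'(net) = o * (1 - o)
    for a sigmoid unit (an assumption of this sketch).
    """
    return (target - output) * output * (1.0 - output)


def hidden_delta(hidden_output: float,
                 downstream_deltas: list[float],
                 downstream_weights: list[float]) -> float:
    """Error term for a unit that does not feed the output directly.

    delta_h = f'(net) * sum_k delta_k * w_kj, where delta_k are the error
    terms of the units this unit feeds and w_kj the weights on those links.
    """
    backprop_sum = sum(d * w for d, w in zip(downstream_deltas, downstream_weights))
    return hidden_output * (1.0 - hidden_output) * backprop_sum


# Example: an output unit producing 0.5 against a target of 1.0,
# then a hidden unit (output 0.5) feeding two such downstream units.
d_out = output_delta(1.0, 0.5)
d_hid = hidden_delta(0.5, [0.125, -0.25], [0.4, 0.1])
```

Note that `hidden_delta` does not care whether the downstream error terms come from the output layer or from another hidden layer, which is why units that do not connect directly to the output can all share the same mechanism.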
The generality of this common process allows the arrangement and connectivity of the individual units within the network to vary dramatically. Similarly, because such a variety of network structures can be created and trained successfully with the backpropagation algorithm, this network-learning technique can be applied to many different kinds of problems. In the remainder of this section, we will describe two such applications, selected to illustrate the diversity of the BPN architecture.
3.4.1 Data Compression
As our first example, let's consider the common problem of data compression. Specifically, we would like to try to find a way to reduce the data needed to encode and reproduce accurately a moderately high-resolution video image, so that we might transmit these images over low- to medium-bandwidth communication equipment. Although there are many algorithmic approaches to performing data compression, most of these are designed to deal with static data, such as ASCII text, or with display images that are fairly consistent, such as computer graphics. Because video data rarely contain regular, well-defined forms (and even less frequently contain empty space), video data compression is a difficult problem from an algorithmic viewpoint.
Conversely, as originally described in […], a neural-network approach is ideal for a video data-reduction application, because a BPN can be trained easily to map a set of patterns from an n-dimensional space to an m-dimensional space. Since any video image can be thought of as a matrix of picture elements (pixels), it naturally follows that the image can also be conceptualized as a vector in n-dimensional space. If we limit the video to be encoded to monochrome images, each image can be represented as a vector of n elements, one per pixel, each holding that pixel's gray-scale value (0 through 255).
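Treating an image as a vector amounts to reading its pixel matrix row by row. The sketch below assumes an image stored as a list of rows of 0-255 gray levels; scaling the values into [0, 1] is an additional assumption (a common choice for sigmoid units), and both function names are hypothetical.

```python
def image_to_vector(image: list[list[int]]) -> list[float]:
    """Flatten a monochrome image (rows of 0-255 gray levels) into a single
    n-element vector, scaled to [0, 1] for use with sigmoid units."""
    vector = []
    for row in image:
        for pixel in row:
            if not 0 <= pixel <= 255:
                raise ValueError(f"gray level out of range: {pixel}")
            vector.append(pixel / 255.0)
    return vector


def vector_to_image(vector: list[float], width: int) -> list[list[int]]:
    """Inverse mapping: rebuild rows of 0-255 gray levels from a scaled
    vector, clamping values that fall outside the valid range."""
    pixels = [max(0, min(255, round(v * 255))) for v in vector]
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


# A 2 x 2 image becomes a 4-element vector (n = 4).
vec = image_to_vector([[0, 255], [51, 102]])
```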
Network Architecture for Data Compression. The first step in solving this problem is to try to find a way to structure our network so that it will perform the desired data compression. We would like to select a network architecture that provides a reasonable data-reduction factor (say, four-to-one), while still enabling us to recover a close approximation of the original image from the encoded form. The network illustrated in Figure 3.7 will satisfy both of these requirements.
At first glance, it may seem unusual that the proposed network will have a one-to-one correspondence between input and output units. After all, did we not indicate that data compression was the desired objective? On further investigation, the strategy implied by the network architecture becomes apparent; since there are fewer hidden units than input units, the hidden layer must represent the compressed form of the data. This is exactly the plan of attack.
When an image vector is provided as the input stimulation pattern, the network propagates the input through the hidden units to the output. Since the hidden layer contains only one-quarter as many processing units as the input layer, the output values produced by the hidden-layer units can be thought of as the encoded form of the input. Furthermore, by propagating the outputs of the hidden-layer units forward to the output layer, we have implemented a mechanism for reconstructing the original image from the encoded form, as well as a means of training the network.
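The forward pass just described splits naturally into an encoding half and a decoding half. This is a minimal sketch with untrained random weights: 64 inputs are used purely for illustration (the section later settles on 64-pixel blocks), and the bias terms, the weight-initialization range, and the names `encode`, `decode`, and `layer` are assumptions of the sketch.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """One fully connected sigmoid layer; weights[j][i] links input i to unit j."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def random_layer(n_in, n_out, rng):
    """Small random weights and biases (initialization range is an assumption)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, biases


N_INPUT = 64              # one unit per pixel of the input vector
N_HIDDEN = N_INPUT // 4   # four-to-one compression

rng = random.Random(0)
hidden_w, hidden_b = random_layer(N_INPUT, N_HIDDEN, rng)
output_w, output_b = random_layer(N_HIDDEN, N_INPUT, rng)


def encode(pixels):
    """Hidden-layer outputs: the compressed (encoded) form of the input."""
    return layer(pixels, hidden_w, hidden_b)


def decode(code):
    """Output-layer outputs: the reconstruction of the original vector."""
    return layer(code, output_w, output_b)


reconstruction = decode(encode([0.5] * N_INPUT))
```

Until the weights are trained, `decode(encode(x))` bears no resemblance to `x`; the sketch only shows where the compressed form appears in the data flow.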
During training, the network will be shown examples of random pixel vectors taken from representative video images. Each vector will be used as both the input to the network and the target output. Using the backpropagation process, the network will develop an internal weight coding such that the image is compressed to one-quarter of its original size at the outputs of the hidden units. If we then read out the values produced by the hidden-layer units in our network and transmit those values to a receiving station, we can reconstruct the original image by propagating the compressed image to the output units of an identical network. Such a system is depicted in Figure 3.8.

Figure 3.7 This BPN will do four-to-one data compression.
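The distinctive feature of this training regime is that the target is the input itself. The sketch below illustrates that idea with plain per-pattern backpropagation, scaled down to 8 inputs and 2 hidden units (still four-to-one) so that it runs quickly in pure Python. The sizes, learning rate, bias weights, initialization range, epoch count, and the function name `train_autoencoder` are all assumptions, not details from the text.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def train_autoencoder(patterns, n_hidden, epochs=500, eta=0.5, seed=0):
    """Train an n-h-n sigmoid BPN in which every pattern is both input and
    target. Returns ((hidden weights, output weights), per-epoch error)."""
    rng = random.Random(seed)
    n = len(patterns[0])
    # Last column of each weight row is a bias weight fed by a constant 1.0.
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n + 1)] for _ in range(n_hidden)]
    w_o = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n)]
    history = []
    for _ in range(epochs):
        total = 0.0
        for x in patterns:
            xb = x + [1.0]
            h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_h]
            hb = h + [1.0]
            o = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in w_o]
            # Output error terms: the target is the input pattern itself.
            d_o = [(t - ok) * ok * (1 - ok) for t, ok in zip(x, o)]
            # Hidden error terms, computed before any weight is changed.
            d_h = [h[j] * (1 - h[j]) * sum(d_o[k] * w_o[k][j] for k in range(n))
                   for j in range(n_hidden)]
            for k in range(n):
                for j in range(n_hidden + 1):
                    w_o[k][j] += eta * d_o[k] * hb[j]
            for j in range(n_hidden):
                for i in range(n + 1):
                    w_h[j][i] += eta * d_h[j] * xb[i]
            total += sum((t - ok) ** 2 for t, ok in zip(x, o))
        history.append(total)
    return (w_h, w_o), history


# Four smooth 8-pixel "scan lines" with gray levels scaled into [0.1, 0.9].
patterns = [[0.1] * 8, [0.9] * 8,
            [0.1 + 0.8 * i / 7 for i in range(8)],
            [0.9 - 0.8 * i / 7 for i in range(8)]]
weights, history = train_autoencoder(patterns, n_hidden=2)
print(f"total squared error: {history[0]:.3f} -> {history[-1]:.3f}")
```

After training, the hidden-layer weights `w_h` would live at the sending station and the output-layer weights `w_o` at the receiving station, mirroring the split shown in Figure 3.8.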
Network Sizing. There are two problems remaining to be solved for this application: the first is the network-sizing problem, and the second is the generation of the sample data sets needed to train the network. We will address the network-sizing aspect first.
It is unrealistic to expect to create a network that contains an input unit for every pixel in a single video image. Even if we restricted ourselves to the relatively low resolution of the 525-line standard specified by the National Television System Committee (NTSC) for commercial television, our network would need 336,000 input units (525 lines × 640 pixels). Moreover, the entire network would contain roughly 750,000 processing units (336,000 + 336,000/4 + 336,000) and more than 50 billion connections. As we have mentioned in earlier chapters, simulating a network containing a large number of units and a vast number of connections on anything less than a dedicated supercomputer is much too time-consuming to be considered practical.

Figure 3.8 In this example, the output activity pattern from the second layer of units is transferred over a transmission medium (cable) to a receiving station, where it is applied as the output of another layer of units that forms the top half of the three-layer network. The receiving network then reconstructs the transmitted image from the compressed form, using the inverse mapping function contained in the connection weights to the top half of the network.
Our sizing strategy is therefore somewhat less ambitious: we will restrict the size of our input and output spaces to 64 pixels. Thus, the hidden layer will contain 16 units. Although this might appear to be a significant compromise relative to the full-image network, the smaller network offers two practical benefits over its larger counterpart:
* It is easy to simulate on small computers.
* Training data for it are easier to obtain.
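The sizing figures for the two networks follow from simple arithmetic. This sketch assumes full connectivity between adjacent layers and ignores bias weights; `bpn_size` is a hypothetical helper.

```python
def bpn_size(n_inputs: int, compression: int = 4) -> dict:
    """Unit and connection counts for a fully connected n-(n/c)-n BPN,
    not counting bias weights."""
    n_hidden = n_inputs // compression
    units = n_inputs + n_hidden + n_inputs
    connections = n_inputs * n_hidden + n_hidden * n_inputs
    return {"inputs": n_inputs, "hidden": n_hidden,
            "units": units, "connections": connections}


full = bpn_size(525 * 640)   # one input unit per pixel of a 525 x 640 frame
small = bpn_size(64)         # one input unit per pixel of a 64-pixel block

for name, size in (("full frame", full), ("64-pixel block", small)):
    print(f"{name}: {size['units']:,} units, {size['connections']:,} connections")
```

The full-frame network needs 756,000 units and two layers of 336,000 × 84,000 weights, about 56 billion connections in all, whereas the 64-16-64 network needs only 144 units and 2,048 connections.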
Training the Network. By now, it is probably evident why the smaller network is easier to simulate. Why training data for the smaller network are easier to obtain is probably not as obvious. If we consider the nature of the application the network is attempting to address, we can see that the network is trying to learn a mapping from 64-space to 16-space and the inverse mapping back to 64-space. Since the number of possible input patterns is significantly smaller in 64 dimensions than it is in 336,000 dimensions, it follows that far fewer random training patterns are needed for the network to learn to reproduce the input in 64-space.² It is also easier to generate random images for training the network when it has 64 inputs, because a single complete video image can be subdivided into about 5000 8 × 8 pixel matrices, each of which can be used to train the network.
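Cutting a frame into 64-pixel training vectors can be sketched as follows. The sketch assumes non-overlapping 8 × 8 tiles and simply drops the partial tiles at the bottom edge (525 rows is not a multiple of 8); `split_into_blocks` is a hypothetical name.

```python
def split_into_blocks(image, block=8):
    """Cut a frame (a list of rows of gray levels) into non-overlapping
    block x block tiles, each flattened row by row into one vector.
    Partial tiles at the right and bottom edges are discarded."""
    rows, cols = len(image), len(image[0])
    tiles = []
    for top in range(0, rows - block + 1, block):
        for left in range(0, cols - block + 1, block):
            tiles.append([image[r][c]
                          for r in range(top, top + block)
                          for c in range(left, left + block)])
    return tiles


frame = [[0] * 640 for _ in range(525)]   # a blank 525 x 640 frame
tiles = split_into_blocks(frame)
print(len(tiles), "training vectors of", len(tiles[0]), "pixels each")
```

For a 525 × 640 frame this yields 65 rows × 80 columns = 5200 tiles, consistent with the estimate of about 5000 pixel matrices. Shuffling the tiles (for example with `random.shuffle`) before presenting them would give the random ordering of training vectors the text calls for.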
Based on these observations, our approach of downsizing the network has solved both of the remaining issues. We now have a network that is easy to manage and that offers a means of acquiring the training data sets from readily obtainable sources.
Exercise 3.6: Assume that the system described in this section has been built