Exercise 2.3: Determine an explicit equation for as a function of and using the example in the text. Use it to find the optimum weight vector,
2.2 Adaline and the Adaptive Linear Combiner
• What is the appropriate value for $\mu$?
• How do we determine when to stop training?
The answers to these questions depend on the specific problem being addressed, so it is difficult to give well-defined responses that apply in all cases. Moreover, for a specific case, the answers are not necessarily independent.
Consider the dimension of the weight vector. If there is a well-defined number of inputs, say from multiple sensors, there would be one weight for each input. The question would be whether to add a bias weight. Figure 2.13 depicts this case, with the bias term added, in a somewhat standard form that shows the variability of the weights, the error term, and the feedback from the output to the weights. As for the bias term itself, including it sometimes helps convergence of the weights to an acceptable solution. It is perhaps best thought of as an extra degree of freedom, and its use is largely a matter of experimentation with the specific application.
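As a rough illustration of the arrangement just described, the following Python sketch computes the ALC output as a weighted sum of the inputs, optionally adds a bias weight driven by a constant input of 1, and forms the error as the desired output minus the actual output. The function names `alc_output` and `alc_error` are hypothetical and do not come from the text.

```python
def alc_output(weights, inputs, bias_weight=None):
    """Return the weighted sum of the inputs, plus the bias term if one is used."""
    y = sum(w * x for w, x in zip(weights, inputs))
    if bias_weight is not None:
        # The bias input is fixed at 1, so the bias weight is added directly.
        y += bias_weight * 1.0
    return y


def alc_error(desired, output):
    """Error term: desired output plus the negative of the actual output."""
    return desired - output
```

Passing `bias_weight=None` gives the combiner without the extra degree of freedom, which makes it easy to try a given application both ways.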
Figure 2.13 This figure shows a standard diagram of the ALC with multiple inputs and a bias term. Weights are indicated as variable resistors to emphasize the adaptive nature of the device. Calculation of the error is shown explicitly as the addition of a negative of the output signal to the desired output value. (Diagram labels: bias input = 1; desired output.)
A situation different from that of the previous paragraph arises if there is only a single input signal, say from a single electrocardiograph (EKG) sensor. For example, an ALC can be used to remove noise from the input signal in order to give a cleaner signal at the output. In a case such as this one, the ALC is arranged in a configuration known as a transverse filter. In this configuration, the input signal is sampled at several points in time, rather than from several sensors at a single time. Figure 2.14 shows the ALC arranged as a transverse filter.
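The sampling scheme of a transverse filter can be sketched in Python as a tapped delay line: at each sample time the combiner sees the current sample together with the previous n - 1 samples. The generator name `delay_line_vectors` is hypothetical, and the choice to emit nothing until every tap holds real data is an assumption of this sketch rather than a requirement stated in the text.

```python
from collections import deque


def delay_line_vectors(signal, n):
    """Yield input vectors [x(t), x(t-1), ..., x(t-n+1)] for a sampled signal."""
    taps = deque(maxlen=n)  # oldest sample falls off the right end when full
    for sample in signal:
        taps.appendleft(sample)
        if len(taps) == n:  # assumption: skip start-up until all taps are filled
            yield list(taps)
```

For a four-sample signal and n = 3, this produces two input vectors, each holding the newest sample first.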
Figure 2.14 In an ALC arranged as a transverse filter, the samples are provided by n − 1, presumably equal, time delays. The ALC sees the signal at the current time, as well as its value at the previous n − 1 sample times. When data is initially applied, remember to wait at least for data to be
For the transverse filter, each additional sample in time represents another degree of freedom that can be used to fit the input signal to the desired output signal. Thus, if you cannot get a good fit with a small number of samples, try a few more. On the other hand, if you get good convergence with your first choice, try one with fewer samples to see whether you get a significant speedup in convergence and still have satisfactory results (you may be surprised to find that the results are better in some cases). Moreover, the bias weight is probably superfluous in this case.
Earlier, we alluded to a relationship between training time and the dimension of the weight vector, especially for the software simulations that we consider in this text: More weights generally mean longer training times. This relationship must be constantly weighed against other factors, such as the acceptability of the solution. As stated in the previous paragraph, using more weights does not always result in a better solution. Furthermore, there are other factors that affect both the training time and the acceptability of the solution.
The parameter $\mu$ is one factor that has a significant effect on training. If $\mu$ is too large, convergence will never take place, no matter how long the training period is. If the statistics of the input signal are known, it is possible to show that the value of $\mu$ is restricted to the range
$$\frac{1}{\lambda_{\max}} > \mu > 0$$
where $\lambda_{\max}$ is the largest eigenvalue of the matrix R, the input correlation matrix discussed earlier. Although it is not always reasonable to expect these statistics to be known, there are cases where they can be estimated. The text by Widrow and Stearns contains many examples. In this text, we propose a more heuristic approach: Pick a value for $\mu$ such that a weight does not change by more than a small fraction of its current value. This rule is admittedly vague, but experience appears to be the best teacher for selecting an appropriate value for $\mu$.
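Where a representative set of input vectors is available, the upper limit on the step size can be estimated numerically. The Python sketch below estimates the input correlation matrix as an average of outer products, approximates its largest eigenvalue by power iteration, and returns the reciprocal. All function names are hypothetical. The sketch also assumes the Widrow and Stearns form of the update, in which each weight changes by 2 times the step size times the error times the input; a rule written without the factor of 2 would change the bound accordingly.

```python
def correlation_matrix(vectors):
    """Estimate R as the average outer product of the input vectors."""
    n = len(vectors[0])
    count = len(vectors)
    return [[sum(v[i] * v[j] for v in vectors) / count for j in range(n)]
            for i in range(n)]


def largest_eigenvalue(matrix, iterations=200):
    """Approximate the largest eigenvalue of a symmetric matrix by power iteration."""
    n = len(matrix)
    v = [1.0] * n  # starting guess; assumed not orthogonal to the dominant eigenvector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(c * c for c in w) ** 0.5
        if norm == 0.0:
            return 0.0  # the guess was mapped to zero; no estimate is possible
        v = [c / norm for c in w]
    rv = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * rv[i] for i in range(n))  # Rayleigh quotient of the unit vector v


def mu_upper_bound(vectors):
    """Return 1 / lambda_max for the estimated correlation matrix."""
    lam = largest_eigenvalue(correlation_matrix(vectors))
    return 1.0 / lam if lam > 0.0 else float("inf")
```

In practice a step size well below this limit is usually chosen, since the estimate is only as good as the sample of input vectors used to form it.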
As training proceeds, the error value will, we hope, diminish, resulting in smaller and smaller weight changes and, hence, in slower convergence toward the minimum of the weight surface. It is sometimes useful to increase the value of $\mu$ during these periods to speed convergence. Bear in mind, however, that with a larger $\mu$ the weights might bounce around the bottom of the weight surface, giving an overall error that is unacceptable. Here again, experience is necessary to enable us to judge effectively.
One method of compensating for differences in problems is to use normalized input vectors. Instead of $\mathbf{x}$, use $\mathbf{x}/\|\mathbf{x}\|$. Another tactic is to scale the desired output value. These methods help particularly when we are selecting initial weight values or a value for $\mu$. In most cases, weights can be initialized to random values of small real numbers, say between -1.0 and 1.0. The value of $\mu$ is usually best kept significantly less than 1; a value as small as 0.05 may be reasonable for some problems, but values considerably less may be required.
The question of when to stop training is largely a matter of the requirements on the output of the system. You determine the amount of error that you can tolerate on the output signal, and train until the observed error is consistently less than the required value. Since the mean squared error is the value used to derive the training algorithm, that is the quantity that usually determines when a system has converged to its minimum error solution. Alternatively, observing individual errors is often necessary, since the system performance may have a requirement that no error exceed a certain amount. Nevertheless, a mean squared error that falls as the iteration number increases is probably your best indication that the system is converging toward a solution.
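The practical advice above (normalized inputs, small random initial weights, a small step size, and stopping once the mean squared error stays below a tolerance) can be combined in one short Python sketch. The names `normalize` and `train_lms`, the `patience` parameter used to interpret "consistently", and the factor of 2 in the weight update are assumptions made for this illustration rather than details given in the text.

```python
import math
import random


def normalize(x):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    length = math.sqrt(sum(c * c for c in x))
    return [c / length for c in x] if length > 0.0 else list(x)


def train_lms(samples, mu=0.05, tolerance=1e-4, patience=3,
              max_epochs=10000, seed=0):
    """Train on (input, desired) pairs until the epoch MSE stays below tolerance."""
    rng = random.Random(seed)
    n = len(samples[0][0])
    weights = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # small random start
    below = 0  # consecutive epochs with MSE under the tolerance
    mse = float("inf")
    for epoch in range(1, max_epochs + 1):
        total = 0.0
        for x, desired in samples:
            x = normalize(x)
            error = desired - sum(w * c for w, c in zip(weights, x))
            total += error * error
            # Assumed update form: w <- w + 2 * mu * error * x
            weights = [w + 2.0 * mu * error * c for w, c in zip(weights, x)]
        mse = total / len(samples)
        below = below + 1 if mse < tolerance else 0
        if below >= patience:
            break
    return weights, mse, epoch
```

If no individual error may exceed a given amount, the same loop could track the largest absolute error in each epoch and test that value instead of, or in addition to, the mean.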
We usually assume that the input signals are statistically stationary, and, therefore, that the mean squared error is essentially a constant after the optimum weight values have been determined. During training, the mean squared error will, we hope, decrease toward a stable solution. Suppose, however, that the input signal statistics change somewhat over time, or undergo some discontinuity: Additional training would be required to compensate.
One way to deal with this situation is to cease or resume training conditionally, based on the current value of the error. If the signal statistics change, training can be reinitiated until the error is again reduced to an acceptable value. This approach presumes that a method of error measurement is available.
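One possible reading of this conditional scheme is sketched below in Python: the weights are adapted only while a running estimate of the squared error exceeds a threshold, so training pauses once the error is acceptable and resumes if the signal statistics shift. The function name `adapt_conditionally`, the exponential smoothing used for the running estimate, and the factor of 2 in the update are all assumptions of this sketch.

```python
def adapt_conditionally(stream, weights, mu=0.05, threshold=1e-3, smoothing=0.9):
    """Process (input, desired) pairs, updating weights only while the
    smoothed squared error is above the threshold."""
    running_error = None
    updates = 0
    for x, desired in stream:
        error = desired - sum(w * c for w, c in zip(weights, x))
        squared = error * error
        if running_error is None:
            running_error = squared
        else:
            # Exponentially weighted running estimate of the squared error
            running_error = smoothing * running_error + (1.0 - smoothing) * squared
        if running_error > threshold:
            # Training is active: apply the assumed LMS update
            weights = [w + 2.0 * mu * error * c for w, c in zip(weights, x)]
            updates += 1
    return weights, updates
```

When the weights already match the signal, the function performs no updates; after a change in the desired response it adapts for a while and then stops again on its own.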
Provided that the input signals are statistically stationary, choosing the number of input vectors to use during training may be relatively simple. You can use real inputs as training vectors, provided that you know the desired output for each input vector. If it is possible to identify a sample of input vectors that adequately reproduces the statistical distribution of the actual inputs, it may be possible to train on this set in a shorter time. The accuracy of the training depends on how well the selected set of training vectors models the distribution of the entire input signal space.
The other, related question is how to go about determining the desired output for a given input vector. As with many questions discussed in this section, this depends on the specific details of the problem. Fortunately, for some problems, knowing the desired result is easy compared to finding an algorithm for transforming the inputs into the desired result. The ALC will often solve the difficult part. The "easy" part is left to the engineer.
Exercise 2.4: A lowpass filter can be constructed with an Adaline having two