
8. Experiments in Embedded Learning and Adaptation

8.1. Experiments in Learning Enhanced State Recognition

8.1.2. The Activity of an Enhanced AFSM

Figure 8.6 shows the output clusterings generated by the Kohonen self-organising network layer of the associative network in the BallPark AFSM. It compares the input vector, which consists of both the basic rule inputs and the extra auxiliary inputs, with the output of the Kohonen layer. The Kohonen outputs are clearly different for each of the AFSM states as identified by the basic rule. It was these outputs that were presented to the Grossberg layer. In this activity, then, the first part of the associative network performed entirely adequately.
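The two-layer arrangement described above, a Kohonen layer that clusters the input vector and a Grossberg layer that maps the winning cluster to a state value, can be sketched as follows. This is a minimal illustration only. The Euclidean winner selection, the single-winner update (no neighbourhood), the learning rates and all function names are assumptions made for the example and are not taken from the BallPark implementation.

```python
import math

def kohonen_winner(weights, x):
    """Return the index of the Kohonen unit whose weights lie closest to x."""
    best, best_d = 0, math.inf
    for i, w in enumerate(weights):
        d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))  # squared Euclidean distance
        if d < best_d:
            best, best_d = i, d
    return best

def kohonen_update(weights, x, rate):
    """Move only the winning unit's weights a fraction `rate` towards x."""
    k = kohonen_winner(weights, x)
    weights[k] = [wi + rate * (xi - wi) for wi, xi in zip(weights[k], x)]
    return k

def grossberg_update(out_weights, k, target, rate):
    """Move the winner's output weights towards the target (e.g. the rule's state)."""
    out_weights[k] = [v + rate * (t - v) for v, t in zip(out_weights[k], target)]

def recall(weights, out_weights, x):
    """Network output for x: the output weights attached to the winning unit."""
    return out_weights[kohonen_winner(weights, x)]

if __name__ == "__main__":
    # Hypothetical toy problem: 2 inputs, 2 Kohonen units, 1 output value.
    weights = [[0.0, 0.0], [1.0, 1.0]]
    out_weights = [[0.0], [0.0]]
    samples = [([0.1, 0.0], [1.0]), ([0.9, 1.0], [4.0])]
    for _ in range(50):
        for x, target in samples:
            k = kohonen_update(weights, x, rate=0.2)
            grossberg_update(out_weights, k, target, rate=0.2)
    print(recall(weights, out_weights, [0.0, 0.1]))
    print(recall(weights, out_weights, [1.0, 0.9]))
```

In this sketch the rule output serves as the teaching signal for the second layer, so the network can only begin to depart from the rule once the auxiliary inputs shift which Kohonen unit wins.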

Figure 8.7 shows a series of BallPark outputs for sensor transition over the test surface. The left-hand column of graphs shows the development of the association network output in comparison with the AFSM rule output. The network output eventually develops a response similar to that of the rule but differs in small ways, particularly in the speed of state transition and in the filtering out of transient state changes. The right-hand column shows the merged AFSM output compared with the rule output. As expected, the rule output dominates in the early stages; by the end, however, the merged output has become that of the network. There it can be seen clearly that an association has caused the early indication of state 4 on two occasions, characterised by spikes of the thin black curve at around samples 250 and 380.

[Figure 8.6 graphs, one pair per state: Quiescent state A; Near-linear state B; Foldback state C; Foldback state D.]

Figure 8.6. Each of the four photomultiplier states as reflected by the BallPark inputs. The graph at left shows the BallPark input vector with the 4 basic inputs leftmost and 16 meta-pixel inputs. The right-hand graph is the mapping output of the Kohonen self-organising network.

[Figure 8.7 graphs. Plot header: 16 meta-pixel and 4 basic inputs, 64 Kohonen units, learning time factor 500. Rows, top to bottom: 0 cycles, net output weight 0; 27150 cycles, net output weight 0.201; 2V865 cycles, net output weight 0.501; 32560 cycles, net output weight 1.]

Figure 8.7. State value output plots for the BallPark AFSM. Left: Comparing basic rule output with associative network output at all times and Right: Comparing the combined rule-net output with the basic rule output. Thick lines are basic rule output, thin are network output (left) / AFSM output (right).
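The figure reports a net output weight that rises from 0 through 0.201 and 0.501 to 1 as training proceeds, with the merged output moving from the rule output to the network output. One simple way such a merge could work is a linear blend, sketched below. The blend formula and the function name are assumptions for illustration; the source gives only the weight values, not the merging rule.

```python
def merged_output(rule_out, net_out, net_weight):
    """Blend rule and network outputs; net_weight = 0 is pure rule, 1 is pure network.

    The linear form is an assumption made for this sketch.
    """
    if not 0.0 <= net_weight <= 1.0:
        raise ValueError("net_weight must lie in [0, 1]")
    return (1.0 - net_weight) * rule_out + net_weight * net_out

if __name__ == "__main__":
    # Hypothetical instant at which the rule reports state 3 and the network state 4,
    # evaluated at the four net output weights quoted in figure 8.7.
    for w in (0.0, 0.201, 0.501, 1.0):
        print(w, merged_output(3.0, 4.0, w))
```

Under this assumption the merged curve tracks the rule exactly at the start of training and the network exactly at the end, which matches the qualitative behaviour of the right-hand column.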

Experiments in dealing with unusual circumstances: Altering the test surfaces

The results of switching from rule-based action-state identification to that of the network are shown in the graphs of figure 8.8 for a number of different surfaces. The first test compared two trained systems with the completely reactive system of chapter 5. The first training session used the broken-surface sample of drum 3 for the training period; the second used the continuous but varying surface of drum 2. After training, the system was tested on other drum surfaces in order to assess its robustness when confronted with unusual circumstances.

[Figure 8.8 graphs. A: NearLin Reinforcement; B: BallPark Reinforcement; C: Mean Scan Level; D: EHT Voltage Standard Deviation. Legend: Reactive only; Broken surface learning; Two-tone surface learning.]

Figure 8.8. Graph A: NearLin AFSM state output, percentage of time that scan signal was in an acceptable range of the control set point (+/- 0.6V). Graph B: BallPark AFSM state output, percentage of time in near-linear state. Graph C: Mean level of the scan signal for each surface in the test. Graph D: Standard deviation of the EHT voltage output showing controller activity.

Graph B compares the percentage of time that the BallPark AFSM was in the near-linear output state (i.e. accumulating reinforcement value) for the various surface samples and for the two learning regimes outlined in the experimental description above. There is in fact little difference between the systems that were trained on different surfaces, and only a small trend towards the learned systems spending a higher percentage of time in the near-linear state. The point to note about this graph is that the learned response is not significantly worse than that of the reactive rule-based system. Examination of the NearLin output state in graph A, however, shows a more marked difference. Despite the fact that the NearLin AFSM did not support an input association network, the percentage of time for which it was in the signal-in-range state increased markedly for the learned responses. This may be attributed to the more timely action selection of the BallPark AFSM.

The performance at the system level can be seen most clearly by comparing the standard deviations of the control-voltage outputs of the EHTOutput AFSM in graph D of figure 8.8. The standard deviation of the basic rule-controlled system is significantly larger than that of the enhanced AFSM outputs in all cases. This suggests improved stability of the enhanced BallPark-controlled photomultiplier system as a result of reduced actuation from more appropriate action selection; in other words, from a more appropriate classification of system state. Analysis of the mean scan signal level in graph C provides little further indication of the state of affairs, although the small increases in mean level for the broken and two-tone surfaces again indicate an improvement in the enhanced BallPark systems. The mean scan level is markedly lower for the two broken-surface types, but this is because the gaps inherently pull the mean below the desired control set-point: the scan signal spends considerable periods in the quiescent state as the gaps pass by.
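The three measures used in figure 8.8 (percentage of time within +/- 0.6V of the set point, mean scan level, and standard deviation of the EHT voltage) can be computed from logged samples along the following lines. This is a sketch under stated assumptions: the function name, the sample values and the set point of 5.0V are hypothetical, and the choice of the population rather than the sample standard deviation is also an assumption, since the source does not say which was used.

```python
from statistics import mean, pstdev

def percent_in_range(signal, set_point, tolerance=0.6):
    """Percentage of samples lying within +/- tolerance of the set point."""
    inside = sum(1 for v in signal if abs(v - set_point) <= tolerance)
    return 100.0 * inside / len(signal)

if __name__ == "__main__":
    # Hypothetical logged data; not taken from the experiments.
    scan = [5.0, 5.5, 6.0, 4.2, 5.1]          # scan signal in volts
    eht = [1000.0, 1010.0, 990.0, 1000.0]      # EHT control voltage

    print("time in range (%):", percent_in_range(scan, set_point=5.0))
    print("mean scan level:", mean(scan))
    print("EHT std deviation:", pstdev(eht))   # larger value = more controller activity
```

A smaller EHT standard deviation indicates less actuation, which is the sense in which graph D is read as a stability measure.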

Experiments in Robustness: Reducing AFSM Inputs

[Figure 8.9 graphs. A: Mean Scan Level; B: EHT Voltage Standard Deviation.]

Figure 8.9. System performance with reduced BallPark AFSM inputs. Graph A shows that the mean scan-level remains more or less unaffected. Graph B: Standard deviations of the increasing EHT voltage show that the system gradually becomes more unstable as the inputs are reduced.

The graphs in figure 8.9 show the relative performance of the various input-reduction tests in the series of experiments on the single enhanced AFSM. Firstly, it should be noted that when the basic rule inputs of the BallPark AFSM were removed, the system's performance was not significantly reduced. This indicates that the auxiliary inputs have indeed been assimilated into the network and that it is not simply providing a retranslation of the basic rule. In fact, as the inputs to the network were progressively removed (by setting them to zero, see above), the degradation in system stability, as indicated by the standard deviations of the EHT voltage, was remarkably gradual, with even a minimum of two network inputs proving sufficient for stable control. This apparently impressive robustness, however, is an artefact of the structure of the experiment. The basic inputs to the NearLin AFSM were the same as those used for BallPark, and during the various stages of the experiment the NearLin inputs were not disabled in the same way. The NearLin process therefore continued to act unimpaired, providing a subsuming controlled output despite the gradual degradation of the BallPark output. Because of this, the experiments in the next section provide a better indication of the robustness of the system when faced with disappearing input state information. More detail will be provided there.
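Removing inputs by setting them to zero, as described above, amounts to masking the input vector before it reaches the network. A minimal sketch follows. The 20-element layout (4 basic inputs leftmost, then 16 meta-pixel inputs) is taken from the caption of figure 8.6; the function name and the sample values are hypothetical.

```python
def mask_inputs(vector, keep):
    """Return a copy of the input vector with every index not in `keep` set to zero."""
    keep = set(keep)
    return [v if i in keep else 0.0 for i, v in enumerate(vector)]

if __name__ == "__main__":
    # Hypothetical input vector: 4 basic rule inputs followed by 16 meta-pixel inputs.
    vector = [0.5] * 4 + [0.25] * 16

    # First stage of the test: remove the basic rule inputs, keep the auxiliary ones.
    no_basic = mask_inputs(vector, keep=range(4, 20))
    print(no_basic)

    # A later stage: only two network inputs remain active (indices chosen arbitrarily).
    two_left = mask_inputs(vector, keep=[4, 5])
    print(two_left)
```

Note that a mask of this kind applied to BallPark alone leaves any other AFSM reading the same sensors untouched, which is the confound identified in the paragraph above.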

Summary

In general it can be seen from these experiments that a distinct, if not massive, improvement in system behaviour was gained through the use of an enhanced BallPark AFSM. While the statistical evidence may not seem overly impressive, the scan-level/EHT-voltage state-space plots in figure 8.10 of the actual scan signal and EHT control data show more clearly that a more stable response is achieved after learning. The measurement of the preferred output state or reinforcement level for each of the AFSMs (see section 8.1: Assessing the Results) did not account for the frequency of state transition. A system may spend time oscillating between two states and still accrue a reinforcement similar to that of a smoother-acting system. This in itself is perhaps indicative of two things: (i) that the reinforcement functions used as measures in these experiments were too simplistic and (ii) that the ever-present problem of credit assignment ([Barto90] and [Minsky61]) applies as much at the level of AFSM processes in a parallel control architecture as it does at higher levels. This is discussed further in the final section of the present chapter.
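The limitation noted above, that a time-in-state measure ignores how often the state changes, can be shown with a small worked example. The two state sequences below are hypothetical and the function names are illustrative; the point is only that both sequences score identically on time in the preferred state while differing greatly in transition count.

```python
def time_in_state(states, preferred):
    """Fraction of samples spent in the preferred state (a simple reinforcement measure)."""
    return sum(1 for s in states if s == preferred) / len(states)

def transition_count(states):
    """Number of times consecutive samples differ in state."""
    return sum(1 for a, b in zip(states, states[1:]) if a != b)

if __name__ == "__main__":
    # Hypothetical sequences over states B (near-linear) and C (foldback).
    smooth = ["B"] * 6 + ["C"] * 4
    oscillating = ["B", "C", "B", "B", "C", "B", "C", "B", "C", "B"]

    for name, seq in (("smooth", smooth), ("oscillating", oscillating)):
        print(name, time_in_state(seq, "B"), transition_count(seq))
```

Both sequences spend six of ten samples in state B, yet the second changes state eight times against once for the first, so a measure that included transition frequency would separate them where the reinforcement level alone does not.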

[Figure 8.10 graphs. A: Reactive; B: After Learning. Horizontal axis: EHT voltage (500 to 2000).]

Figure 8.10. EHT control output plots for A: A reactive system on the broken surface and B: After the learning phase has passed, again on the same broken surface.