Learning Cooperatively with Information Sharing

6.7 Results for Chapter 5

6.7.1 Learning Cooperatively with Information Sharing

This section investigates the performance of the distributed cooperative learning strategy. For the learning and recognition of gestures, a multi-class SVM classifier trained on hand-crafted features is adopted.

6.7.1.1 Experimental Scenario

Each simulation run starts at the beginning of the initial learning phase, in which every robot is initialized with an empty training set and placed at a random position. Initially, every robot acquires s_init = 5 samples from each of the K = 6 finger count gestures. In this way, every robot acquires a total of T_i = 5 × 6 = 30 samples for the initial training phase (see Section 5.3.1.1). After hand-crafted features are computed from the acquired T_i samples, each robot broadcasts a subset of the acquired samples (i.e., computed feature vectors). The value of parameter B ∈ [0, 1] defines the fraction of newly acquired samples which are disseminated by a robot. As an example, for B = 0 robots do not communicate. For B = 1 robots exchange all acquired data and share the same training set at any moment in the simulation. For B = 0.1 each robot shares 0.1 × 30 = 3 training samples. When B ∈ (0, 1), the samples to be shared are selected according to one of the three strategies presented in Section 5.4.2.1. After information sharing is complete, the training set of each robot contains KM + BKM(N − 1) training samples, where N represents the number of robots and M = s_init is the number of samples acquired per gesture. As an example, for B = 0.2 robot r in a swarm of N = 10 robots will have 84 samples in its training set T_r: 24 samples in T_r^p (still unknown to the rest of the swarm), 6 samples in T_r^s (already disseminated to the rest of the swarm), and 54 samples in T_r^e (received from the other robots in the swarm). The 60 samples in T_r^s ∪ T_r^e represent the current common knowledge of the swarm.
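The arithmetic behind the B = 0.2, N = 10 example can be checked with a few lines of Python. This is only an illustrative sketch: the function and argument names are hypothetical, and it assumes that every robot shares the same (rounded) number of samples.

```python
def training_set_sizes(k_gestures, m_per_gesture, n_robots, b_share):
    """Sizes of the subsets of one robot's training set after initial sharing.

    Assumption (not from the text): each robot shares round(B * K * M) samples.
    """
    acquired = k_gestures * m_per_gesture    # K*M samples acquired locally
    shared = round(b_share * acquired)       # B*K*M samples disseminated
    private = acquired - shared              # still unknown to the swarm
    received = shared * (n_robots - 1)       # B*K*M*(N-1) from the other robots
    return {
        "private": private,
        "shared": shared,
        "received": received,
        "total": private + shared + received,
        "common": shared + received,         # common knowledge of the swarm
    }


sizes = training_set_sizes(k_gestures=6, m_per_gesture=5, n_robots=10, b_share=0.2)
print(sizes)
```

For the values in the example, the sketch gives 24 private, 6 shared, and 54 received samples, 84 in total, of which 60 form the common knowledge.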

When the initial training phase is complete, the interaction rounds begin (see Section 5.3.1.2), in which the human gives 150 random gesture commands to the swarm. Before each gesture command is given, the positions of the robots with respect to the gesture are randomized to simulate a realistic scenario: in between commands robots perform their own tasks, which causes them to be randomly scattered in the environment. By means of the cooperative recognition protocol in Section 3.4, the swarm converges to a decision for the gesture, which can be correct or wrong. In both cases, each robot in the swarm acquires the correct label for the given gesture using full feedback from the human and adds the related information to the subset T_r^p of its training set. After every 10 interaction rounds, robots exchange B × 10 training samples (selected within the T_r^p subset, which may include samples acquired during previous interaction rounds but not samples which have already been disseminated).
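The periodic sharing step can be sketched as follows for the random selection strategy only. The function name, the list-based representation of the private and shared subsets, and the rounding of B × 10 are assumptions made for illustration, not details taken from the text.

```python
import random


def share_step(private, shared, b_share, n_new=10, rng=None):
    """Move round(B * n_new) randomly chosen samples from private to shared.

    `private` holds samples not yet disseminated, `shared` holds samples
    already disseminated. Returns (remaining_private, updated_shared, outgoing).
    """
    rng = rng if rng is not None else random.Random()
    n_share = min(len(private), round(b_share * n_new))
    chosen = set(rng.sample(range(len(private)), n_share))
    outgoing = [s for i, s in enumerate(private) if i in chosen]
    remaining = [s for i, s in enumerate(private) if i not in chosen]
    return remaining, shared + outgoing, outgoing


# Hypothetical state: 24 private samples plus 10 new ones, 6 already shared.
private = [f"p{i}" for i in range(34)]
shared = [f"s{i}" for i in range(6)]
private, shared, outgoing = share_step(private, shared, b_share=0.2)
print(len(private), len(shared), len(outgoing))
```

Because samples are drawn from the whole private subset, a sample acquired in an earlier round can still be selected later, while a sample that has been moved to the shared subset is never sent twice.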

When information sharing is complete, the value of parameter R determines if and when robots need to forget some of the training samples in order to limit the size of the training set. Parameter R represents the maximum number of training samples that a robot can retain. If the current size of the training set of a robot exceeds R, one of the sample forgetting strategies in Section 5.4.2.2 is iteratively applied to shrink the training set to exactly R samples. Finally, all robots retrain their classifiers (i.e., update them with the new information) and a new interaction round starts.
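As an illustration of the iterative forgetting step, the sketch below implements only random forgetting, removing one sample at a time until at most R samples remain. The function name and the list representation are hypothetical; the representativity-driven and redundancy-driven criteria of Section 5.4.2.2 are not reproduced here.

```python
import random


def forget_random(training_set, max_size, rng=None):
    """Iteratively drop randomly chosen samples until len <= max_size (R)."""
    rng = rng if rng is not None else random.Random()
    kept = list(training_set)
    while len(kept) > max_size:
        kept.pop(rng.randrange(len(kept)))   # forget one sample per iteration
    return kept


samples = list(range(2000))                          # placeholder sample identifiers
print(len(forget_random(samples, 500)))              # capped at R = 500
print(len(forget_random(samples, float("inf"))))     # R = ∞: nothing is forgotten
```

Passing `float("inf")` as the limit models the case in which robots never forget samples.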

The average classification accuracy of the swarm is computed after every 10 interaction rounds (see Section 5.3.1.2). In this way, for a full simulation run of 150 gesture commands, we obtain 15 accuracy values measured at different stages of the cooperative learning process. The first value corresponds to the first set of 10 commands in the first interaction round, which is obtained using classifiers trained during the initial learning phase (see Section 5.3.1.1). Initial classifiers are trained using T_i = 30 samples. Subsequent values correspond to incrementally larger training sets, until the maximum training set size (parameter R) is reached. For each set of simulation parameters, 50 simulation runs are performed using different realizations of random variables in the dataset.
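The bookkeeping described above (15 accuracy values per run, averaged over several runs) can be sketched as below. This assumes, for illustration only, that each gesture command yields a single correct/wrong outcome for the swarm; the function names are hypothetical and the outcomes in the usage example are placeholders, not experimental data.

```python
from statistics import mean


def block_accuracies(outcomes, block_size=10):
    """Fraction of correct decisions in each consecutive block of commands."""
    blocks = [outcomes[i:i + block_size] for i in range(0, len(outcomes), block_size)]
    return [sum(block) / len(block) for block in blocks]


def learning_curve(runs, block_size=10):
    """Average the per-block accuracies over several simulation runs."""
    per_run = [block_accuracies(run, block_size) for run in runs]
    return [mean(values) for values in zip(*per_run)]


# Placeholder outcomes for two runs of 150 commands (True = correct decision).
run_a = [True] * 150
run_b = [False] * 150
curve = learning_curve([run_a, run_b])
print(len(curve), curve[0])
```

With 150 commands and blocks of 10, each run contributes 15 values, one per stage of the learning process.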

Figure 6.25. Average swarm accuracy vs. number of interaction rounds. Top: Accuracy curves for different swarm sizes with robots sharing B = 25% of their samples using representativity-driven selection. Bottom: Accuracy curves for N = 13 robots corresponding to different communication loads (different percentages of samples shared among robots) B = {0%, 20%, 40%, 100%} with random selection of samples. Grey bands correspond to confidence intervals.

6.7.1.2 Effect of Swarm Size and Amount of Shared Information

The learning curves for different swarm sizes and different amounts of exchanged information (e.g., 20% communication means that every robot only shares 20% of its personal samples) are reported in Figure 6.25. As expected, larger swarms yield a significantly better accuracy in all stages of the learning process. Two factors contribute to this effect:

• When B > 0, large swarms are trained much faster than small swarms. This is because a large swarm collectively acquires and exchanges a proportionally larger amount of training samples. A single robot or a swarm which does not exchange training data (see the curve with B = 0% in Figure 6.25 (bottom)) learns very slowly.

• When recognizing a gesture, large swarms enjoy a more powerful consensus ability as more observations are accounted for.

The contribution of the former factor is explored in Figure 6.25 (bottom), which shows how communication improves the learning ability of a swarm of N = 13 robots. The latter factor is isolated when comparing the bottom curves of both plots in Figure 6.25. In both cases (bottom curves), no communication is allowed and each robot learns independently from the rest of the swarm. The only difference between the two scenarios is the size of the swarm, which affects accuracy through the different amount of data acquired during the consensus phase. As expected, the 13-robot swarm in Figure 6.25 (bottom) with B = 0% is more accurate than the single robot in Figure 6.25 (top).

The results in Figure 6.25 (bottom) show that the larger the amount of communication, the better the swarm-level accuracy. After a very large number of interaction rounds, the training sets of all robots become so large that no further increase in accuracy is possible. At this point, all scenarios in Figure 6.25 (bottom) are expected to yield the same accuracy.

6.7.1.3 Effect of Selection and Sample Sharing Strategies

The effect of the three strategies for selecting training samples to disseminate (see Section 5.4.2.1) is reported in Figure 6.26 (top). Giving priority to novel samples results in a performance which is comparable to purely random selection for almost the entire training process. On the other hand, giving priority to the most representative samples leads to a significantly faster learning rate, especially during the initial training phase. This is due to the fact that a representative sample summarizes multiple samples, as it lies near their centroid. In this context, representative samples are more informative about the typical characteristics of a given class. Conversely, novel samples appear to be more useful later on in the learning process due to their contribution in refining the decision boundaries of the classifiers.
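The idea that a representative sample lies near the centroid of the samples it summarizes can be illustrated with a small sketch. The ranking below (Euclidean distance to the per-class centroid) is an assumption made for illustration; the actual representativity criterion is defined in Section 5.4.2.1 and may differ. All names and feature values are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def most_representative(samples, n_select):
    """Pick the n_select samples closest to the centroid of their own class.

    `samples` is a list of (feature_vector, label) pairs.
    """
    by_class = {}
    for features, label in samples:
        by_class.setdefault(label, []).append(features)
    centroids = {label: centroid(vs) for label, vs in by_class.items()}
    ranked = sorted(samples, key=lambda s: math.dist(s[0], centroids[s[1]]))
    return ranked[:n_select]


# Three made-up 2-D samples of one gesture class; the third lies near the centroid.
samples = [([0.0, 0.0], "two"), ([2.0, 0.0], "two"), ([1.0, 0.1], "two")]
print(most_representative(samples, 1))
```

In this toy example the sample at (1.0, 0.1) is selected, since it is the closest to the class centroid and therefore the best single summary of the three.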

Figure 6.26. Top: Effect of using different selection strategies for sharing samples with N = 13 robots, F = 30 features, B = 50% communication, and R = ∞ (robots never forget samples). Bottom: Accuracy using representativity-driven selection with N = 13 robots and different numbers of features.

6.7.1.4 Impact of Number of Features in Bandwidth-limited Scenarios

Communication constraints are the main reason for limiting the amount of information in the training samples which are shared among robots in a swarm. As training samples consist of feature vectors with their respective ground-truth (GT) classes, the dimensionality of the feature vectors (i.e., of the feature space) to be exchanged is an important parameter. Larger feature vectors produce more powerful classifiers, but at the same time require more bandwidth for dissemination. In bandwidth-limited scenarios, a trade-off emerges, as using more features implies disseminating fewer training samples, which has a negative impact on the learning rate of the swarm. This trade-off is investigated in Figure 6.26 (bottom), which reports the swarm learning curves when using different feature vector sizes.

Each robot has the opportunity to disseminate a fixed amount of information, corresponding to a total of 120 feature values, after a set of 12 gestures is given (i.e., approximately 500 bytes assuming single-precision floating point representation). In Figure 6.26 (bottom) it is observed that, when small feature vectors are used (F = {2, 5, 10}), all acquired samples can be shared among robots in the swarm. However, the individual classifiers are still not powerful enough and yield relatively poor recognition accuracy. If F = 120 features are used, each robot can only disseminate 1/12th of the acquired samples, and the size of the local training sets increases at a much slower rate, which results in slower learning. In general, relatively small feature vectors (F = {10, 20}) lead to better accuracy during the initial training stage, as they allow robots to quickly build moderately sized training sets. However, in the later training stage the classifiers are not powerful enough to exploit the expanding size of the training set. Instead, relatively large feature vectors (F = {40, 60}) lead to suboptimal results at the beginning but are able to fully exploit the larger training sets accumulated in the later stages of the cooperative learning process. Intermediate feature vector sizes (F = {30, 40}) lead to nearly optimal swarm accuracy in all learning stages (see Section 6.7.3).
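The bandwidth trade-off can be made concrete with a short calculation. The sketch assumes a budget of 120 feature values per set of 12 gestures and 4 bytes per value, ignores the bytes needed for class labels, and rounds down to whole samples; the constant and function names are hypothetical.

```python
BUDGET_VALUES = 120     # feature values a robot may disseminate per sharing step
SAMPLES_PER_SET = 12    # gestures (samples) acquired between two sharing steps
BYTES_PER_VALUE = 4     # single-precision floating point


def shareable_samples(n_features):
    """Whole samples that fit in the budget for feature vectors of size F."""
    return min(SAMPLES_PER_SET, BUDGET_VALUES // n_features)


print("budget in bytes:", BUDGET_VALUES * BYTES_PER_VALUE)
for f in (2, 5, 10, 20, 30, 40, 60, 120):
    n = shareable_samples(f)
    print(f"F = {f:3d}: {n:2d} of {SAMPLES_PER_SET} samples shared")
```

Under these assumptions the budget amounts to 480 bytes. Feature vectors with F ≤ 10 allow all 12 samples to be shared, F = 30 allows 4, and F = 120 allows a single sample, i.e., 1/12th of those acquired.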

6.7.1.5 Impact of Strategies for Forgetting and Removing Samples

One key requirement for real-time learning is to ensure that the retraining time of a classifier remains manageable. A simple approach is to limit the maximum number of retained training samples (parameter R). In Section 5.4.2.2, three simple strategies are presented that iteratively select the samples to be removed (forgotten) from the training database. The quantitative results for the swarm-level accuracy and the SVM retraining time (computed on the Foot-bots) are reported in Figure 6.27, which evaluates the effects of the three sample forgetting strategies. In this experiment, the same setup used in Figure 6.25 is considered, with N = 13 robots sharing all acquired samples (B = 100%). As we are interested in the long-term behaviour, we consider a snapshot after 150 interaction rounds are performed. At this point, a robot which did not forget any sample holds around 2000 training samples, and on the Foot-bot platform the SVM retraining time amounts to nearly 5 minutes for the rightmost data point. With this large amount of training samples, swarm accuracy reaches 81.5%.

The results for the forgetting strategies indicate that selecting samples to forget using either representativity-driven or redundancy-driven criteria is detrimental to swarm accuracy when compared to random selection. This result is opposite to that obtained for selecting samples to be shared, where random selection is clearly not the best approach (see Figure 6.26 (top)). This is due to the fact that, over time, these approaches incrementally bias the training dataset, which fails to remain representative of the classification problem to be solved. For forgetting samples, random selection results in a system which pays a minimal penalty in terms of swarm accuracy while enjoying a fast retraining time. With R = 500, the swarm accuracy decreases marginally to 79.6%, but the retraining time does not exceed 30 s, compared to nearly 5 minutes for the case with R = ∞ (around 2000 samples).

Figure 6.27. Swarm accuracy (y-axis) vs. SVM retraining time on the Foot-bot platform (x-axis, logarithmic) after 150 interaction rounds, N = 13 robots, F = 30 features, and B = 100% communication. Different amounts of samples maintained in the training set (R = {100, 200, 500, ∞}) along with three different strategies for forgetting samples. For R = ∞ samples are never forgotten. Vertical error bars report confidence intervals on the accuracy.

In document Symbiotic interaction between humans and robot swarms (Page 172-178)