5.5 A Monte-Carlo Method for TRAVOS-C
5.6.2 Basic Learning Behaviour
For TRAVOS-C to perform a useful role in evaluating agent performance there are two fundamental hypotheses that should be true, as follows:
1. As a truster gains direct experience of a trustee, its estimation accuracy should improve for both the trustee’s behaviour parameters and the noise parameters of any reputation source that has provided an opinion about that trustee in the past.
2. As the number of observations a reputation source reports for a given trustee increases, the truster’s estimation accuracy should improve for both the trustee’s behaviour parameters and the reputation source’s noise parameters.
To test these hypotheses, we ran a series of experiments in which a truster, $a_{tr}$, was presented with direct observations of a trustee, $a_{te}$, along with the opinion of a reputation source, $a_{rep}$, about $a_{te}$. During these experiments, no evidence pertaining to any other trustee or reputation source was made available to $a_{tr}$, and between experiments we varied three control variables:
1. the number of direct observations of $a_{te}$ made by $a_{tr}$;
2. the number of observations of $a_{te}$ that $a_{rep}$ claims to have made; and
3. the sum of $a_{rep}$’s noise variance, $\sigma^2_{a_{rep}}$, and $a_{te}$’s behaviour variance, $\sigma^2_{a_{tr},a_{te}}$.
While the first two control variables relate directly to the hypotheses, the third controls the level of difficulty associated with estimation. Standard statistical theory tells us that, to achieve a given estimation accuracy for the parameters of a Gaussian distribution, more observations are required as the variance of the distribution increases (DeGroot and Schervish, 2002). As TRAVOS-C estimates agent behaviour using observations assumed to be drawn from Gaussian distributions, its performance should not be immune to this effect.
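The dependence of estimation difficulty on variance can be illustrated with a small simulation. The sketch below is not TRAVOS-C: it uses the plain sample mean as the estimator, and the function name, trial count, seed, and the choice of 16 observations are assumptions made purely for illustration.

```python
import random
import statistics


def mean_abs_error_of_sample_mean(variance, n_obs, n_trials=2000, seed=0):
    """Average absolute error of the sample mean over repeated Gaussian samples.

    Hypothetical helper for illustration; the true mean is fixed at 0.
    """
    rng = random.Random(seed)
    sigma = variance ** 0.5
    true_mean = 0.0
    errors = []
    for _ in range(n_trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n_obs)]
        errors.append(abs(statistics.fmean(sample) - true_mean))
    return statistics.fmean(errors)


# Compare two variances at a fixed number of observations.
for variance in (9.0, 25.0):
    error = mean_abs_error_of_sample_mean(variance, n_obs=16)
    print(f"variance={variance:5.1f}  mean absolute error={error:.3f}")
```

With the number of observations held fixed, the larger variance should give the larger average error, so more observations are needed to reach the same accuracy.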
To ensure that the results obtained apply to a general set of agent behaviours, all other aspects of agent behaviour were varied randomly between episodes. Specifically, $\sigma^2_{a_{tr},a_{te}}$ and $\sigma^2_{a_{rep}}$ were determined by assigning a random proportion of their sum to $\sigma^2_{a_{tr},a_{te}}$, with the remaining proportion assigned to $\sigma^2_{a_{rep}}$. This was achieved by generating a random number, $P$, uniformly distributed on the interval $[0.1, 0.9]$, with which the variances were calculated using Equations 5.103 and 5.104. Given these values, the mean parameters, $\mu_{a_{tr},a_{te}}$ and $\mu_{a_{rep}}$, were generated from their conditional prior distributions, as specified in Section 5.6.1.
$$\sigma^2_{a_{tr},a_{te}} = P\,(\sigma^2_{a_{tr},a_{te}} + \sigma^2_{a_{rep}}) \tag{5.103}$$
$$\sigma^2_{a_{rep}} = (1-P)\,(\sigma^2_{a_{tr},a_{te}} + \sigma^2_{a_{rep}}) \tag{5.104}$$
[Figure 5.7 panels: Mean Parameters, 16 Rep Obs; Variance Parameters, 16 Rep Obs; Mean Parameters, 196 Rep Obs; Variance Parameters, 196 Rep Obs. Horizontal axis: no. direct observations. Vertical axis: absolute error. Series: prior, reputation noise, trustee behaviour.]
Figure 5.7: Parameter estimates with variance sum of 25, varying direct observations.
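The variance split described by Equations 5.103 and 5.104 can be sketched as follows. This is a minimal illustration rather than the code used in the experiments; the function name and the seeded random generator are assumptions.

```python
import random


def split_variance_sum(variance_sum, rng):
    """Divide a fixed variance sum between trustee behaviour and reputation noise.

    Hypothetical helper: draws P uniformly from [0.1, 0.9] and applies
    Equations 5.103 and 5.104.
    """
    p = rng.uniform(0.1, 0.9)
    behaviour_variance = p * variance_sum        # Equation 5.103
    noise_variance = (1.0 - p) * variance_sum    # Equation 5.104
    return p, behaviour_variance, noise_variance


rng = random.Random(1)  # seeded only so the sketch is repeatable
p, behaviour_variance, noise_variance = split_variance_sum(25.0, rng)
print(p, behaviour_variance, noise_variance)
```

Restricting $P$ to $[0.1, 0.9]$ keeps both variances strictly positive, with each one receiving at least a tenth of the sum.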
Selected results from these experiments are illustrated in Figure 5.7, in which the number of direct observations is varied along the horizontal axis of each graph, while the number of reported observations is varied between the top and bottom sets of graphs. In addition, Figure 5.8 gives a similar set of results, except that, in this case, the reported observations vary along the horizontal axes, while the direct observations vary between the top and bottom graphs. In each of these figures, the mean estimation errors achieved for the reputation noise mean, $\mu_{a_{rep}}$, and trustee behaviour mean, $\mu_{a_{tr},a_{te}}$, are plotted on the left of the figure, while mean estimation errors for the corresponding variance parameters are plotted on the right. For comparison, the estimation error achieved by the model prior is plotted in each graph, showing how the model performs when it has no direct or reported observations.
These results show that, in general, increasing the number of either direct or reported observations decreases the mean estimation error for each of the model parameters, which agrees with the hypotheses stated above. In addition, two other notable aspects of behaviour can be observed.
[Figure 5.8 panels: Mean Parameters, 0 Dir Obs; Variance Parameters, 0 Dir Obs; Mean Parameters, 4 Dir Obs; Variance Parameters, 4 Dir Obs. Horizontal axis: no. reputation observations. Vertical axis: absolute error. Series: prior, reputation noise, trustee behaviour.]
Figure 5.8: Parameter estimates with variance sum of 9, varying reputation observations.
First, the number of observations required to significantly improve estimates of the mean parameters is smaller than that required for the variance parameters. This property is intuitive when we consider the Fisher information (DeGroot and Schervish, 2002) associated with a given number of samples from a Gaussian distribution. Essentially, Fisher information measures the predictive value that each sample has for a given property of its distribution, with higher values indicating more information. In the case of a Gaussian distribution with mean $\mu$ and variance $\sigma^2$, the Fisher information of a single sample from that distribution is $1/\sigma^2$ with respect to $\mu$, and $1/(2\sigma^4)$ with respect to $\sigma^2$. This tells us that it takes significantly more data to obtain the same level of information about the distribution variance, compared to its mean, which is reflected in our results.
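The two Fisher information expressions can be turned into a short numerical comparison. The sketch below combines them with the Cramér-Rao bound, which states that an unbiased estimator based on $n$ independent samples has variance of at least $1/(nI)$, where $I$ is the per-sample Fisher information. The function names and the example values (a variance of 9 and 16 samples) are illustrative assumptions, not settings taken from the experiments.

```python
def fisher_information_mean(variance: float) -> float:
    """Per-sample Fisher information about the mean of a Gaussian."""
    return 1.0 / variance


def fisher_information_variance(variance: float) -> float:
    """Per-sample Fisher information about the variance of a Gaussian."""
    return 1.0 / (2.0 * variance ** 2)


def cramer_rao_bound(per_sample_information: float, n_samples: int) -> float:
    """Lower bound on the variance of an unbiased estimator from n i.i.d. samples."""
    return 1.0 / (n_samples * per_sample_information)


variance = 9.0   # illustrative value only
n_samples = 16   # illustrative value only
bound_mean = cramer_rao_bound(fisher_information_mean(variance), n_samples)
bound_variance = cramer_rao_bound(fisher_information_variance(variance), n_samples)
print(f"bound for mean estimate:     {bound_mean:.4f}")
print(f"bound for variance estimate: {bound_variance:.4f}")
```

For these values the bound on the variance estimate is considerably larger than the bound on the mean estimate, which matches the slower improvement seen for the variance parameters.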
Second, as shown in the top two graphs in Figure 5.8, increasing the number of reported reputation observations can improve performance, even when the number of direct observations is 0. This can be attributed to TRAVOS-C learning the sum of the trustee behaviour and reputation noise parameters. That is, although reputation cannot, on its own, be used to determine the noise associated with a reputation source, we can use it to learn the value of the sums $\mu_{a_{tr},a_{te}} + \mu_{a_{rep}}$ and $\sigma^2_{a_{tr},a_{te}} + \sigma^2_{a_{rep}}$. These, along with any prior information, can provide some indication of the parameter values, particularly in the case of the variances, because we know that, individually, these must be less than their sum and greater than 0. Moreover, when direct observations are available, they illuminate not only a trustee’s behaviour, but also the proportion of a reputation source’s opinion that is due to noise.
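The point that reputation alone identifies only the sums can be illustrated with a simulation. The sketch below assumes that each reported observation is a trustee behaviour sample plus independent Gaussian noise, which is a simplified reading of the model rather than the TRAVOS-C implementation. All function names and parameter values are hypothetical.

```python
import random
import statistics


def simulate_reports(mu_te, var_te, mu_rep, var_rep, n_reports, seed=0):
    """Reported observations under an assumed additive model: behaviour + noise."""
    rng = random.Random(seed)
    return [
        rng.gauss(mu_te, var_te ** 0.5) + rng.gauss(mu_rep, var_rep ** 0.5)
        for _ in range(n_reports)
    ]


def estimate_sums(reports):
    """Sample mean and sample variance, which estimate the two parameter sums."""
    return statistics.fmean(reports), statistics.variance(reports)


# Two different parameter settings that share the same sums (mean 12, variance 9).
setting_a = simulate_reports(mu_te=10.0, var_te=4.0, mu_rep=2.0, var_rep=5.0, n_reports=5000)
setting_b = simulate_reports(mu_te=7.0, var_te=8.0, mu_rep=5.0, var_rep=1.0, n_reports=5000)

for name, reports in (("A", setting_a), ("B", setting_b)):
    mean_sum, variance_sum = estimate_sums(reports)
    # Each individual variance must lie strictly between 0 and the variance sum.
    print(f"setting {name}: mean sum ~ {mean_sum:.2f}, variance sum ~ {variance_sum:.2f}")
```

Both settings should give similar estimates, which is why the reports alone cannot separate behaviour from noise. The estimated variance sum still acts as an upper bound on each individual variance.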