4.4 Vulnerabilities of the Provenance Quality and Trust Factors
5.1.5 Combiner Function: Calculating an Overall Total Score of a
5.1.5.1 Comparative Analysis of Different Average Functions
In this section, we explore the properties of different average functions: the Mean (or Arithmetic Mean), the Geometric Mean, the Harmonic Mean and the Quadratic Mean or Root Mean Square (RMS). The averages of a set of numbers (e.g. x, y and z) are calculated as follows:
• Mean = (Sum of all Elements in a Set) ÷ (Number of Elements in the Set), e.g. $\frac{x+y+z}{3}$
• Geometric Mean = n-th root of the product of n numbers, e.g. $(x \times y \times z)^{1/3}$
• Harmonic Mean = (Number of Elements in a Set) ÷ (Sum of the Reciprocal of each Element), e.g. $\frac{3}{\frac{1}{x}+\frac{1}{y}+\frac{1}{z}}$
• Root Mean Square = Square root of the Average of the Square of each Element, e.g. $\sqrt{\frac{x^{2}+y^{2}+z^{2}}{3}}$
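The four definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this work: the function names are chosen here for readability, and the inputs are assumed to be positive numbers.

```python
import math

def arithmetic_mean(values):
    """Sum of the elements divided by the number of elements."""
    return sum(values) / len(values)

def geometric_mean(values):
    """n-th root of the product of n numbers (assumes non-negative inputs)."""
    return math.prod(values) ** (1.0 / len(values))

def harmonic_mean(values):
    """Number of elements divided by the sum of their reciprocals (assumes no zeros)."""
    return len(values) / sum(1.0 / v for v in values)

def root_mean_square(values):
    """Square root of the average of the squared elements."""
    return math.sqrt(sum(v * v for v in values) / len(values))

if __name__ == "__main__":
    sample = [2.0, 4.0, 8.0]  # illustrative values for x, y and z
    for fn in (arithmetic_mean, geometric_mean, harmonic_mean, root_mean_square):
        print(f"{fn.__name__}: {fn(sample):.2f}")
```

For the illustrative values x = 2, y = 4 and z = 8, the four averages are approximately 4.67, 4.00, 3.43 and 5.29 respectively.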
Due to the nature of the mathematical operations performed in each average function, they generally produce different results for the same set of values. The following examples show how the resulting values of the different averages vary from each other. Consider the data in Table 5.4.
Table 5.4: Effect of Different Average Functions (a)
          Data Series 1   Data Series 2   Data Series 3
          5               5               1
          5               4               3
          5               5               6
          5               4               1
          5               5               1
          5               6               2
          5               4               9
          5               6               8
          5               4               9
          5               7               10
Mean      5               5               5
G.Mean    5               4.91            3.44
H.Mean    5               4.82            2.25
RMS       5               5.10            6.15
All values in Data Series 1 are equal (i.e. its standard deviation is zero). Hence, its Arithmetic Mean, Geometric Mean, Harmonic Mean and RMS are all the same, namely 5. Although all three series of numbers have the same Arithmetic Mean, which is 5, the numbers in Data Series 2 have a low standard deviation, whereas the numbers in Data Series 3 have a high standard deviation. As a result, the Geometric Mean and Harmonic Mean of Data Series 2 (4.91 and 4.82 respectively) are greater than those of Data Series 3 (3.44 and 2.25). This indicates that while the Arithmetic Mean allows a low score to be offset by an equally high score (and vice versa), the Geometric and Harmonic Means penalise a series for containing any low score. In fact, the Harmonic Mean penalises a low score more severely than the Geometric Mean does. This property of the Harmonic Mean (and the Geometric Mean) is even more evident in Table 5.5 and the corresponding graph in Figure 5.2.
Table 5.5: Effect of Different Average Functions (b)
          Series 1   Series 2   Series 3   Series 4   Series 5
          5          0.1        0.1        0.1        0.1
          5          5          5          3          3
          5          8          5          6          6
          5          7          4          1          1
          5          8          5          1          1
          5          9          6          2          2
          5          7          4          9          2
          5          9          6          8          1
          5          7          4          9          2
          5          10         7          10         3
A. Mean   5          7.01       4.61       4.91       2.11
R.M.S.    5          7.50       4.94       6.14       2.63
G. Mean   5          4.95       3.39       2.73       1.46
H. Mean   5          0.89       0.85       0.74       0.65
All values except one in the third column of Table 5.5 (Series 2, i.e. WorldView 2) are very high. Therefore, the Arithmetic Mean of the values is 7.01, whereas the Harmonic Mean is only 0.89 and the Geometric Mean is 4.95. Comparing the Harmonic Mean values in Table 5.5, we can see that large values have almost no effect on the overall score if there is even one very small value. The Root Mean Square, on the other hand, has the opposite effect: it rewards large values more than it penalises small ones. This is why WorldView 4 (column 5) has a much bigger RMS value (6.14) than WorldView 3 (4.94), although their Arithmetic Means (4.91 and 4.61) are not very different. It is important to note that, for positive values, the RMS is never less than the Arithmetic Mean, while the Geometric and Harmonic Means are never greater than the Arithmetic Mean; the four averages coincide only when all values are equal, as in Series 1.
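The ordering seen in Tables 5.4 and 5.5 is an instance of a standard inequality between these averages. It is stated below for n positive values, with the symbols $x_1, \dots, x_n$ introduced here only for illustration:

```latex
\[
\underbrace{\frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}}_{\text{H. Mean}}
\;\le\;
\underbrace{\Bigl(\prod_{i=1}^{n} x_i\Bigr)^{1/n}}_{\text{G. Mean}}
\;\le\;
\underbrace{\frac{1}{n}\sum_{i=1}^{n} x_i}_{\text{A. Mean}}
\;\le\;
\underbrace{\sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^{2}}}_{\text{RMS}},
\qquad x_i > 0 .
\]
```

Equality holds throughout only when all $x_i$ are equal. For Series 2 of Table 5.5, for example, the chain reads 0.89 ≤ 4.95 ≤ 7.01 ≤ 7.50.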
Figure 5.2: Effect of Mean, Geometric Mean, Harmonic Mean and Root Mean Square
In the light of the above discussion, we can justifiably use the RMS function for those factors whose values we want to amplify and the Geometric Mean for those factors whose values we want to attenuate. This raises the question of why the Geometric Mean, rather than the Harmonic Mean, is used to attenuate values. The reasons are twofold:
1. Although the Geometric Mean reduces the effect of the large values when a data series contains small values, it does not mask large values as much as the Harmonic Mean does.
2. If a data series contains a zero, the Geometric Mean of the series becomes zero regardless of how big the other values in the series are. However, the Harmonic Mean is even worse at handling a zero because it requires computing the reciprocal of the number (i.e. 1/0), which is undefined and typically causes a division-by-zero error at run time.
However, it may be necessary to avoid the Geometric Mean at times because it cannot handle negative numbers. In those cases, the Harmonic Mean may be an alternative option. Therefore, it has been suggested in [102] to allow the end user to choose the mathematical functions that suit their needs.
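The two edge cases discussed above can be made explicit in code. The sketch below is only illustrative: the registry `ATTENUATORS` and the function `attenuate` are hypothetical names introduced here to show how an end user might select the attenuating function. In Python, evaluating 1/0 raises a `ZeroDivisionError`, and raising a negative product to a fractional power yields a complex number, so the sketch validates its inputs and reports a clear error instead.

```python
import math

def geometric_mean(values):
    """n-th root of the product; rejected here for negative scores."""
    if any(v < 0 for v in values):
        raise ValueError("geometric mean is not defined for negative scores")
    return math.prod(values) ** (1.0 / len(values))

def harmonic_mean(values):
    """n divided by the sum of reciprocals; a zero score has no reciprocal."""
    if any(v == 0 for v in values):
        raise ValueError("harmonic mean is not defined when a score is zero")
    return len(values) / sum(1.0 / v for v in values)

# Hypothetical registry: the end user picks the attenuating function by name.
ATTENUATORS = {"geometric": geometric_mean, "harmonic": harmonic_mean}

def attenuate(values, method="geometric"):
    """Apply the chosen attenuating average to a list of scores."""
    return ATTENUATORS[method](values)
```

With scores such as 9, 9, 0 and 9, `attenuate(..., "geometric")` returns zero, while `attenuate(..., "harmonic")` raises a `ValueError` rather than attempting the division.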
5.1.5.2 Combiner Function
The composite score of a world view is calculated by combining individual scores of different trust factors from all messages included in that world view, as illustrated in Table 5.6 with some example scores. How the scores are combined to produce the overall score for each world view is explained below:
1. Calculate the total score of the messages for each factor separately in the following manner (in any order):
− Calculate the Mean score (Arithmetic Mean) for each of these factors separately: Freshness of Information, Context, and Corroboration
− Depending on the type of incident and the policy adopted, calculate
◦ Arithmetic Mean or Root Mean Square (RMS) of the scores corresponding to the factors Identity of Informer and Reputation
◦ Arithmetic Mean or Geometric Mean (G. Mean) of the scores corresponding to the factors Social Relation and Competence
Generally, the Arithmetic Mean will be used to work out the score for each of these factors. However, in some cases (e.g. sensitive and delicate situations), we may want to give more emphasis to the messages received from highly trusted sources with a good reputation when we are certain about their identity (for example, because they appear on a highly trusted list or do not appear on a blacklist). In those cases, the RMS function will be used instead of the Arithmetic Mean. It may be noted that Google's PageRank algorithm uses a similar technique to work out the PageRank of a website: an external link from a website that has a higher PageRank carries more weight than a link from a site that has a lower PageRank, other factors being constant [74]. Likewise, the Geometric Mean will sometimes be used instead of the Arithmetic Mean to attenuate the values.
− Take the Root Mean Square (RMS) of the scores for the factor Location of Informer
2. Calculate the Arithmetic Mean of the aggregate scores (found in step 1) for all factors. This average value (Arithmetic Mean) is the overall score of the world view.
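The two steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation evaluated in this work: the flags `amplify` and `attenuate` are hypothetical stand-ins for the incident type and adopted policy, the short factor names follow the abbreviations of Table 5.6, and the example rows are the example scores of that table.

```python
import math

FACTORS = ("Id", "Loc", "Fresh", "Rep", "Contx", "Rel", "Corro", "Comp")

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    # Assumes non-negative scores; a single zero makes the result zero.
    return math.prod(values) ** (1.0 / len(values))

def root_mean_square(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

def combine(messages, amplify=False, attenuate=False):
    """Return (per-factor aggregates, overall score) for one world view.

    `amplify` and `attenuate` are hypothetical policy flags.
    """
    aggregates = {}
    for factor in FACTORS:
        scores = [message[factor] for message in messages]
        # Step 1: pick the aggregate function for this factor.
        if factor == "Loc" or (amplify and factor in ("Id", "Rep")):
            aggregate = root_mean_square
        elif attenuate and factor in ("Rel", "Comp"):
            aggregate = geometric_mean
        else:
            aggregate = arithmetic_mean
        aggregates[factor] = aggregate(scores)
    # Step 2: the overall score is the Arithmetic Mean of the aggregates.
    overall = arithmetic_mean(list(aggregates.values()))
    return aggregates, overall

# Example scores for four messages, one tuple per message in FACTORS order.
ROWS = [
    (6, 7, 8, 6, 0, 0, 3, 4),
    (4, 6, 4, 5, 2, 1, 6, 4),
    (3, 7, 5, 0, 0, 0, 3, 5),
    (5, 8, 7, 5, 2, 1, 7, 6),
]
MESSAGES = [dict(zip(FACTORS, row)) for row in ROWS]
```

Calling `combine(MESSAGES, amplify=True)` switches Id and Rep to the RMS, which raises the overall score relative to the default, while `attenuate=True` switches Rel and Comp to the Geometric Mean, which drives the Rel aggregate to zero for these scores because that column contains zeros.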
Table 5.6: Calculating Composite Score
MsgId      Id         Loc   Fresh   Rep        Contx   Rel           Corro   Comp
1          6          7     8       6          0       0             3       4
2          4          6     4       5          2       1             6       4
3          3          7     5       0          0       0             3       5
4          5          8     7       5          2       1             7       6
Function   RMS/Mean   RMS   Mean    RMS/Mean   Mean    Mean/G.Mean   Mean    Mean/G.Mean
Overall Aggregate Score = A. Mean of the Aggregate Scores for each Factor, i.e. Arithmetic Mean of the results found in the previous row
MsgId ≡ Message Id, Id ≡ Identity of Informer, Loc ≡ Location of Informer, Fresh ≡ Freshness of Information, Rep ≡ Reputation, Contx ≡ Context/situation Interest & Ethics, Rel ≡ Social Relation, Corro ≡ Corroboration, Comp ≡ Competence
The reason for using the Geometric Mean and the Quadratic Mean (Root Mean Square) has already been explained in Section 5.1.5.1. It has also been noted that the Geometric Mean makes an aggregate score zero if there is a zero among the scores, even when the rest of the scores are very high. However, it may not be practical for the scores of all messages for a specific factor to be entirely masked just because one of the messages in the world view has a zero score for that factor. Therefore, the user may want to replace all zeros with a very small value (e.g. 0.01 or 0.001) so that the world view receives a strongly reduced, but non-zero, value for that factor.
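The zero-replacement idea can be sketched as below. The helper `replace_zeros` and its parameter `floor` are hypothetical names introduced for this illustration; the example uses the Rep scores of Table 5.6, which contain one zero.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n non-negative numbers."""
    return math.prod(values) ** (1.0 / len(values))

def replace_zeros(values, floor=0.01):
    """Return a copy of the scores with every zero replaced by `floor`."""
    return [floor if v == 0 else v for v in values]

if __name__ == "__main__":
    rep_scores = [6, 5, 0, 5]  # Rep column of Table 5.6
    print(geometric_mean(rep_scores))                        # masked by the zero
    print(geometric_mean(replace_zeros(rep_scores)))         # floor = 0.01
    print(geometric_mean(replace_zeros(rep_scores, 0.001)))  # floor = 0.001
```

For these scores the Geometric Mean is zero without replacement, about 1.11 with a floor of 0.01 and about 0.62 with a floor of 0.001, compared with an Arithmetic Mean of 4.00. The choice of the replacement value therefore has a noticeable effect on the result and is itself a policy decision.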
Suppose a world view contains four messages, and Table 5.6 shows the scores assigned to those four messages for the different trust factors using the scoring function described in Section 5.1.3. Table 5.7 shows the overall total score of the world view obtained by following the procedure described above, assuming that this world view is associated with an event (incident) that is not very sensitive or delicate. (Hence, the Arithmetic Mean is used to work out the aggregate score of each trust factor, except for Location of Informer, for which the RMS is used as stated in step 1.)
Table 5.7: Calculating the Overall Total Score of a World View
MsgId        Id     Loc    Fresh   Rep    Contx   Rel    Corro   Comp
1            6      7      8       6      0       0      3       4
2            4      6      4       5      2       1      6       4
3            3      7      5       0      0       0      3       5
4            5      8      7       5      2       1      7       6
Averages =   4.50   7.04   6.00    4.00   1.00    0.50   4.75    4.75
Overall Aggregate Score = 4.07
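As a check on Table 5.7, the arithmetic can be written out explicitly. The symbols $S_{\text{Loc}}$ and $S_{\text{overall}}$ are introduced here only for illustration: the Location aggregate is the RMS of the four Loc scores, and the overall score is the Arithmetic Mean of the eight aggregates.

```latex
\begin{align*}
S_{\text{Loc}} &= \sqrt{\frac{7^{2} + 6^{2} + 7^{2} + 8^{2}}{4}} = \sqrt{49.5} \approx 7.04 \\
S_{\text{overall}} &= \frac{4.50 + 7.04 + 6.00 + 4.00 + 1.00 + 0.50 + 4.75 + 4.75}{8} = \frac{32.54}{8} \approx 4.07
\end{align*}
```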
If the decision maker, for any reason, wants to give more priority to the location (3 times more, for example), s/he will multiply the scores of the location factor by that weight before calculating the overall score. In that case, the overall score of the world view (calculated from the values listed in Table 5.6 and Table 5.7) will be 5.83, as shown in Table 5.8. Chapter 8 demonstrates the construction and scoring of the world views, which makes the whole process clearer.
Table 5.8: Change in Overall Total Score of a World View due to Decision Making Policy
MsgId        Id     Loc     Fresh   Rep    Contx   Rel    Corro   Comp
1            6      7 ×3    8       6      0       0      3       4
2            4      6 ×3    4       5      2       1      6       4
3            3      7 ×3    5       0      0       0      3       5
4            5      8 ×3    7       5      2       1      7       6
Averages =   4.50   21.12   6.00    4.00   1.00    0.50   4.75    4.75
Overall Aggregate Score = 5.83
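The effect of such a weighting policy can be sketched as follows. The function `weighted_overall_score` and its `weights` argument are hypothetical names used only for this illustration, and the sketch assumes the non-sensitive case in which every factor except Loc is aggregated with the Arithmetic Mean.

```python
import math

# Example scores from Table 5.6, one list per factor (four messages each).
SCORES = {
    "Id": [6, 4, 3, 5],
    "Loc": [7, 6, 7, 8],
    "Fresh": [8, 4, 5, 7],
    "Rep": [6, 5, 0, 5],
    "Contx": [0, 2, 0, 2],
    "Rel": [0, 1, 0, 1],
    "Corro": [3, 6, 3, 7],
    "Comp": [4, 4, 5, 6],
}

def arithmetic_mean(values):
    return sum(values) / len(values)

def root_mean_square(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

def weighted_overall_score(scores, weights=None):
    """Scale each factor's scores by its weight, aggregate, then average."""
    weights = weights or {}
    aggregates = {}
    for factor, values in scores.items():
        weight = weights.get(factor, 1.0)  # unweighted factors keep weight 1
        scaled = [weight * v for v in values]
        aggregate = root_mean_square if factor == "Loc" else arithmetic_mean
        aggregates[factor] = aggregate(scaled)
    return aggregates, arithmetic_mean(list(aggregates.values()))

if __name__ == "__main__":
    aggregates, overall = weighted_overall_score(SCORES, {"Loc": 3})
    print(f"Loc aggregate: {aggregates['Loc']:.2f}, overall: {overall:.2f}")
```

With unrounded arithmetic the weighted Loc aggregate is about 21.11; the value 21.12 in Table 5.8 is consistent with multiplying the rounded aggregate 7.04 by 3. Either way, the overall score rounds to 5.83.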