5.4 Machine Learning-Based Instantiation of the Performance Scoring Mechanism
5.4.2 Computation of the First Component $\tau_i'$
This section describes how the first component $\tau_i'$ of the aggregate score $\tau_i^r$ of player
$P_i$ at round $r$ is computed. This component takes into account the ratings submitted by all
players $P_j$ with $j \neq i$. The computation is performed in a Euclidean
space of dimension $D = 2$. For readability, we divide it into steps.
5.4.2.1 Collection of the evidence
To rate player $P_i$, each player $P_j$ submits a point $S_{j,i} = (x_{S_{j,i}}, y_{S_{j,i}})$ to the TA. The first
coordinate of point $S_{j,i}$ is $x_{S_{j,i}} = \tau_j^{r-1}$, where $0 \le \tau_j^{r-1} \le 1$ is the aggregate score of
player $P_j$ at round $r-1$. The second coordinate of point $S_{j,i}$ is $y_{S_{j,i}} = \rho_{j,i}$, where
$0 \le \rho_{j,i} \le 1$ is the rating that player $P_j$ submits for player $P_i$.
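The collection step can be illustrated with a minimal Python sketch. The function name `collect_evidence`, the dictionary layout, and the sample values are assumptions made for illustration only; the text does not prescribe how the TA stores the submitted points.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) = (aggregate score of the rater, rating)


def collect_evidence(
    i: int,
    prev_scores: Dict[int, float],
    ratings: Dict[Tuple[int, int], float],
) -> List[Point]:
    """Build the points S_{j,i} for all raters j != i (hypothetical helper).

    prev_scores[j]  : aggregate score of player j at round r-1, in [0, 1]
    ratings[(j, i)] : rating submitted by player j for player i, in [0, 1]
    """
    points: List[Point] = []
    for j in sorted(prev_scores):
        if j == i:
            continue  # player i does not rate itself
        x = prev_scores[j]
        y = ratings[(j, i)]
        if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
            raise ValueError(f"coordinates of S_{j},{i} must lie in [0, 1]")
        points.append((x, y))
    return points


# Illustrative values only: four players, player 1 is being rated.
prev_scores = {1: 0.9, 2: 0.8, 3: 0.2, 4: 0.5}
ratings = {(2, 1): 0.7, (3, 1): 0.1, (4, 1): 0.6}
evidence = collect_evidence(1, prev_scores, ratings)
# evidence == [(0.8, 0.7), (0.2, 0.1), (0.5, 0.6)]
```

With $n = 4$ players, the rated player receives $n - 1 = 3$ points, one per rater.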
5.4.2.2 Representation of $\tau_i'$
Since the ratings of the performance of player $P_i$ are values between 0 and 1, the first component $\tau_i'$ is also a value between 0 and 1. The idea is to define the data set $S^{(i)} = \{S_{1,i}, \dots, S_{i-1,i}, S_{i+1,i}, \dots, S_{n,i}\}$ of the points submitted by the players $P_j$ with $j \neq i$. Then, the machine learning techniques of K-means clustering (see Section 5.4.1.1) and mixture of Gaussians (see Section 5.4.1.2) are used to extract the first component $\tau_i'$ from the coordinates $y_{S_{j,i}}$ of the points in the data set $S^{(i)}$.

Table 5.1: Summary of the notation used to instantiate the accurate performance scoring mechanism through machine learning techniques.

Symbol                              Meaning
$P_i$, $P_j$                        players
$n$                                 total number of players
$\tau_i^r$                          aggregate score of player $P_i$ at round $r$
$\tau_i'$ / $\tau_i''$              first/second component of the aggregate score of player $P_i$ at round $r$
$\tau_i^{r-1}$                      aggregate score of player $P_i$ at round $r-1$
$S_{j,i}$                           data point describing how player $P_j$ rates player $P_i$
$x_{S_{j,i}}$ / $y_{S_{j,i}}$       x/y-coordinate of data point $S_{j,i}$
$\rho_{j,i}$                        evidence (rating) submitted by player $P_j$ with respect to player $P_i$
$C_1, \dots, C_K$                   clusters/classes of credibility
$M_1, \dots, M_K$                   center points of clusters $C_1, \dots, C_K$
$y_{M_1}, \dots, y_{M_K}$           y-coordinates of $M_1, \dots, M_K$
$w_{j,i}$                           weight of player $P_j$ when rating player $P_i$
$\pi_1, \dots, \pi_K$               mixing coefficients of $M_1, \dots, M_K$
$\alpha$, $\beta$                   incentive and penalty for accurate and inaccurate ratings, respectively
$o_{j,i}$                           score gained or lost by player $P_j$ when rating player $P_i$
$a$, $b$, $c$                       coefficients for $\tau_i'$, $\tau_i''$, and $\tau_i^{r-1}$, respectively
5.4.2.3 Classes of credibility
The K-means clustering algorithm distinguishes $K$ classes of evidence with respect to their credibility. Informally, we use the term credibility to underline that these classes are distinguished based on the reputation already gained by the raters: ratings from raters with a higher reputation are believed to be more accurate and thus more credible. The points in the data set $S^{(i)}$ are grouped into $K$ clusters $C_1, \dots, C_K$. In fact, each point in
the data set $S^{(i)}$ is a tuple corresponding to the values "aggregate score of the rater" and "rating submitted by the rater", so the clusters form
classes which take into account both values. The center points $M_1, \dots, M_K$ of clusters
$C_1, \dots, C_K$ summarize these classes of credibility with fewer, yet more informative points.
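To make the grouping step concrete, the following is a minimal sketch of the standard K-means (Lloyd) iteration on two-dimensional points. It is not necessarily the variant described in Section 5.4.1.1: the function name `k_means`, the fixed initial centers, and the sample points are assumptions chosen so that the run is deterministic.

```python
from math import dist
from typing import List, Tuple

Point = Tuple[float, float]


def k_means(
    points: List[Point], centers: List[Point], max_iter: int = 100
) -> Tuple[List[Point], List[int]]:
    """Plain Lloyd iteration; returns (center points, cluster label per point)."""
    centers = list(centers)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        labels = [
            min(range(len(centers)), key=lambda l: dist(p, centers[l]))
            for p in points
        ]
        # Update step: each center moves to the mean of its cluster.
        new_centers = []
        for l, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == l]
            if members:
                new_centers.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                new_centers.append(old)  # empty cluster: keep its center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers, labels


# Illustrative points (aggregate score of the rater, rating); K = 2.
points = [(0.9, 0.8), (0.8, 0.7), (0.85, 0.75), (0.2, 0.1), (0.3, 0.2)]
initial_centers = [(1.0, 1.0), (0.0, 0.0)]  # assumed initialisation
final_centers, labels = k_means(points, initial_centers)
# labels == [0, 0, 0, 1, 1]; the centers are close to (0.85, 0.75) and (0.25, 0.15)
```

In this toy run, the three points submitted by raters with high aggregate scores form one class of credibility and the two points from raters with low aggregate scores form the other.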
5.4.2.4 Assigning a weight $w_{j,i}$ to point $S_{j,i}$
Each point $S_{j,i}$ is submitted by player $P_j$, whose aggregate score at round $r-1$ is
$\tau_j^{r-1}$. We use this aggregate score to weight point $S_{j,i}$. We define the weight $w_{j,i}$ as:

$$ w_{j,i} = \frac{F(\tau_j^{r-1})}{\sum_{j=1}^{n-1} F(\tau_j^{r-1})}, $$

where the sum in the denominator runs over the $n-1$ players that rate $P_i$, and $F : [0, 1] \to \mathbb{R}$ is a positive, increasing step function over subintervals of the interval $[0, 1]$, which assigns higher values to larger aggregate scores $\tau_j^{r-1}$ at round
$r-1$. Defining $F$ as a step function reduces the weights
that a point $S_{j,i}$ can have to a small number of possible values. Thus, two aggregate scores
$\tau_j^{r-1} < \tau_k^{r-1}$ at round $r-1$ yield the same weights $w_{j,i}$
and $w_{k,i}$ for the points $S_{j,i}$ and $S_{k,i}$ if they lie in the same subinterval.
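The weight definition can be sketched as follows. The text only requires $F$ to be a positive, increasing step function, so the concrete choice below (equal-width subintervals, with the $m$-th subinterval mapped to the value $m$) and the names `step_F` and `rater_weights` are assumptions for illustration.

```python
from typing import List


def step_F(score: float, levels: int = 4) -> float:
    """Hypothetical step function F on [0, 1].

    [0, 1] is split into `levels` equal subintervals; a score in the m-th
    subinterval (m = 1, ..., levels) is mapped to the value m, so F is
    positive and increasing from one subinterval to the next.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("aggregate score must lie in [0, 1]")
    # min(...) places the right endpoint score == 1.0 in the last subinterval.
    return float(min(int(score * levels), levels - 1) + 1)


def rater_weights(prev_scores: List[float], levels: int = 4) -> List[float]:
    """Weights w_{j,i} = F(tau_j) / sum of F over all raters of player i."""
    f_values = [step_F(s, levels) for s in prev_scores]
    total = sum(f_values)
    return [v / total for v in f_values]


# Aggregate scores at round r-1 of the raters of one player (illustrative).
rater_scores = [0.9, 0.8, 0.2, 0.55]
w = rater_weights(rater_scores)
# F values are 4, 4, 1, 3, so w is approximately [1/3, 1/3, 1/12, 1/4].
```

The scores 0.9 and 0.8 fall in the same subinterval and therefore receive the same weight, which is the equivalence described above.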
5.4.2.5 Computation of the first component $\tau_i'$
The first component $\tau_i'$ is computed as a weighted combination of the coordinates
$y_{M_1}, \dots, y_{M_K}$ of the center points $M_1, \dots, M_K$, respectively. The center points $M_1, \dots, M_K$
are not equivalent: besides the classes of credibility they distinguish, they also depend on the cardinality of the respective clusters. The idea is to associate values $\pi_1, \dots, \pi_K$ with the center points $M_1, \dots, M_K$ in a way that reflects both quantity and quality.
The values $\pi_1, \dots, \pi_K$ are regarded as the mixing coefficients of a mixture $p(S^{(i)})$ of
$K$ Gaussian distributions $N_1(\mu_1, \sigma_1^2), \dots, N_K(\mu_K, \sigma_K^2)$. More precisely, the points
within each cluster $C_l$ can be seen as following a Gaussian distribution with $\mu_l = M_l$,
for $l = 1, \dots, K$. This is because the mean and the variance of a Gaussian distribution convey where the points are mostly concentrated and how they are spread, which is comparable to the information conveyed by the clusters.

The weights $w_{j,i}$ are used to compute the mixing coefficients $\pi_1, \dots, \pi_K$. In more detail,
$\pi_l = \sum_{j=1}^{n_l} w_{j,i}^l$, where $n_l$ is the cardinality of cluster $C_l$ and $w_{j,i}^l$ is the weight
assigned to point $S_{j,i}^l \in C_l$, for $l = 1, \dots, K$. Note that $\sum_{l=1}^{K} \pi_l = 1$ and thus the
mixture $p(S^{(i)})$ is a probability distribution. The weighted sum of the means $\mu_1, \dots, \mu_K$
represents the mean $\mu$ of the closest Gaussian distribution $N(\mu, \sigma^2)$ approximating
the mixture $p(S^{(i)})$ (see Section 5.4.1.2). This is exactly what we aim at: since
the means $\mu_1, \dots, \mu_K$ are the center points $M_1, \dots, M_K$, we can now compute the
first component $\tau_i'$ as the weighted sum of the coordinates $y_{M_1}, \dots, y_{M_K}$ according to the
mixing coefficients $\pi_1, \dots, \pi_K$. That is,
$$ \tau_i' = \sum_{l=1}^{K} \pi_l \, y_{M_l}. $$
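The link between the weighted sum and the mean of the mixture follows from linearity of expectation. As a short derivation, assuming the mixture has the standard density $p(s) = \sum_{l=1}^{K} \pi_l \, N(s \mid \mu_l, \sigma_l^2)$ (this explicit form is an assumption here, since Section 5.4.1.2 is not reproduced in this section):

$$
\mathbb{E}_p[s] = \int s \, p(s) \, ds = \sum_{l=1}^{K} \pi_l \int s \, N(s \mid \mu_l, \sigma_l^2) \, ds = \sum_{l=1}^{K} \pi_l \, \mu_l .
$$

Taking the y-coordinate of both sides with $\mu_l = M_l$ gives $\sum_{l=1}^{K} \pi_l \, y_{M_l}$, the weighted sum of the y-coordinates of the center points.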
Basically, the first component is computed as the coordinate $y_\mu$ of the mean $\mu$ of the
fitting Gaussian distribution $N(\mu, \sigma^2)$. One can argue that the first component $\tau_i'$
could be computed directly once the clusters $C_1, \dots, C_K$ are distinguished, without
passing through the step of computing the mixture of Gaussians. This is, in fact, what one would do in practice when computing $\tau_i'$. However, we highlight that this computation is possible because the center points of the clusters are the means of the Gaussian distributions.
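The direct computation mentioned above can be sketched in a few lines of Python. The function name `first_component` and all input values are hypothetical; the cluster labels, center points, and weights are assumed to have been produced by the earlier clustering and weighting steps.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def first_component(
    centers: List[Point], labels: List[int], weights: List[float]
) -> Tuple[float, List[float]]:
    """Return (tau_i', [pi_1, ..., pi_K]).

    centers[l] : center point M_l of cluster C_l
    labels[j]  : index l of the cluster that contains the j-th point
    weights[j] : weight w_{j,i} of the j-th point (weights sum to 1)
    """
    # Mixing coefficient pi_l: sum of the weights of the points in cluster C_l.
    pi = [0.0] * len(centers)
    for lab, w in zip(labels, weights):
        pi[lab] += w
    # tau_i': weighted sum of the y-coordinates of the center points.
    tau = sum(p * center[1] for p, center in zip(pi, centers))
    return tau, pi


# Illustrative inputs: K = 2 clusters, five raters.
centers = [(0.85, 0.75), (0.25, 0.15)]
labels = [0, 0, 0, 1, 1]
weights = [0.25, 0.25, 0.25, 0.125, 0.125]
tau_first, pi = first_component(centers, labels, weights)
# pi == [0.75, 0.25]; tau_first is 0.75 * 0.75 + 0.25 * 0.15, i.e. about 0.6
```

Here the cluster of highly weighted raters dominates the result, so the first component lies much closer to 0.75 than to 0.15.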
We recall that the technique proposed to compute the first partial trust value $\tau_i^{(1)}$ for storage server $S_i$ assumes an honest majority among the storage servers within $S$. Otherwise, the cluster with the highest weight is the one with a high density of untrustworthy storage servers. Note that this is a common assumption within the framework of distributed storage.