5.4 Machine Learning-Based Instantiation of the Performance Scoring Mechanism

5.4.2 Computation of the First Component $\tau_i'$

This section describes how the first component $\tau_i'$ of the aggregate score $\tau_i^r$ of player $P_i$ at round $r$ is computed. This component takes into account the ratings submitted by all players $P_j$ with $j \neq i$. The computation of $\tau_i'$ is performed in a Euclidean space of dimension $D = 2$. For readability, we divide this computation into steps.

5.4.2.1 Collection of the evidence

To rate player $P_i$, each player $P_j$ submits a point $S_{j,i} = (x_{S_{j,i}}, y_{S_{j,i}})$ to the TA. The first coordinate of point $S_{j,i}$ is $x_{S_{j,i}} = \tau_j^{r-1}$, where $0 \leq \tau_j^{r-1} \leq 1$ is the aggregate score of player $P_j$ at round $r-1$. The second coordinate of point $S_{j,i}$ is $y_{S_{j,i}} = \rho_{j,i}$, where $0 \leq \rho_{j,i} \leq 1$ is the rating submitted by player $P_j$ to rate player $P_i$.
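The evidence collection step can be illustrated with a minimal Python sketch. The function name `collect_evidence` and the dictionary-based inputs are assumptions made for illustration only; the source does not prescribe any data structure or message format for the TA.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) = (tau_j^{r-1}, rho_{j,i})


def collect_evidence(i: int,
                     prev_scores: Dict[int, float],
                     ratings: Dict[int, float]) -> List[Point]:
    """Build the points S_{j,i} submitted about player i (hypothetical helper).

    prev_scores[j] is the aggregate score of rater j at round r-1.
    ratings[j] is the rating rho_{j,i} that rater j submits about player i.
    """
    points: List[Point] = []
    for j in sorted(ratings):
        if j == i:
            continue  # only players P_j with j != i rate P_i
        x, y = prev_scores[j], ratings[j]
        if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
            raise ValueError(f"coordinates of rater {j} must lie in [0, 1]")
        points.append((x, y))
    return points


points = collect_evidence(0, {0: 0.5, 1: 0.9, 2: 0.2}, {1: 0.8, 2: 0.1})
print(points)
```

Each returned pair holds the rater's previous aggregate score as its first coordinate and the submitted rating as its second, matching the definition of $S_{j,i}$.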

5.4.2.2 Representation of $\tau_i'$

Since the ratings on the performance of player $P_i$ are represented as values between 0 and 1, the first component $\tau_i'$ is also a value between 0 and 1. The idea is to define the data set $S^{(i)} = \{S_{1,i}, \ldots, S_{i-1,i}, S_{i+1,i}, \ldots, S_{n,i}\}$ of points submitted by the players $P_j$ with $j \neq i$. Then, the machine learning techniques of K-means clustering (see Section 5.4.1.1) and mixture of Gaussians (see Section 5.4.1.2) are used to extract the first component $\tau_i'$ from the coordinate $y_{S_{j,i}}$ of each point in the data set $S^{(i)}$.

Table 5.1: Summary of the notation used to instantiate the accurate performance scoring mechanism through machine learning techniques.

| Symbol | Meaning |
|---|---|
| $P_i$, $P_j$ | players |
| $n$ | total number of players |
| $\tau_i^r$ | aggregate score of player $P_i$ at round $r$ |
| $\tau_i'$, $\tau_i''$ | first/second component of the aggregate score of player $P_i$ at round $r$ |
| $\tau_i^{r-1}$ | aggregate score of player $P_i$ at round $r-1$ |
| $S_{j,i}$ | data point describing how player $P_j$ rates player $P_i$ |
| $x_{S_{j,i}}$, $y_{S_{j,i}}$ | x/y-coordinate of data point $S_{j,i}$ |
| $\rho_{j,i}$ | evidence submitted by player $P_j$ with respect to player $P_i$ |
| $C_1, \ldots, C_K$ | clusters/classes of credibility |
| $M_1, \ldots, M_K$ | center points of clusters $C_1, \ldots, C_K$ |
| $y_{M_1}, \ldots, y_{M_K}$ | y-coordinates of $M_1, \ldots, M_K$ |
| $w_{j,i}$ | weight of player $P_j$ when rating player $P_i$ |
| $\pi_1, \ldots, \pi_K$ | mixing coefficients of $M_1, \ldots, M_K$ |
| $\alpha$, $\beta$ | incentive and penalty for accurate and inaccurate ratings, respectively |
| $o_{j,i}$ | score gained or lost by player $P_j$ when rating player $P_i$ |
| $a$, $b$, $c$ | coefficients for $\tau_i'$, $\tau_i''$, and $\tau_i^{r-1}$, respectively |

5.4.2.3 Classes of credibility

The K-means clustering algorithm distinguishes $K$ classes of evidence with respect to their credibility. Informally, we use the term credibility to underline that these classes are distinguished based on the reputation already gained by the raters, whose ratings are believed to be more accurate and thus more credible. The points in the data set $S^{(i)}$ are grouped into $K$ clusters $C_1, \ldots, C_K$. In fact, each point in the data set $S^{(i)}$ is a tuple corresponding to the values "aggregate score of the rater" and "rating submitted by the rater", so the clusters form classes which take into account both values. The center points $M_1, \ldots, M_K$ of clusters $C_1, \ldots, C_K$ summarize these classes of credibility with fewer, yet more informative, points.
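As an illustration of this clustering step, the following is a minimal sketch of Lloyd's iteration for K-means in plain Python. The function name `kmeans`, the deterministic initialisation, and the iteration cap are assumptions introduced here; the algorithm actually used is the one described in Section 5.4.1.1, and a library implementation would normally be preferred in practice.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int,
           iterations: int = 100) -> Tuple[List[Point], List[int]]:
    """Group 2-D points into k clusters; return (center points, labels)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumed initialisation: k points evenly spread over the sorted data.
    ordered = sorted(points)
    centers = [ordered[(2 * l + 1) * len(ordered) // (2 * k)] for l in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        new_labels = [min(range(k), key=lambda l: math.dist(p, centers[l]))
                      for p in points]
        # Update step: each center becomes the mean of its cluster members.
        new_centers = []
        for l in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == l]
            if members:
                new_centers.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centers.append(centers[l])  # keep the center of an empty cluster
        if new_labels == labels and new_centers == centers:
            break  # assignments and centers are stable
        labels, centers = new_labels, new_centers
    return centers, labels


data = [(0.9, 0.8), (0.95, 0.85), (0.1, 0.2), (0.15, 0.1)]
centers, labels = kmeans(data, k=2)
print(centers, labels)
```

In this toy data set the two raters with high previous scores end up in one cluster and the two with low previous scores in the other, so each center point $M_l$ summarizes one class of credibility.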

5.4.2.4 Assigning a weight $w_{j,i}$ to point $S_{j,i}$

Each point $S_{j,i}$ is submitted by player $P_j$, whose aggregate score at round $r-1$ is $\tau_j^{r-1}$. We use this aggregate score to weight point $S_{j,i}$. We define weight $w_{j,i}$ as:

$$w_{j,i} = \frac{F(\tau_j^{r-1})}{\sum_{k=1}^{n-1} F(\tau_k^{r-1})},$$

where the sum in the denominator runs over the $n-1$ players rating $P_i$, and $F : [0, 1] \to \mathbb{R}$ is a positive, non-decreasing step function over subintervals of the interval $[0, 1]$, which assigns higher values to larger aggregate scores $\tau_j^{r-1}$ at round $r-1$. The purpose of defining $F$ as a step function is to restrict the weights that a point $S_{j,i}$ can take to a small number of possible values. Thus, two aggregate scores $\tau_j^{r-1} < \tau_k^{r-1}$ at round $r-1$ are considered equivalent with respect to the weights $w_{j,i}$ and $w_{k,i}$ of the points $S_{j,i}$ and $S_{k,i}$ if they lie in the same subinterval.
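The weighting step can be sketched as follows. The source only requires $F$ to be a positive step function that grows with the aggregate score; the concrete thresholds and levels below, as well as the names `step_f` and `rater_weights`, are illustrative assumptions and not values taken from the text.

```python
from typing import List, Sequence


def step_f(tau: float,
           thresholds: Sequence[float] = (0.25, 0.5, 0.75),
           levels: Sequence[float] = (1.0, 2.0, 3.0, 4.0)) -> float:
    """Positive, non-decreasing step function on [0, 1] (assumed breakpoints)."""
    if not 0.0 <= tau <= 1.0:
        raise ValueError("aggregate scores must lie in [0, 1]")
    for threshold, level in zip(thresholds, levels):
        if tau < threshold:
            return level
    return levels[-1]


def rater_weights(prev_scores: Sequence[float]) -> List[float]:
    """Weights w_{j,i}: F of each rater's score, normalised over all raters of P_i."""
    values = [step_f(tau) for tau in prev_scores]
    total = sum(values)
    return [value / total for value in values]


# Previous aggregate scores of the n - 1 raters of player P_i.
w = rater_weights([0.9, 0.8, 0.1])
print(w)
```

The scores 0.9 and 0.8 fall in the same subinterval and therefore receive the same weight, while the rater with score 0.1 receives a smaller one. The weights sum to 1 by construction.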

5.4.2.5 Computation of the first component $\tau_i'$

The first component $\tau_i'$ is computed as a weighted combination of the coordinates $y_{M_1}, \ldots, y_{M_K}$ of the center points $M_1, \ldots, M_K$, respectively. Center points $M_1, \ldots, M_K$ are not equivalent: together with the classes of credibility they distinguish, they depend on the cardinality of the respective clusters. The idea is to associate values $\pi_1, \ldots, \pi_K$ to center points $M_1, \ldots, M_K$ in a quantitative and qualitative manner.

Values $\pi_1, \ldots, \pi_K$ are regarded as the mixing coefficients of a mixture $p(S^{(i)})$ of $K$ Gaussian distributions $N_1(\mu_1, \sigma_1^2), \ldots, N_K(\mu_K, \sigma_K^2)$. More precisely, the points within each cluster $C_l$ can be seen as following a Gaussian distribution with $\mu_l = M_l$, for $l = 1, \ldots, K$. This is because the mean and the variance of a Gaussian distribution convey information about where the points are mostly concentrated and how they are spread, which is comparable to the information conveyed by the clusters.

Weights $w_{j,i}$ are used to compute the mixing coefficients $\pi_1, \ldots, \pi_K$. In more detail, $\pi_l = \sum_{j=1}^{n_l} w_{j,i}^l$, where $n_l$ is the cardinality of cluster $C_l$ and $w_{j,i}^l$ is the weight assigned to point $S_{j,i}^l \in C_l$, for $l = 1, \ldots, K$. Since the weights $w_{j,i}$ sum to 1 over all raters and each point belongs to exactly one cluster, $\sum_{l=1}^{K} \pi_l = 1$, and thus the mixture $p(S^{(i)})$ is a probability distribution. The weighted sum of the means $\mu_1, \ldots, \mu_K$ represents the mean $\mu$ of the closest Gaussian distribution $N(\mu, \sigma^2)$ approximating mixture $p(S^{(i)})$ (see Section 5.4.1.2). This is exactly what we aim at: since the means $\mu_1, \ldots, \mu_K$ are the center points $M_1, \ldots, M_K$, we can now compute the first component $\tau_i'$ as the weighted sum of the coordinates $y_{M_1}, \ldots, y_{M_K}$ according to the mixing coefficients $\pi_1, \ldots, \pi_K$. That is,

$$\tau_i' = \sum_{l=1}^{K} \pi_l \, y_{M_l}.$$

In other words, the first component is the coordinate $y_\mu$ of the mean $\mu$ of the fitted Gaussian distribution $N(\mu, \sigma^2)$. One can argue that the first component $\tau_i'$ could be computed directly once the clusters $C_1, \ldots, C_K$ have been distinguished, without passing through the step of computing the mixture of Gaussians. This is, in fact, what one would do in practice when computing $\tau_i'$. However, we highlight that this computation is possible because the center points of the clusters serve as the means of the Gaussian distributions.
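The final step, computing the mixing coefficients and the weighted sum of the center-point y-coordinates, can be sketched as below. The function name `first_component` and its argument layout are assumptions for illustration. The inputs are taken to come from a clustering step (y-coordinates of the centers and one cluster label per point) and from the weighting step (one weight per point).

```python
from typing import List, Sequence, Tuple


def first_component(centers_y: Sequence[float],
                    labels: Sequence[int],
                    weights: Sequence[float]) -> Tuple[float, List[float]]:
    """Return (tau_i', mixing coefficients) for one rated player P_i.

    centers_y[l] is the y-coordinate y_{M_l} of center point M_l.
    labels[p] is the cluster index of the p-th point in S^(i).
    weights[p] is the weight w_{j,i} of the rater who submitted that point.
    """
    if len(labels) != len(weights):
        raise ValueError("one label and one weight are needed per point")
    # Mixing coefficient pi_l: sum of the weights of the points in cluster C_l.
    pis = [0.0] * len(centers_y)
    for label, weight in zip(labels, weights):
        pis[label] += weight
    # First component: sum over l of pi_l * y_{M_l}.
    tau_first = sum(pi * y for pi, y in zip(pis, centers_y))
    return tau_first, pis


tau_first, pis = first_component(centers_y=[0.85, 0.2],
                                 labels=[0, 0, 1],
                                 weights=[0.4, 0.4, 0.2])
print(tau_first, pis)
```

Here the credible cluster carries a mixing coefficient of about 0.8 and the other about 0.2, so the first component is pulled towards the ratings of the credible raters, giving roughly $0.8 \cdot 0.85 + 0.2 \cdot 0.2 = 0.72$.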

We recall that the technique proposed to compute the first partial trust value $\tau_i^{(1)}$ for storage server $S_i$ assumes an honest majority among the storage servers within $S$. Otherwise, the cluster with the highest weight would be the one with a high density of untrustworthy storage servers. Note that an honest majority is a common assumption within the framework of distributed storage.