
5.4 Machine Learning-Based Instantiation of the Performance Scoring Mechanism

5.4.2 Computation of the First Component τ_i0

This section describes how the first component τ_i0 of the aggregate score τ_i^r of player Pi at round r is computed. This component takes into account the ratings submitted by all players Pj with j ≠ i. The computation of the first component τ_i0 is performed in a Euclidean space of dimension D = 2. For readability, we divide this computation into steps.

5.4.2.1 Collection of the evidence

Each player Pj submits to the TA a point Sj,i = (xSj,i, ySj,i) that rates player Pi. The first coordinate of point Sj,i is xSj,i = τ_j^{r−1}, where 0 ≤ τ_j^{r−1} ≤ 1 is the aggregate score of player Pj at round r − 1. The second coordinate of point Sj,i is ySj,i = ρj,i, where 0 ≤ ρj,i ≤ 1 is the rating submitted by player Pj to rate player Pi.
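As an illustration only, the collection step can be sketched in Python as follows. The function name collect_evidence and the dictionary-based inputs are assumptions made for this sketch and do not come from the original text; the sketch merely pairs each rater's previous aggregate score with the rating it submits.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]


def collect_evidence(i: int,
                     prev_scores: Dict[int, float],
                     ratings: Dict[int, float]) -> Dict[int, Point]:
    """Build the points S_{j,i} = (tau_j^{r-1}, rho_{j,i}) for all raters j != i.

    prev_scores maps a player index j to its aggregate score at round r-1.
    ratings maps a rater index j to the rating rho_{j,i} it submits for P_i.
    Both containers are hypothetical stand-ins for what the TA receives.
    """
    points: Dict[int, Point] = {}
    for j, rho in ratings.items():
        if j == i:
            # A player does not rate itself, so skip any such entry.
            continue
        tau_prev = prev_scores[j]
        # Both coordinates are required to lie in [0, 1].
        if not (0.0 <= tau_prev <= 1.0 and 0.0 <= rho <= 1.0):
            raise ValueError(f"coordinates of S_{j},{i} must lie in [0, 1]")
        points[j] = (tau_prev, rho)
    return points
```

For instance, collect_evidence(2, {1: 0.3, 2: 0.7, 3: 0.9}, {1: 0.4, 3: 0.8}) yields the two points (0.3, 0.4) and (0.9, 0.8), keyed by the rater indices 1 and 3.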

5.4.2.2 Representation of τ_i0

Since the ratings with respect to the performance of player Pi are represented as values between 0 and 1, the first component τ_i0 is also a value between 0 and 1. The idea is to define the data set S(i) = {S1,i, . . . , Si−1,i, Si+1,i, . . . , Sn,i} of points submitted by the players Pj with j ≠ i. Then, the machine learning techniques of K-means clustering (see Section 5.4.1.1) and mixture of Gaussians (see Section 5.4.1.2) are used to extract the first component τ_i0 from coordinate ySj,i of each point in the data set S(i).

Table 5.1: Summary of the notation used to instantiate the accurate performance scoring mechanism through machine learning techniques.

Symbol              Description
Pi, Pj              players
n                   total number of players
τ_i^r               aggregate score of player Pi at round r
τ_i0 / τ_i00        first / second component of player Pi at round r
τ_i^{r−1}           aggregate score of player Pi at round r − 1
Sj,i                data point describing how player Pj rates player Pi
xSj,i / ySj,i       x- / y-coordinate of data point Sj,i
ρj,i                evidence submitted by player Pj with respect to player Pi
C1, . . . , CK      clusters/classes of credibility
M1, . . . , MK      center points of clusters C1, . . . , CK
yM1, . . . , yMK    y-coordinates of M1, . . . , MK
wj,i                weight of player Pj when rating player Pi
π1, . . . , πK      mixing coefficients of M1, . . . , MK
α, β                incentive and penalty for accurate and inaccurate ratings, respectively
oj,i                score gained or lost by player Pj when rating player Pi
a, b, c             coefficients for τ_i0, τ_i00, and τ_i^{r−1}, respectively

5.4.2.3 Classes of credibility

K classes of evidence are distinguished with respect to their credibility by the K-means clustering algorithm. Informally, we use the term credibility to underline the fact that these classes are distinguished based on the reputation already gained by the raters: the higher a rater's reputation, the more accurate, and thus credible, its ratings are believed to be. The points in the data set S(i) are grouped into K clusters C1, . . . , CK. In fact, each point in the data set S(i) is a tuple corresponding to the values "aggregate score of the rater" and "rating submitted by the rater", so the clusters form classes which take into account both values. The center points M1, . . . , MK of clusters C1, . . . , CK simplify these classes of credibility with fewer, yet more informative points.
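The grouping of the points into K clusters can be sketched with a plain implementation of the standard K-means iteration (Lloyd's algorithm). This is a minimal sketch, not necessarily the variant described in Section 5.4.1.1: the function names, the seeded random initialization, and the max_iter bound are assumptions made for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _assign(points: Sequence[Point], centers: Sequence[Point]) -> List[int]:
    """Index of the nearest center (Euclidean distance) for every point."""
    return [min(range(len(centers)), key=lambda l: math.dist(p, centers[l]))
            for p in points]


def kmeans(points: Sequence[Point], k: int,
           max_iter: int = 100, seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Group 2-D points into k clusters; return (center points, assignment)."""
    rng = random.Random(seed)
    # Assumed initialization: k distinct input points chosen at random.
    centers = rng.sample(list(points), k)
    for _ in range(max_iter):
        assignment = _assign(points, centers)
        new_centers = []
        for l in range(k):
            members = [p for p, a in zip(points, assignment) if a == l]
            if members:
                # The center point M_l is the mean of the points in C_l.
                new_centers.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                # Keep the old center if a cluster happens to be empty.
                new_centers.append(centers[l])
        if new_centers == centers:
            break  # centers are stable, so the clustering has converged
        centers = new_centers
    return centers, _assign(points, centers)
```

Applied to points of the form (aggregate score of the rater, rating), the returned center points play the role of M1, . . . , MK, and the assignment list tells which cluster each point Sj,i belongs to.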

5.4.2.4 Assigning a weight wj,i to point Sj,i

Each point Sj,i is submitted by player Pj, whose aggregate score at round r − 1 is τ_j^{r−1}. We use this aggregate score to weight point Sj,i. We define weight wj,i as:

\[ w_{j,i} = \frac{F(\tau_j^{r-1})}{\sum_{j=1}^{n-1} F(\tau_j^{r-1})} \]

where F : [0, 1] → R is a positive and increasing step function over subintervals of the interval [0, 1], which assigns higher values to larger aggregate scores τ_j^{r−1} at round r − 1. The sum in the denominator runs over the n − 1 players that rate player Pi. Defining F as a step function reduces the weights that a point Sj,i can have to a few possible values. Thus, two aggregate scores τ_j^{r−1} < τ_k^{r−1} at round r − 1 are considered equivalent with respect to the weights wj,i and wk,i for the points Sj,i and Sk,i if they lie in the same subinterval.
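The weighting step can be illustrated with the short sketch below. The text does not fix the subintervals or the values taken by F, so the breakpoints (0.25, 0.5, 0.75) and the levels (1, 2, 3, 4) are purely hypothetical choices; any positive, increasing step function would serve the same purpose.

```python
from bisect import bisect_right
from typing import Dict

# Hypothetical step function: four subintervals of [0, 1] with increasing levels.
BREAKPOINTS = (0.25, 0.5, 0.75)
LEVELS = (1.0, 2.0, 3.0, 4.0)


def step_f(tau_prev: float) -> float:
    """Positive, increasing step function F over subintervals of [0, 1]."""
    if not 0.0 <= tau_prev <= 1.0:
        raise ValueError("aggregate score must lie in [0, 1]")
    # bisect_right returns the index of the subinterval containing tau_prev.
    return LEVELS[bisect_right(BREAKPOINTS, tau_prev)]


def rater_weights(prev_scores: Dict[int, float]) -> Dict[int, float]:
    """Weights w_{j,i} = F(tau_j^{r-1}) / sum of F over all raters of P_i.

    prev_scores maps each rater j != i to its aggregate score at round r-1.
    """
    f_values = {j: step_f(tau) for j, tau in prev_scores.items()}
    total = sum(f_values.values())
    return {j: f / total for j, f in f_values.items()}
```

With the assumed levels, raters with previous scores 0.3 and 0.4 fall into the same subinterval and receive the same weight, while a rater with score 0.9 receives twice that weight; the weights always sum to 1.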

5.4.2.5 Computation of the first component τ_i0

The first component τ_i0 is computed as a weighted combination of the coordinates yM1, . . . , yMK of the center points M1, . . . , MK, respectively. Center points M1, . . . , MK are not equivalent: together with the classes of credibility they distinguish, they depend on the cardinality of the respective clusters. The idea is to associate values π1, . . . , πK to center points M1, . . . , MK in a quantitative and qualitative manner.

Values π1, . . . , πK are regarded as the mixing coefficients of a mixture p(S(i)) of K Gaussian distributions N1(µ1, σ1²), . . . , NK(µK, σK²). More precisely, the points within each cluster Cl can be seen as following a Gaussian distribution with µl = Ml, for l = 1, . . . , K. That is because the mean and the variance of a Gaussian distribution convey information about where the points are mostly concentrated and how they are spread, which is comparable to the information conveyed by the clusters. Weights wj,i are used to compute the mixing coefficients π1, . . . , πK. In more detail,

\[ \pi_l = \sum_{j=1}^{n_l} w^{l}_{j,i} \]

where nl is the cardinality of cluster Cl and w^l_{j,i} is the weight assigned to point S^l_{j,i} ∈ Cl, for l = 1, . . . , K. Note that Σ_{l=1}^{K} πl = 1, since the weights wj,i are normalized and every point belongs to exactly one cluster, and thus the mixture p(S(i)) is a probability distribution. The weighted sum of means µ1, . . . , µK represents the mean µ of the closest Gaussian distribution N(µ, σ²) approximating mixture p(S(i)) (see Section 5.4.1.2). And this is exactly what we aim at: since the means µ1, . . . , µK are the center points M1, . . . , MK, we can now compute the first component τ_i0 as the weighted sum of coordinates yM1, . . . , yMK according to the mixing coefficients π1, . . . , πK. That is,

\[ \tau_{i0} = \sum_{l=1}^{K} \pi_l \, y_{M_l} \]

Basically, the first component is computed as coordinate yµ of the mean µ of the fitting Gaussian distribution N(µ, σ²). One can argue that the first component τ_i0 could be computed directly after clusters C1, . . . , CK were distinguished, without passing through the step of computing the mixture of Gaussians. This is, in fact, what one would do in practice when computing τ_i0. However, we highlight that this computation is possible because the center points of the cluster model are the means of Gaussian distributions.
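The direct computation mentioned above can be sketched as follows, assuming the cluster assignment and the weights wj,i are already available. The function name first_component and the list-based inputs are assumptions made for this sketch; the center point of each cluster is taken to be the mean of its members, as in K-means.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def first_component(points: Sequence[Point],
                    weights: Sequence[float],
                    assignment: Sequence[int],
                    k: int) -> float:
    """Compute tau_i0 as the sum over l of pi_l * y_{M_l}.

    points[t] is a point S_{j,i}, weights[t] its weight w_{j,i}, and
    assignment[t] the index l of the cluster C_l that contains it.
    """
    pis: List[float] = [0.0] * k      # mixing coefficients pi_l
    y_sums: List[float] = [0.0] * k   # sum of y-coordinates per cluster
    counts: List[int] = [0] * k       # cardinality n_l per cluster
    for (_, y), w, l in zip(points, weights, assignment):
        pis[l] += w                   # pi_l is the sum of weights in C_l
        y_sums[l] += y
        counts[l] += 1
    tau = 0.0
    for l in range(k):
        if counts[l]:
            y_center = y_sums[l] / counts[l]   # y-coordinate of M_l
            tau += pis[l] * y_center
    return tau


# Three raters: two with lower previous scores in cluster 0, one in cluster 1.
example = first_component(points=[(0.3, 0.2), (0.4, 0.4), (0.9, 0.8)],
                          weights=[0.25, 0.25, 0.5],
                          assignment=[0, 0, 1],
                          k=2)
```

In this example π1 = π2 = 0.5 and the center points have y-coordinates 0.3 and 0.8, so the result is approximately 0.55: the single highly weighted rater pulls the score as strongly as the two lower-weighted raters together.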

We recall that the technique proposed to compute the first partial trust value τ_i(1) for storage server Si assumes an honest majority among the storage servers within

S. Otherwise, the cluster with the highest weight is the one with a high density of untrustworthy storage servers. Note that this is a common assumption within the framework of distributed storage.

In document Long-Term Confidential Secret Sharing-Based Distributed Storage Systems (Page 123-126)