Semi-reciprocal

In document Metric Representations Of Networks (Page 72-76)

3.3 Intermediate clustering methods

3.3.3 Semi-reciprocal

In reciprocal clustering we require influence to propagate through bidirectional paths; see Fig. 3.1. We could reinterpret bidirectional propagation as allowing loops of node-length two in both directions. E.g., the bidirectional path between $x$ and $x_1$ in Fig. 3.1 can be interpreted as a loop between $x$ and $x_1$ composed of two paths $[x, x_1]$ and $[x_1, x]$ of node-length two. Semi-reciprocal clustering is a generalization of this concept where loops consisting of at most $t$ nodes in each direction are allowed. Given $t \in \mathbb{N}$ such that $t \geq 2$, we use the notation $P^t_{xx'}$ to denote any path $[x = x_0, x_1, \ldots, x_l = x']$ joining $x$ to $x'$ where $l \leq t-1$. That is, $P^t_{xx'}$ is a path starting at $x$ and finishing at $x'$ with at most $t$ nodes. We reserve the notation $P_{xx'}$ to represent a path from $x$ to $x'$ where no maximum is imposed on the number of nodes. Given an arbitrary network $N = (X, A_X)$, define as $A^{\mathrm{SR}(t)}_X(x, x')$ the minimum cost incurred when traveling from node $x$ to node $x'$ using a path of at most $t$ nodes. I.e.,

$$A^{\mathrm{SR}(t)}_X(x, x') := \min_{P^t_{xx'}} \; \max_{i \mid x_i \in P^t_{xx'}} A_X(x_i, x_{i+1}). \tag{3.51}$$

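We define the family of semi-reciprocal clustering methods $\mathcal{H}^{\mathrm{SR}(t)}$ with output $(X, u^{\mathrm{SR}(t)}_X) = \mathcal{H}^{\mathrm{SR}(t)}(X, A_X)$ as the one for which the ultrametric $u^{\mathrm{SR}(t)}_X(x, x')$ between $x$ and $x'$ is

$$u^{\mathrm{SR}(t)}_X(x, x') := \min_{P_{xx'}} \; \max_{i \mid x_i \in P_{xx'}} \bar{A}^{\mathrm{SR}(t)}_X(x_i, x_{i+1}), \tag{3.52}$$

where the function $\bar{A}^{\mathrm{SR}(t)}_X$ is defined as

$$\bar{A}^{\mathrm{SR}(t)}_X(x_i, x_{i+1}) := \max\big(A^{\mathrm{SR}(t)}_X(x_i, x_{i+1}),\, A^{\mathrm{SR}(t)}_X(x_{i+1}, x_i)\big). \tag{3.53}$$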

The path $P_{xx'}$ of unconstrained length in (3.52) is called the main path, represented by $[x = x_0, x_1, \ldots, x_{l-1}, x']$ in Fig. 3.6. Between consecutive nodes $x_i$ and $x_{i+1}$ of the main path, we build loops consisting of secondary paths in each direction, represented in Fig. 3.6 by $[x_i, y_{i1}, \ldots, y_{ik_i}, x_{i+1}]$ and $[x_{i+1}, y'_{i1}, \ldots, y'_{ik'_i}, x_i]$ for all $i$. For the computation of $u^{\mathrm{SR}(t)}_X(x, x')$, the maximum allowed length of secondary paths is equal to $t$ nodes, i.e., $k_i, k'_i \leq t-2$ for all $i$. In particular, for $t = 2$ we recover the reciprocal path; see Fig. 3.1.
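Definitions (3.51)-(3.53) can be read as a two-stage computation: first the bounded secondary-path costs, then a min-max search over main paths. The following is a minimal Python sketch of that reading, not an implementation taken from the source. It assumes the network is a dense $n \times n$ list of lists `A` with zero diagonal, where `A[i][j]` is the dissimilarity from node `i` to node `j`; the function names and the 3-node network are hypothetical.

```python
def minmax_product(P, Q):
    """(P * Q)[i][j] = min over k of max(P[i][k], Q[k][j])."""
    n = len(P)
    return [[min(max(P[i][k], Q[k][j]) for k in range(n))
             for j in range(n)] for i in range(n)]


def bounded_path_cost(A, t):
    """A^{SR(t)} of (3.51): min-max cost over paths with at most t nodes."""
    cost = [row[:] for row in A]          # t = 2: only the direct link
    for _ in range(t - 2):                # each product allows one more node
        cost = minmax_product(cost, A)    # zero diagonal keeps shorter paths
    return cost


def semi_reciprocal(A, t):
    """u^{SR(t)} of (3.52): min-max main-path cost over symmetrized costs."""
    n = len(A)
    cost = bounded_path_cost(A, t)
    # (3.53): take the worse of the two directions for each pair of nodes
    u = [[max(cost[i][j], cost[j][i]) for j in range(n)] for i in range(n)]
    # Floyd-Warshall style closure: main paths of unconstrained length
    for k in range(n):
        for i in range(n):
            for j in range(n):
                u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u


# Hypothetical network: cheap directed cycle 0 -> 1 -> 2 -> 0, costly reverse links.
A = [[0, 1, 5],
     [5, 0, 1],
     [1, 5, 0]]
u2 = semi_reciprocal(A, 2)   # reciprocal case: every pair needs a costly link
u3 = semi_reciprocal(A, 3)   # three-node secondary paths can close the cycle
```

In this toy network the pair (0, 1) merges at resolution 5 for $t = 2$ and at resolution 1 for $t = 3$, because the return path $[1, 2, 0]$ has three nodes. The sketch costs on the order of $t\,n^3$ operations, which is acceptable only for small networks.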

We can reinterpret (3.52) as the application of reciprocal clustering [cf. (3.2)] to a network with dissimilarities $A^{\mathrm{SR}(t)}_X$ as in (3.51), i.e., a network with dissimilarities given by the optimal choice of secondary paths. Semi-reciprocal clustering methods are valid and satisfy axioms (A1)-(A2), as shown in the following proposition.

Proposition 6 The semi-reciprocal clustering method $\mathcal{H}^{\mathrm{SR}(t)}$ is valid and admissible for all integers $t \geq 2$. I.e., $u^{\mathrm{SR}(t)}_X$ is a valid ultrametric and $\mathcal{H}^{\mathrm{SR}(t)}$ satisfies axioms (A1)-(A2).

Proof: We begin the proof by showing that (3.52) outputs a valid ultrametric, where the only non-trivial property to be shown is the strong triangle inequality (2.12). For a fixed $t$, pick an arbitrary pair of nodes $x$ and $x'$ and an arbitrary intermediate node $x''$. Let us denote by $P^*_{xx''}$ and $P^*_{x''x'}$ a pair of main paths that satisfy definition (3.52) for $u^{\mathrm{SR}(t)}_X(x, x'')$ and $u^{\mathrm{SR}(t)}_X(x'', x')$, respectively. Construct $P_{xx'}$ by concatenating the aforementioned minimizing paths $P^*_{xx''}$ and $P^*_{x''x'}$. However, $P_{xx'}$ is a particular path for computing $u^{\mathrm{SR}(t)}_X(x, x')$ and need not be the minimizing one. This implies that

$$u^{\mathrm{SR}(t)}_X(x, x') \leq \max\big(u^{\mathrm{SR}(t)}_X(x, x''),\, u^{\mathrm{SR}(t)}_X(x'', x')\big), \tag{3.54}$$

proving the strong triangle inequality.

To show fulfillment of (A1), consider the network $(\{p, q\}, A_{p,q})$ with $A_{p,q}(p, q) = \alpha$ and $A_{p,q}(q, p) = \beta$. Note that in this situation, $A^{\mathrm{SR}(t)}_{p,q}(p, q) = \alpha$ and $A^{\mathrm{SR}(t)}_{p,q}(q, p) = \beta$ for all $t \geq 2$ [cf. (3.51)], since there is only one possible path in each direction and it contains only two nodes. Hence, from (3.52),

$$u^{\mathrm{SR}(t)}_{p,q}(p, q) = \max(\alpha, \beta), \tag{3.55}$$

for all $t$. Consequently, axiom (A1) is satisfied.

To show fulfillment of (A2), consider two arbitrary networks $(X, A_X)$ and $(Y, A_Y)$ and a dissimilarity reducing map $\phi : X \to Y$ between them. Further, denote by $P^{X*}_{xx'} = [x = x_0, \ldots, x_l = x']$ a main path that achieves the minimum semi-reciprocal cost in (3.52). Then, for a fixed $t$, we can write

$$u^{\mathrm{SR}(t)}_X(x, x') = \max_{i \mid x_i \in P^{X*}_{xx'}} \bar{A}^{\mathrm{SR}(t)}_X(x_i, x_{i+1}). \tag{3.56}$$

Consider now a secondary path $P^t_{x_i x_{i+1}} = [x_i = x^{(0)}, \ldots, x^{(l')} = x_{i+1}]$ between two consecutive nodes $x_i$ and $x_{i+1}$ of the minimizing path $P^{X*}_{xx'}$. Further, focus on the image of this secondary path under the map $\phi$, that is, $P^t_{\phi(x_i)\phi(x_{i+1})} := \phi(P^t_{x_i x_{i+1}}) = [\phi(x_i) = \phi(x^{(0)}), \ldots, \phi(x^{(l')}) = \phi(x_{i+1})]$ in the set $Y$.

Since the map $\phi : X \to Y$ is dissimilarity reducing, $A_Y(\phi(x^{(j)}), \phi(x^{(j+1)})) \leq A_X(x^{(j)}, x^{(j+1)})$ for all links in this path. Moreover, the image path has at most $t$ nodes, so it is an admissible secondary path in $Y$. Analogously, we can bound the dissimilarities in secondary paths $P^t_{x_{i+1} x_i}$ from $x_{i+1}$ back to $x_i$. Thus, from (3.51) we can state that

$$\bar{A}^{\mathrm{SR}(t)}_X(x_i, x_{i+1}) \geq \bar{A}^{\mathrm{SR}(t)}_Y(\phi(x_i), \phi(x_{i+1})). \tag{3.57}$$

Denote by $P^Y_{\phi(x)\phi(x')}$ the image of the main path $P^{X*}_{xx'}$ under the map $\phi$. Notice that $P^Y_{\phi(x)\phi(x')}$ is a particular path joining $\phi(x)$ and $\phi(x')$, whereas the semi-reciprocal ultrametric computes the minimum across all main paths. Therefore,

$$u^{\mathrm{SR}(t)}_Y(\phi(x), \phi(x')) \leq \max_{i \mid \phi(x_i) \in P^Y_{\phi(x)\phi(x')}} \bar{A}^{\mathrm{SR}(t)}_Y(\phi(x_i), \phi(x_{i+1})). \tag{3.58}$$

By bounding the right-hand side of (3.58) using (3.57) and recalling (3.56), it follows that $u^{\mathrm{SR}(t)}_Y(\phi(x), \phi(x')) \leq u^{\mathrm{SR}(t)}_X(x, x')$. This proves that (A2) is satisfied.

The semi-reciprocal family is a countable family of clustering methods parameterized by the integer $t \geq 2$, which represents the allowed maximum node-length of secondary paths. Reciprocal and nonreciprocal ultrametrics are equivalent to semi-reciprocal ultrametrics for specific values of $t$. For $t = 2$ we have $u^{\mathrm{SR}(2)}_X = u^{\mathrm{R}}_X$, meaning that we recover reciprocal clustering. To see this formally, note that $A^{\mathrm{SR}(2)}_X(x, x') = A_X(x, x')$ [cf. (3.51)] since the only path of node-length two joining $x$ and $x'$ is $[x, x']$. Hence, for $t = 2$, (3.52) reduces to

$$u^{\mathrm{SR}(2)}_X(x, x') = \min_{P_{xx'}} \; \max_{i \mid x_i \in P_{xx'}} \bar{A}_X(x_i, x_{i+1}), \tag{3.59}$$

which is the definition of the reciprocal ultrametric [cf. (3.2)].

Nonreciprocal ultrametrics can be obtained as $u^{\mathrm{SR}(t)}_X = u^{\mathrm{NR}}_X$ for any parameter $t \geq n$, where $n$ is the number of nodes in the network analyzed. To see this, notice that minimizing over $P_{xx'}$ is equivalent to minimizing over $P^t_{xx'}$ for all $t \geq n$, since we are looking for minimizing paths in a network with non-negative dissimilarities. Therefore, visiting the same node twice is not an optimal choice. This implies that $P^n_{xx'}$ contains all possible minimizing paths between $x$ and $x'$. Hence, by inspecting (3.51), $A^{\mathrm{SR}(t)}_X(x, x') = \tilde{u}^*_X(x, x')$ [cf. (2.7)] for all $t \geq n$. Furthermore, when $t \geq n$, the best main path that can be picked is formed only by nodes $x$ and $x'$ because, in this way, no additional meeting point is enforced between the paths going from $x$ to $x'$ and vice versa. As a consequence, definition (3.52) reduces to

$$u^{\mathrm{SR}(t)}_X(x, x') = \max\big(\tilde{u}^*_X(x, x'),\, \tilde{u}^*_X(x', x)\big), \tag{3.60}$$

for all $x, x' \in X$ and for all $t \geq n$. The right-hand side of (3.60) is the definition of the nonreciprocal ultrametric [cf. (3.8)].

Figure 3.7: Semi-reciprocal example. Computation of semi-reciprocal ultrametrics between nodes $x$ and $x'$ for different values of parameter $t$; see text for details.
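The two limiting cases can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the original text: it computes the reciprocal and nonreciprocal ultrametrics independently, as in (3.59) and (3.60), and compares them with the semi-reciprocal family for $t = 2$ and $t \geq n$. The 4-node network and all function names are hypothetical; the network is assumed to be a dense matrix with zero diagonal.

```python
def closure(D):
    """Min-max path closure: cheapest worst-link cost over paths of any length."""
    n = len(D)
    u = [row[:] for row in D]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u


def symmetrize(D):
    """Entrywise maximum of the two directions between each pair of nodes."""
    n = len(D)
    return [[max(D[i][j], D[j][i]) for j in range(n)] for i in range(n)]


def reciprocal(A):
    return closure(symmetrize(A))        # cf. (3.59)


def nonreciprocal(A):
    return symmetrize(closure(A))        # cf. (3.60)


def semi_reciprocal(A, t):
    n = len(A)
    cost = [row[:] for row in A]
    for _ in range(t - 2):               # extend secondary paths by one node
        cost = [[min(max(cost[i][k], A[k][j]) for k in range(n))
                 for j in range(n)] for i in range(n)]
    return closure(symmetrize(cost))     # (3.52) applied to the costs of (3.51)


# Hypothetical 4-node network; the values are chosen only for illustration.
A = [[0, 1, 6, 3],
     [4, 0, 2, 6],
     [6, 5, 0, 1],
     [2, 6, 6, 0]]
n = len(A)
family = {t: semi_reciprocal(A, t) for t in range(2, n + 2)}

print(family[2] == reciprocal(A))        # t = 2 recovers reciprocal clustering
print(family[n] == nonreciprocal(A))     # t >= n recovers nonreciprocal clustering
```

Since enlarging $t$ only adds admissible secondary paths, the ultrametric values in `family` are also non-increasing in $t$ for every pair of nodes.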

For the network in Fig. 3.7, we compute the semi-reciprocal ultrametrics between $x$ and $x'$ for different values of $t$. The edges which are not delineated are assigned dissimilarity values greater than 4. Since the only bidirectional path between $x$ and $x'$ uses $x_3$ as the intermediate node, we conclude that $u^{\mathrm{R}}_X(x, x') = u^{\mathrm{SR}(2)}_X(x, x') = 4$. Furthermore, by constructing a path through the outermost clockwise cycle in the network, we conclude that $u^{\mathrm{NR}}_X(x, x') = 1$. Since the longest secondary path in the minimizing path for the nonreciprocal case, $[x, x_1, x_2, x_4, x']$, has node-length 5, we may conclude that $u^{\mathrm{SR}(t)}_X(x, x') = 1$ for all $t \geq 5$. For intermediate values of $t$, if, e.g., we fix $t = 3$, the minimizing path is given by the main path $[x, x_3, x']$ and the secondary paths $[x, x_1, x_3]$, $[x_3, x_4, x']$, $[x', x_5, x_3]$ and $[x_3, x_6, x]$ joining consecutive nodes in the main path in both directions. The maximum cost among all dissimilarities in this path is $A_X(x_1, x_3) = 3$. Hence, $u^{\mathrm{SR}(3)}_X(x, x') = 3$. The minimizing path for $t = 4$ is similar to the minimizing one for $t = 3$ but replaces the secondary path $[x, x_1, x_3]$ with $[x, x_1, x_2, x_3]$. In this way, we obtain $u^{\mathrm{SR}(4)}_X(x, x') = 2$.
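The example can be replayed in code. Fig. 3.7 is not reproduced here, so the edge weights below are an assumption: they are inferred from the paths and values quoted in the text (only $A_X(x_1, x_3) = 3$ is stated explicitly), and every edge not listed is given the hypothetical value 5, i.e., some value greater than 4. The function name is illustrative.

```python
# Node names follow Fig. 3.7; "x'" is the second endpoint.
nodes = ["x", "x1", "x2", "x3", "x4", "x'", "x5", "x6"]
idx = {name: i for i, name in enumerate(nodes)}

# ASSUMPTION: weights inferred from the text, not read off the figure.
edges = {
    # outermost clockwise cycle, all of cost 1
    ("x", "x1"): 1, ("x1", "x2"): 1, ("x2", "x4"): 1, ("x4", "x'"): 1,
    ("x'", "x5"): 1, ("x5", "x6"): 1, ("x6", "x"): 1,
    # links used by the secondary paths for t = 3 and t = 4
    ("x1", "x3"): 3, ("x2", "x3"): 2, ("x3", "x4"): 2,
    ("x5", "x3"): 2, ("x3", "x6"): 2,
    # bidirectional path through x3
    ("x", "x3"): 4, ("x3", "x"): 4, ("x3", "x'"): 4, ("x'", "x3"): 4,
}
MISSING = 5  # ASSUMPTION: any value greater than 4 for edges not drawn

n = len(nodes)
A = [[0 if i == j else MISSING for j in range(n)] for i in range(n)]
for (a, b), w in edges.items():
    A[idx[a]][idx[b]] = w


def semi_reciprocal(A, t):
    """u^{SR(t)} via (3.51)-(3.52) for a dense matrix with zero diagonal."""
    n = len(A)
    cost = [row[:] for row in A]
    for _ in range(t - 2):                       # secondary paths: at most t nodes
        cost = [[min(max(cost[i][k], A[k][j]) for k in range(n))
                 for j in range(n)] for i in range(n)]
    u = [[max(cost[i][j], cost[j][i]) for j in range(n)] for i in range(n)]
    for k in range(n):                           # main paths: unconstrained length
        for i in range(n):
            for j in range(n):
                u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u


values = {t: semi_reciprocal(A, t)[idx["x"]][idx["x'"]] for t in range(2, 7)}
print(values)
```

Under these assumed weights the sketch gives 4, 3, 2 and 1 for $t = 2$, $3$, $4$ and $t \geq 5$, in agreement with the values derived in the text.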

Remark 3 Intuitively, when propagating influence through a network, reciprocal clustering requires bidirectional influence whereas nonreciprocal clustering allows arbitrarily large unidirectional cycles. In many applications, such as trust propagation in social networks, it is reasonable to look for an intermediate situation where influence can propagate through cycles but of limited length. Semi-reciprocal ultrametrics represent this intermediate situation where the parameter $t$ represents the maximum length of paths through which influence can propagate in a nonreciprocal manner.
