
6.2 The Lower Weighted Algorithm

6.2.3 The Lower-Weighted Algorithm

Algorithm 7 Choose
    $p := \max_{j \neq i} \{a_{ij}\}$
    $K := \{j \neq i : a_{ij} = p\}$
    $q := \min_{j \in K} \{a_{ji}\}$
    if $q > p$ or $p = 0$ then
        return 0
    else
        Choose a state $j \in K$ that has $a_{ji} = q$.
        return $j$
    end if

We present the Lower-Weighted Algorithm, which attempts to construct almost invariant aggregates of a given reversible stochastic matrix. We first present a subalgorithm, which will be of use in the main pseudocode below. The inputs of the Choose Algorithm are a reversible stochastic matrix A on the state space S and a single state i ∈ S. The Choose Algorithm implicitly assumes that 0 ∉ S. If 0 ∈ S, we need to utilise some other symbol, not contained in S, in its place. The output of the Choose Algorithm is

1. a state j ∈ S, distinct from i, such that

\[ a_{ij} = \max_{j' \neq i} \{a_{ij'}\} \]

and $\pi_j \geq \pi_i$ for any stationary distribution $\pi$ of $A$, or

2. 0, if no such j exists.
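The Choose step described above can be sketched in Python as follows. This is a minimal illustration rather than the author's implementation: states are 0-based indices, `None` plays the role of the symbol 0, and the exact floating-point comparisons are only safe for entries that are represented exactly, as in the example matrix (a small birth-death chain chosen here for illustration).

```python
import numpy as np

def choose(A, i):
    """Sketch of the Choose step for state i; returns None where the text returns 0."""
    A = np.asarray(A, dtype=float)
    others = [j for j in range(A.shape[0]) if j != i]
    if not others:
        return None
    p = max(A[i, j] for j in others)          # largest off-diagonal entry in row i
    K = [j for j in others if A[i, j] == p]   # states attaining that maximum
    q = min(A[j, i] for j in K)               # smallest entry a_ji over j in K
    if q > p or p == 0:
        return None
    return next(j for j in K if A[j, i] == q)

# Illustrative reversible matrix (stationary distribution proportional to (1, 2, 1)).
A = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
print(choose(A, 0), choose(A, 1), choose(A, 2))
```

In this example the two end states are each sent to the middle state, which has the larger stationary weight, while the middle state has no admissible target.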

The inputs of the Lower-Weighted Algorithm are a stochastic matrix $A$ on the state space $S = \{1, \dots, n\}$ and a small nonnegative value $\epsilon < 1$.

Algorithm 8 The Lower-Weighted Algorithm
    $B := A$
    Let $G$ be the digraph on $S$ that contains no arcs.
    $m := \mathbf{1}_S$
    $f := \mathrm{Reorder}(B)$
    $n := |S|$
    while $n \geq 2$ do
        if for $k = 1, \dots, n$, $b_{f(k)f(k)} \geq \frac{(1-\epsilon)^2}{1+\epsilon(m_{f(k)}-2)}$ or $\mathrm{Choose}(B, f(k)) = 0$ then
            Exit the while loop.
        else
            $k := \max_{1 \leq k' \leq n} \{k' : b_{f(k')f(k')} < \frac{(1-\epsilon)^2}{1+\epsilon(m_{f(k')}-2)}$ and $\mathrm{Choose}(B, f(k')) \neq 0\}$
            $L := \{(l_1, l_2) : l_1 \neq l_2,\ b_{f(l_1)f(l_2)} = 0$ and $b_{f(l_1)f(k)} b_{f(k)f(l_2)} \neq 0\}$
            $j := \mathrm{Choose}(B, f(k))$
            Add the directed arc $f(k) \to j$ to $G$.
            $m_j := m_j + m_{f(k)}$
            $B := B \setminus f(k)$
            $f := (f(1), \dots, f(k-1), f(k+1), \dots, f(n))$
            $n := n - 1$
            if $L$ is nonempty then
                $l_{\min} := \min_{(l_1,l_2) \in L} \{l_1\}$
                if $l_{\min} > k$ then
                    $l_{\min} := l_{\min} - 1$
                end if
                $l_{\max} := \max_{(l_1,l_2) \in L} \{l_1\}$
                if $l_{\max} > k$ then
                    $l_{\max} := l_{\max} - 1$
                end if
                $g := \mathrm{Reorder}(B(f(l_{\min}), \dots, f(l_{\max})))$
                $(f(l_{\min}), \dots, f(l_{\max})) := (f(l_{\min} - 1 + g(1)), \dots, f(l_{\min} - 1 + g(l_{\max} - l_{\min} + 1)))$
            end if
        end if
    end while
    return $G$

Proposition 6.8. Let A be a reversible stochastic matrix and suppose that Algorithm 8 has been applied to A. Let B, f, l, G and m be the stored data after any number of iterations of the algorithm's while loop. Let Π be a positive diagonal matrix such that ΠA is symmetric, let F = {f(1), . . . , f(l)} and let C = S \ F. Then,

1. B = A \ C;

2. B(f (1), . . . , f (l)) is a lower-weighted reordering of B;

3. G is acyclic, every member of C has out-degree 1 in G and every member of F has out-degree 0;

4. if the directed arc $i \to j$ is present in $G$, then $\pi_i \leq \pi_j$; and

5. for each i ∈ F , the weakly connected component of G containing i contains exactly mi states.
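Statement 2 can be checked numerically for a given ordering. The sketch below assumes, as the proof of this proposition suggests, that an ordering $f$ is lower-weighted when $b_{f(i)f(j)} \leq b_{f(j)f(i)}$ for all positions $i < j$; the function name and the 0-based indexing are illustrative choices, not part of the original text.

```python
import numpy as np

def is_lower_weighted(B, f):
    """Return True if b[f[i], f[j]] <= b[f[j], f[i]] for all positions i < j."""
    B = np.asarray(B, dtype=float)
    n = len(f)
    return all(B[f[i], f[j]] <= B[f[j], f[i]]
               for i in range(n) for j in range(i + 1, n))

# Illustrative reversible matrix (a three-state birth-death chain).
B = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
print(is_lower_weighted(B, [1, 0, 2]))  # heaviest state listed first
print(is_lower_weighted(B, [0, 1, 2]))
```

Here the ordering that lists the state with the largest stationary weight first passes the check, while the natural ordering does not.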

Proof. Statements 1, 3 and 5 are shown in the same manner as in Proposition 5.1 and Lemma 5.3.

Statement 4 is a consequence of the workings of the Choose Algorithm. Suppose that the directed arc i → j is present in G; then, there is a stochastic complement B = A \ C such that Choose(B, i) = j, which implies that π(j) ≥ π(i).

We now show statement 2. In Proposition 6.4, we have shown that the reorder algorithm produces a lower-weighted reordering of a reversible stochastic matrix; thus, the matrix $B$ and the permutation $f := \mathrm{Reorder}(B)$, at initialisation, satisfy statement 2. We show that if $f$ corresponds to a lower-weighted reordering of $B$, one further iteration of the algorithm does not alter this fact.

Let $B$, $f$ and $l$ be the stored data after some number of iterations, write $n = l$ for the current number of states, and suppose that $B(f(1), \dots, f(n))$ is a lower-weighted reordering of $B$. Suppose further that the algorithm executes at least one more iteration before terminating and let $B'$, $f'$ and $n' = n - 1$ be the stored data after one more iteration. Let $f(k)$ be the state selected for removal and let

\[ L := \{(l_1, l_2) : l_1 \neq l_2,\ b_{f(l_1)f(l_2)} = 0 \text{ and } b_{f(l_1)f(k)}\, b_{f(k)f(l_2)} \neq 0\}. \]

Case one: $L$ is empty. Then, we have

\[ f' = (f(1), \dots, f(k-1), f(k+1), \dots, f(n)). \]

So, we need to show that if $1 \leq i < j \leq n$ and $i, j \neq k$, then $b'_{f(i)f(j)} \leq b'_{f(j)f(i)}$. We note that

\[ b'_{f(i)f(j)} = b_{f(i)f(j)} + \frac{b_{f(i)f(k)}\, b_{f(k)f(j)}}{1 - b_{f(k)f(k)}} \quad\text{and}\quad b'_{f(j)f(i)} = b_{f(j)f(i)} + \frac{b_{f(j)f(k)}\, b_{f(k)f(i)}}{1 - b_{f(k)f(k)}}. \]

Let $1 \leq i < j \leq n$ and $i, j \neq k$. Since $L$ is empty, we have either $b_{f(i)f(j)} \neq 0$ or $b_{f(i)f(k)}\, b_{f(k)f(j)} = 0$.

Suppose that $b_{f(i)f(j)} \neq 0$. Let $\Pi$ be a positive diagonal matrix such that $\Pi A$ is symmetric. Then

\[ \pi_{f(i)}\, b_{f(i)f(j)} = \pi_{f(j)}\, b_{f(j)f(i)} \quad\text{and}\quad \pi_{f(i)}\, b'_{f(i)f(j)} = \pi_{f(j)}\, b'_{f(j)f(i)}. \]

Since $b_{f(i)f(j)} \leq b_{f(j)f(i)}$ and $b_{f(i)f(j)} \neq 0$, we must have $\pi_{f(i)} \geq \pi_{f(j)}$, which in turn implies that $b'_{f(i)f(j)} \leq b'_{f(j)f(i)}$.
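The last implication uses only the two symmetry relations for $\Pi$ together with the nonnegativity of the entries. Spelled out in the same symbols (a short derivation added for clarity, not taken from the original text):

```latex
\[
\frac{\pi_{f(i)}}{\pi_{f(j)}} = \frac{b_{f(j)f(i)}}{b_{f(i)f(j)}} \geq 1,
\qquad\text{so}\qquad
b'_{f(i)f(j)} = \frac{\pi_{f(j)}}{\pi_{f(i)}}\, b'_{f(j)f(i)} \leq b'_{f(j)f(i)}.
\]
```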

Suppose that $b_{f(i)f(k)}\, b_{f(k)f(j)} = 0$. Then, either $b_{f(i)f(k)} = 0$ or $b_{f(k)f(j)} = 0$, implying (as $B$ is reversible) that either $b_{f(k)f(i)} = 0$ or $b_{f(j)f(k)} = 0$. Thus, $b'_{f(i)f(j)} = b_{f(i)f(j)}$ and $b'_{f(j)f(i)} = b_{f(j)f(i)}$. So, since $b_{f(i)f(j)} \leq b_{f(j)f(i)}$, we have $b'_{f(i)f(j)} \leq b'_{f(j)f(i)}$.
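The single-state stochastic complement used in this case, $b'_{ij} = b_{ij} + b_{ik} b_{kj} / (1 - b_{kk})$, can be sketched in Python as below. The helper name `remove_state`, the 0-based indexing and the example matrix are assumptions made for illustration. The example also checks the fact used above, that a diagonal matrix symmetrising $B$ still symmetrises the complement once restricted to the remaining states.

```python
import numpy as np

def remove_state(B, k):
    """Stochastic complement of B with the single state k removed."""
    B = np.asarray(B, dtype=float)
    keep = [s for s in range(B.shape[0]) if s != k]
    # b'_ij = b_ij + b_ik * b_kj / (1 - b_kk) for i, j != k
    correction = np.outer(B[keep, k], B[k, keep]) / (1.0 - B[k, k])
    return B[np.ix_(keep, keep)] + correction

# Illustrative reversible matrix with stationary weights proportional to (1, 2, 1).
B = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
pi = np.array([1.0, 2.0, 1.0])

B_prime = remove_state(B, 0)
flows = np.diag(pi[1:]) @ B_prime   # restricted Pi times B'
print(B_prime)
print(np.allclose(flows, flows.T))  # the restricted weights still symmetrise B'
```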

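Case two: $L$ is nonempty. Let

\[ L_{\min} = \min_{(l_1,l_2) \in L} \{l_1\} \quad\text{and}\quad L_{\max} = \max_{(l_1,l_2) \in L} \{l_1\}; \]

let

\[ l_{\min} = \begin{cases} L_{\min} & \text{if } L_{\min} < k \\ L_{\min} - 1 & \text{otherwise,} \end{cases} \quad\text{and}\quad l_{\max} = \begin{cases} L_{\max} & \text{if } L_{\max} < k \\ L_{\max} - 1 & \text{otherwise.} \end{cases} \]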
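The collection $L$ and the window $l_{\min}, \dots, l_{\max}$ defined above can be computed directly from $B$, $f$ and $k$. The following Python sketch uses 0-based positions and a hypothetical helper name `reorder_window`; it illustrates the definitions only and is not the author's code.

```python
import numpy as np

def reorder_window(B, f, k):
    """Return (L, l_min, l_max) in 0-based positions; l_min and l_max are None if L is empty."""
    B = np.asarray(B, dtype=float)
    n = len(f)
    # Pairs of positions whose entry is zero now but becomes nonzero once f[k] is removed.
    L = {(l1, l2)
         for l1 in range(n) for l2 in range(n)
         if l1 != l2
         and B[f[l1], f[l2]] == 0
         and B[f[l1], f[k]] * B[f[k], f[l2]] != 0}
    if not L:
        return L, None, None

    def shift(pos):
        # Positions after k move down by one when f[k] is deleted from the ordering.
        return pos - 1 if pos > k else pos

    firsts = [l1 for l1, _ in L]
    return L, shift(min(firsts)), shift(max(firsts))

B = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
f = [1, 0, 2]
print(reorder_window(B, f, 0))  # removing the middle state links the two end states
print(reorder_window(B, f, 1))  # removing an end state creates no new nonzero entries
```

The first call is only there to produce a nonempty $L$; in the algorithm itself the middle state of this chain would not be selected for removal, since Choose returns 0 for it.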

We note that $(k, l') \notin L$ for any index $l'$. If we suppose that $(k, l') \in L$, then

\[ b_{f(k)f(l')} = 0 \quad\text{and}\quad b_{f(k)f(k)}\, b_{f(k)f(l')} \neq 0, \]

which is a contradiction. We further note that there are no elements of the form $(l', l')$ contained in $L$; so, $l_{\min} < l_{\max}$. Let $g = \mathrm{Reorder}(B'(l_{\min}, \dots, l_{\max}))$; we note that $g$ is a permutation of $\{1, \dots, l_{\max} - l_{\min} + 1\}$.

The permutation $f'$ is formed by first removing the $k$th element of $f$, forming

\[ \hat{f} = (f(1), \dots, f(k-1), f(k+1), \dots, f(l)), \]

and then permuting the subsequence consisting of the $l_{\min}$th through $l_{\max}$th elements,

\[ f' = (\hat{f}(1), \dots, \hat{f}(l_{\min}-1), f'(l_{\min}), \dots, f'(l_{\max}), \hat{f}(l_{\max}+1), \dots, \hat{f}(n-1)), \]

where

\begin{align*}
f'(l_{\min}) &= \hat{f}(l_{\min} - 1 + g(1)), \\
f'(l_{\min}+1) &= \hat{f}(l_{\min} - 1 + g(2)), \\
&\ \,\vdots \\
f'(l_{\max}) &= \hat{f}(l_{\min} - 1 + g(l_{\max} - l_{\min} + 1)).
\end{align*}

Now, suppose that $1 \leq i < j \leq l - 1$. We aim to show that $b'_{f'(i)f'(j)} \leq b'_{f'(j)f'(i)}$.

First, assume that $i < l_{\min}$. Then, $f'(i) = \hat{f}(i) = f(i')$ where $i' = i$ if $i < k$ and $i' = i + 1$ if $i \geq k$. If $i \geq k$, then $l_{\min} \geq k$ and so $l_{\min} = L_{\min} - 1$, implying that $i' < L_{\min}$. If $i < k$, then $i < l_{\min} \leq L_{\min}$. In either case $f'(i) = f(i')$ where $i' < L_{\min}$.

We further note that the construction of $f'$ implies that $f'(j) = f(j')$ where $j' > i'$. Thus,

\[ b_{f'(i)f'(j)} = b_{f(i')f(j')} \leq b_{f(j')f(i')} = b_{f'(j)f'(i)}. \]

Now, since $(i', j') \notin L$, we have either

\[ b_{f(i')f(j')} \neq 0 \quad\text{or}\quad b_{f(i')f(k)}\, b_{f(k)f(j')} = 0. \]

As in the proof of case one, if the first possibility holds we have $\pi_{f(i')} \geq \pi_{f(j')}$, for any positive diagonal $\Pi$ which symmetrises $A$, and if the second possibility holds we have

\[ b_{f(i')f(j')} = b'_{f(i')f(j')} \quad\text{and}\quad b_{f(j')f(i')} = b'_{f(j')f(i')}. \]

Both possibilities imply that

\[ b'_{f'(i)f'(j)} = b'_{f(i')f(j')} \leq b'_{f(j')f(i')} = b'_{f'(j)f'(i)}. \]

The case $j > l_{\max}$ is very similar to that of $i < l_{\min}$. This assumption implies, as before, that $f'(j) = f(j')$ and $f'(i) = f(i')$ where $i' < j'$ and $(i', j') \notin L$. Thus, in this case we again have

\[ b'_{f'(i)f'(j)} = b'_{f(i')f(j')} \leq b'_{f(j')f(i')} = b'_{f'(j)f'(i)}. \]

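So, we simply need to consider the case that $l_{\min} \leq i < j \leq l_{\max}$. The sequence $(g(1), \dots, g(l_{\max} - l_{\min} + 1))$ is obtained by the reorder algorithm with input

\[ \hat{B} = B'(\hat{f}(l_{\min}), \dots, \hat{f}(l_{\max})). \]

Thus, for $i < j$, $\hat{b}_{g(i)g(j)} \leq \hat{b}_{g(j)g(i)}$. Then, we note that the $i'j'$th entry of $\hat{B}$ is the $\hat{f}(l_{\min} - 1 + i')\,\hat{f}(l_{\min} - 1 + j')$th entry of $B'$. As well, if $l_{\min} \leq i' \leq l_{\max}$,

\[ f'(i') = \hat{f}(l_{\min} - 1 + g(i' - l_{\min} + 1)). \]

Thus, writing $i' = i - l_{\min} + 1$ and $j' = j - l_{\min} + 1$, we have

\begin{align*}
b'_{f'(i)f'(j)} &= b'_{\hat{f}(l_{\min}-1+g(i-l_{\min}+1))\,\hat{f}(l_{\min}-1+g(j-l_{\min}+1))} \\
&= b'_{\hat{f}(l_{\min}-1+g(i'))\,\hat{f}(l_{\min}-1+g(j'))} \\
&= \hat{b}_{g(i')g(j')} \\
&\leq \hat{b}_{g(j')g(i')} \\
&= b'_{\hat{f}(l_{\min}-1+g(j'))\,\hat{f}(l_{\min}-1+g(i'))} \\
&= b'_{\hat{f}(l_{\min}-1+g(j-l_{\min}+1))\,\hat{f}(l_{\min}-1+g(i-l_{\min}+1))} \\
&= b'_{f'(j)f'(i)}.
\end{align*}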

The procedure behind the lower-weighted algorithm is the following. Let A be a nearly uncoupled stochastic matrix. Suppose that the algorithm has proceeded through some number of iterations of its internal while loop; let B, G, m, f and l be the current stored data and let k be the index selected by the algorithm (supposing that the algorithm will proceed through at least one more iteration). We assume that B is error-reducing; as well, B(f (1), . . . , f (l)) is lower-weighted and k is the largest index such that

1. $b_{f(k)f(k)} < \dfrac{(1-\epsilon)^2}{1+\epsilon(m_{f(k)}-2)}$, and

2. the state $f(k)$ can be associated with a state $j$ that has a higher relative frequency (in the associated Markov chain).

That is, the first condition leads us to suspect that $k$ is the largest index such that $f(k)$ is not the sole remaining member of an almost invariant aggregate. We insist upon the second condition as well, because the property that $\pi(i) \leq \pi(j)$ whenever $i \to j$ is an arc of $G$ is one of the base assumptions used to obtain the bound

\[ \frac{(1-\epsilon)^2}{1+\epsilon(m_{f(k)}-2)} \]

in Appendix B. Thus, we assume that the stochastic complement $B \setminus f(k)$ is error-reducing as well.

Within the lower-weighted algorithm, it is not necessary to identify the collection $L$ and then reorder the submatrix $B(f(l_{\min}), \dots, f(l_{\max}))$. One could simply recalculate $f := \mathrm{Reorder}(B)$ at every iteration. However, we have found that, in practice, this makes the algorithm much less efficient.

Suppose that the matrix $B$ is a lower-weighted reversible stochastic matrix on the ordered state space $S$ and let $\hat{B} = B \setminus i'$ be a stochastic complement. Let

\[ L = \{(i, j) : i \neq j,\ b_{ij} = 0 \text{ and } b_{ii'}\, b_{i'j} \neq 0\}. \]

As we saw in the above proposition, if $(i, j) \notin L$ and $i < j$, then $\hat{b}_{ij} \leq \hat{b}_{ji}$. Thus, only the submatrix that contains all of the $ij$th entries where $(i, j) \in L$ needs to be reordered.

Moreover, as the algorithm proceeds, the matrix $B$ tends to have significantly fewer 0-entries (the collection $L$ becomes smaller with successive complements). For example, suppose that the reversible stochastic matrix

\[ B = \begin{bmatrix} \tilde{B} & v \\ w^T & b \end{bmatrix} \]

has $x$ nonzero off-diagonal entries. Let $x_1$ be the number of nonzero off-diagonal entries in the matrix $\tilde{B}$ and let $x_2$ be the number of nonzero entries in the vector $v$. Since $B$ is reversible, the vectors $v$ and $w$ have identical zero-nonzero patterns; so, there are also $x_2$ nonzero entries in $w$ and we have $x = x_1 + 2x_2$. The number of nonzero off-diagonal entries in the matrix

\[ \frac{1}{1-b}\, v w^T \]

is $x_2^2 - x_2$ (there is one nonzero entry for each pair of distinct $i$ and $j$ with $v_i, w_j \neq 0$). So, the number of nonzero off-diagonal entries in the stochastic complement

\[ \tilde{B} + \frac{1}{1-b}\, v w^T \]

is bounded above by $x_1 + x_2^2 - x_2 = x + x_2(x_2 - 3)$. The number of nonzero off-diagonal entries of $B$ can grow quite rapidly as we implement successive stochastic complements. Thus, the sizes of the submatrices that actually need to be reordered at each iteration can shrink equally rapidly.
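The counting argument can be checked on a small example. The sketch below uses a hypothetical star-shaped chain (four leaf states that communicate only through a centre state, with transition probabilities chosen here for illustration), for which $x_1 = 0$ and the bound $x + x_2(x_2 - 3)$ is attained.

```python
import numpy as np

def offdiag_nonzeros(M):
    """Number of nonzero off-diagonal entries of a square matrix."""
    M = np.asarray(M)
    return int(np.count_nonzero(M) - np.count_nonzero(np.diag(M)))

leaves = 4
B = np.zeros((leaves + 1, leaves + 1))
B[:leaves, :leaves] = 0.5 * np.eye(leaves)  # B tilde: leaves only hold or move to the centre
B[:leaves, leaves] = 0.5                    # v: leaf -> centre
B[leaves, :leaves] = 0.125                  # w^T: centre -> leaf
B[leaves, leaves] = 0.5                     # b: centre holds

B_tilde = B[:leaves, :leaves]
v, w, b = B[:leaves, leaves], B[leaves, :leaves], B[leaves, leaves]
complement = B_tilde + np.outer(v, w) / (1.0 - b)

x = offdiag_nonzeros(B)
x2 = int(np.count_nonzero(v))
bound = x + x2 * (x2 - 3)
print(x, x2, offdiag_nonzeros(complement), bound)
```

Removing the centre state turns a matrix with 8 nonzero off-diagonal entries into a full 4-by-4 complement with 12, which matches the bound.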

In document A Combinatorial Approach to Nearly Uncoupled Markov Chains (Page 182-193)