6.2 The Lower Weighted Algorithm
6.2.3 The Lower-Weighted Algorithm
Algorithm 7 Choose
  $p := \max_{j \neq i} \{a_{ij}\}$
  $K := \{j \neq i : a_{ij} = p\}$
  $q := \min_{j \in K} \{a_{ji}\}$
  if $q > p$ or $p = 0$ then
    return 0
  else
    Choose a state $j \in K$ that has $a_{ji} = q$.
    return $j$
  end if
We present the Lower-Weighted Algorithm, which attempts to construct almost invariant aggregates of a given reversible stochastic matrix. We first present a sub-algorithm, Choose (Algorithm 7), which is used in the main pseudocode below. The inputs of the Choose Algorithm are a reversible stochastic matrix $A$ on the state space $S$ and a single state $i \in S$. The Choose Algorithm implicitly assumes that $0 \notin S$; if $0 \in S$, some other symbol not contained in $S$ must be used in its place. The output of the Choose Algorithm is
1. a state $j \in S$, distinct from $i$, such that
\[ a_{ij} = \max_{j' \neq i} \{a_{ij'}\} \]
and $\pi_j \geq \pi_i$ for any stationary distribution $\pi$ of $A$, or
2. 0, if no such $j$ exists.
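The Choose step can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: the matrix is a list of rows, states are 0-based indices, and `None` plays the role of the symbol 0 (since 0 is a valid state index here). The function name `choose` is ours.

```python
from typing import Optional, Sequence


def choose(A: Sequence[Sequence[float]], i: int) -> Optional[int]:
    """Sketch of the Choose step for state i (0-based); None stands for '0'."""
    others = [j for j in range(len(A)) if j != i]
    if not others:
        return None
    p = max(A[i][j] for j in others)         # largest off-diagonal entry in row i
    if p == 0:
        return None                          # state i has no neighbour at all
    K = [j for j in others if A[i][j] == p]  # states attaining the maximum
    j_best = min(K, key=lambda j: A[j][i])   # smallest reverse entry a_ji
    q = A[j_best][i]
    # By detailed balance, pi_i * a_ij = pi_j * a_ji, so q <= p means pi_j >= pi_i.
    return None if q > p else j_best


# Birth-death chain on three states with stationary weights proportional to (1, 2, 1).
A = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
```

For the example matrix, the two end states are each sent to the heavier middle state, while the middle state has no neighbour of at least equal weight.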
The inputs of the Lower-Weighted Algorithm are a stochastic matrix $A$ on the state space $S = \{1, \ldots, n\}$ and a small nonnegative value $\epsilon < 1$.
Algorithm 8 The Lower-Weighted Algorithm
  $B := A$
  Let $G$ be the digraph on $S$ that contains no arcs.
  $m := \mathbf{1}_S$
  $f := \mathrm{Reorder}(B)$
  $n := |S|$
  while $n \geq 2$ do
    if for $k = 1, \ldots, n$, $b_{f(k)f(k)} \geq \frac{(1-\epsilon)^2}{1+\epsilon(m_{f(k)}-2)}$ or $\mathrm{Choose}(B, f(k)) = 0$ then
      Exit the while loop.
    else
      $k := \max_{1 \leq k' \leq n} \{k' : b_{f(k')f(k')} < \frac{(1-\epsilon)^2}{1+\epsilon(m_{f(k')}-2)} \text{ and } \mathrm{Choose}(B, f(k')) \neq 0\}$
      $L := \{(l_1, l_2) : l_1 \neq l_2,\ b_{f(l_1)f(l_2)} = 0 \text{ and } b_{f(l_1)f(k)} b_{f(k)f(l_2)} \neq 0\}$
      $j := \mathrm{Choose}(B, f(k))$
      Add the directed arc $f(k) \to j$ to $G$.
      $m_j := m_j + m_{f(k)}$
      $B := B \setminus f(k)$
      $f := (f(1), \ldots, f(k-1), f(k+1), \ldots, f(n))$
      $n := n - 1$
      if $L$ is nonempty then
        $l_{\min} := \min_{(l_1, l_2) \in L} \{l_1\}$
        if $l_{\min} > k$ then $l_{\min} := l_{\min} - 1$ end if
        $l_{\max} := \max_{(l_1, l_2) \in L} \{l_1\}$
        if $l_{\max} > k$ then $l_{\max} := l_{\max} - 1$ end if
        $g := \mathrm{Reorder}(B(f(l_{\min}), \ldots, f(l_{\max})))$
        $(f(l_{\min}), \ldots, f(l_{\max})) := (f(l_{\min} - 1 + g(1)), \ldots, f(l_{\min} - 1 + g(l_{\max} - l_{\min} + 1)))$
      end if
    end if
  end while
  return $G$
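The update $B := B \setminus f(k)$ removes a single state by taking a stochastic complement. As a minimal sketch, assuming the standard single-state formula $b'_{ij} = b_{ij} + b_{ik} b_{kj} / (1 - b_{kk})$ and a plain list-of-rows representation (the function name `remove_state` is ours), this step could be written as:

```python
from typing import List

Matrix = List[List[float]]


def remove_state(B: Matrix, k: int) -> Matrix:
    """Stochastic complement B \\ k: censor state k (0-based) out of the chain."""
    denom = 1.0 - B[k][k]
    if denom <= 0.0:
        raise ValueError("state k is absorbing; the complement is undefined")
    keep = [i for i in range(len(B)) if i != k]
    # A path i -> k -> ... -> k -> j is folded into a direct step i -> j.
    return [[B[i][j] + B[i][k] * B[k][j] / denom for j in keep] for i in keep]


# Removing the middle state of a three-state birth-death chain.
A = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
C = remove_state(A, 1)
```

The result is again a stochastic matrix on the remaining states; in the example the two end states, previously not adjacent, become directly connected.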
Proposition 6.8. Let $A$ be a reversible stochastic matrix and suppose that Algorithm 8 has been applied to $A$. Let $B$, $f$, $l$, $G$ and $m$ be the stored data after any number of iterations of the algorithm's while loop. Let $\Pi$ be a positive diagonal matrix such that $\Pi A$ is symmetric, let $F = \{f(1), \ldots, f(l)\}$ and let $C = S \setminus F$. Then,
1. $B = A \setminus C$;
2. $B(f(1), \ldots, f(l))$ is a lower-weighted reordering of $B$;
3. $G$ is acyclic, every member of $C$ has out-degree 1 in $G$ and every member of $F$ has out-degree 0;
4. if the directed arc $i \to j$ is present in $G$, then $\pi_i \leq \pi_j$; and
5. for each $i \in F$, the weakly connected component of $G$ containing $i$ contains exactly $m_i$ states.
Proof Statements 1, 3 and 5 are shown in the same manner as in Proposition 5.1 and Lemma 5.3.
Statement 4 is a consequence of the workings of the Choose Algorithm. Suppose that the directed arc i → j is present in G; then, there is a stochastic complement B = A \ C such that Choose(B, i) = j, which implies that π(j) ≥ π(i).
We now show statement 2. In Proposition 6.4, we have shown that the reorder algorithm produces a lower-weighted reordering of a reversible stochastic matrix; thus, the matrix $B$ and the permutation $f := \mathrm{Reorder}(B)$, at initialisation, satisfy statement 2. We show that if $f$ corresponds to a lower-weighted reordering of $B$, one further iteration of the algorithm does not alter this fact.
Let $B$, $f$ and $n$ be the stored data after some number of iterations and suppose that $B(f(1), \ldots, f(n))$ is a lower-weighted reordering of $B$. Suppose further that the algorithm executes at least one more iteration before terminating and let $B'$, $f'$ and $n' = n - 1$ be the stored data after one more iteration. Let $f(k)$ be the state selected for removal and let
\[ L := \{(l_1, l_2) : l_1 \neq l_2,\ b_{f(l_1)f(l_2)} = 0 \text{ and } b_{f(l_1)f(k)} b_{f(k)f(l_2)} \neq 0\}. \]
Case one: $L$ is empty. Then, we have
\[ f' = (f(1), \ldots, f(k-1), f(k+1), \ldots, f(n)). \]
So, we need to show that if $1 \leq i < j \leq n$ and $i, j \neq k$, then $b'_{f(i)f(j)} \leq b'_{f(j)f(i)}$. We note that
\[ b'_{f(i)f(j)} = b_{f(i)f(j)} + \frac{b_{f(i)f(k)} b_{f(k)f(j)}}{1 - b_{f(k)f(k)}} \quad \text{and} \quad b'_{f(j)f(i)} = b_{f(j)f(i)} + \frac{b_{f(j)f(k)} b_{f(k)f(i)}}{1 - b_{f(k)f(k)}}. \]
Let $1 \leq i < j \leq n$ and $i, j \neq k$. Since $L$ is empty, we have either $b_{f(i)f(j)} \neq 0$ or $b_{f(i)f(k)} b_{f(k)f(j)} = 0$.
Suppose that $b_{f(i)f(j)} \neq 0$. Let $\Pi$ be a positive diagonal matrix such that $\Pi A$ is symmetric. Then
\[ \pi_{f(i)} b_{f(i)f(j)} = \pi_{f(j)} b_{f(j)f(i)} \quad \text{and} \quad \pi_{f(i)} b'_{f(i)f(j)} = \pi_{f(j)} b'_{f(j)f(i)}. \]
Since $b_{f(i)f(j)} \leq b_{f(j)f(i)}$ and $b_{f(i)f(j)} \neq 0$, we must have $\pi_{f(i)} \geq \pi_{f(j)}$, which in turn implies that $b'_{f(i)f(j)} \leq b'_{f(j)f(i)}$.
Suppose that $b_{f(i)f(k)} b_{f(k)f(j)} = 0$. Then, either $b_{f(i)f(k)} = 0$ or $b_{f(k)f(j)} = 0$, implying (as $B$ is reversible) that either $b_{f(k)f(i)} = 0$ or $b_{f(j)f(k)} = 0$. Thus, $b'_{f(i)f(j)} = b_{f(i)f(j)}$ and $b'_{f(j)f(i)} = b_{f(j)f(i)}$. So, since $b_{f(i)f(j)} \leq b_{f(j)f(i)}$, we have $b'_{f(i)f(j)} \leq b'_{f(j)f(i)}$.
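The implication used in the sub-case $b_{f(i)f(j)} \neq 0$ of case one can be spelled out as follows. This is a short sketch that uses only the two detailed-balance identities for $B$ and $B'$, the positivity of $\Pi$ and the nonnegativity of the entries of $B'$.

```latex
\begin{align*}
\frac{\pi_{f(i)}}{\pi_{f(j)}}
  &= \frac{b_{f(j)f(i)}}{b_{f(i)f(j)}} \geq 1
  && \text{since } 0 < b_{f(i)f(j)} \leq b_{f(j)f(i)}, \\
b'_{f(i)f(j)}
  &= \frac{\pi_{f(j)}}{\pi_{f(i)}} \, b'_{f(j)f(i)} \leq b'_{f(j)f(i)}
  && \text{since } 0 < \frac{\pi_{f(j)}}{\pi_{f(i)}} \leq 1 \text{ and } b'_{f(j)f(i)} \geq 0.
\end{align*}
```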
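Case two: $L$ is nonempty. Let
\[ L_{\min} = \min_{(l_1, l_2) \in L} \{l_1\} \quad \text{and} \quad L_{\max} = \max_{(l_1, l_2) \in L} \{l_1\}; \]
let
\[ l_{\min} = \begin{cases} L_{\min} & \text{if } L_{\min} < k, \\ L_{\min} - 1 & \text{otherwise,} \end{cases} \quad \text{and} \quad l_{\max} = \begin{cases} L_{\max} & \text{if } L_{\max} < k, \\ L_{\max} - 1 & \text{otherwise.} \end{cases} \]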
We note that $(k, l') \notin L$, for any index $l'$. If we suppose that $(k, l') \in L$, then
\[ b_{f(k)f(l')} = 0 \quad \text{and} \quad b_{f(k)f(k)} b_{f(k)f(l')} \neq 0, \]
which is a contradiction. We further note that there are no elements of the form $(l', l')$ contained in $L$; so, $l_{\min} < l_{\max}$. Let $g = \mathrm{Reorder}(B'(l_{\min}, \ldots, l_{\max}))$; we note that $g$ is a permutation of $\{1, \ldots, l_{\max} - l_{\min} + 1\}$.
The permutation $f'$ is formed by first removing the $k$th element of $f$, forming
\[ \hat{f} = (f(1), \ldots, f(k-1), f(k+1), \ldots, f(n)), \]
and then permuting the subsequence consisting of the $l_{\min}$th through $l_{\max}$th elements,
\[ f' = (\hat{f}(1), \ldots, \hat{f}(l_{\min} - 1), f'(l_{\min}), \ldots, f'(l_{\max}), \hat{f}(l_{\max} + 1), \ldots, \hat{f}(n - 1)), \]
where
\begin{align*}
f'(l_{\min}) &= \hat{f}(l_{\min} - 1 + g(1)), \\
f'(l_{\min} + 1) &= \hat{f}(l_{\min} - 1 + g(2)), \\
&\ \ \vdots \\
f'(l_{\max}) &= \hat{f}(l_{\min} - 1 + g(l_{\max} - l_{\min} + 1)).
\end{align*}
Now, suppose that $1 \leq i < j \leq n - 1$. We aim to show that $b'_{f'(i)f'(j)} \leq b'_{f'(j)f'(i)}$.
First, assume that $i < l_{\min}$. Then, $f'(i) = \hat{f}(i) = f(i')$ where $i' = i$ if $i < k$ and $i' = i + 1$ if $i \geq k$. If $i \geq k$, then $l_{\min} \geq k$ and so $l_{\min} = L_{\min} - 1$, implying that $i' < L_{\min}$. If $i < k$, then $i < l_{\min} \leq L_{\min}$. In either case $f'(i) = f(i')$ where $i' < L_{\min}$. We further note that the construction of $f'$ implies that $f'(j) = f(j')$ where $j' > i'$. Thus,
\[ b_{f'(i)f'(j)} = b_{f(i')f(j')} \leq b_{f(j')f(i')} = b_{f'(j)f'(i)}. \]
Now, since $(i', j') \notin L$, we have either
\[ b_{f(i')f(j')} \neq 0 \quad \text{or} \quad b_{f(i')f(k)} b_{f(k)f(j')} = 0. \]
As in the proof of case one, if the first possibility holds we have $\pi_{f(i')} \geq \pi_{f(j')}$, for any positive diagonal $\Pi$ which symmetrises $A$, and if the second possibility holds we have
\[ b_{f(i')f(j')} = b'_{f(i')f(j')} \quad \text{and} \quad b_{f(j')f(i')} = b'_{f(j')f(i')}. \]
Both possibilities imply that
\[ b'_{f'(i)f'(j)} = b'_{f(i')f(j')} \leq b'_{f(j')f(i')} = b'_{f'(j)f'(i)}. \]
The case $j > l_{\max}$ is very similar to that of $i < l_{\min}$. This assumption implies, as before, that $f'(j) = f(j')$ and $f'(i) = f(i')$ where $i' < j'$ and $(i', j') \notin L$. Thus, in this case we again have
\[ b'_{f'(i)f'(j)} = b'_{f(i')f(j')} \leq b'_{f(j')f(i')} = b'_{f'(j)f'(i)}. \]
So, we simply need to consider the case that $l_{\min} \leq i < j \leq l_{\max}$. The sequence $(g(1), \ldots, g(l_{\max} - l_{\min} + 1))$ is obtained by the reorder algorithm with input
\[ \hat{B} = B'(\hat{f}(l_{\min}), \ldots, \hat{f}(l_{\max})). \]
Thus, for $i < j$, $\hat{b}_{g(i)g(j)} \leq \hat{b}_{g(j)g(i)}$. Then, we note that the $(i', j')$th entry of $\hat{B}$ is the $(\hat{f}(l_{\min} - 1 + i'), \hat{f}(l_{\min} - 1 + j'))$th entry of $B'$. As well, if $l_{\min} \leq i' \leq l_{\max}$,
\[ f'(i') = \hat{f}(l_{\min} - 1 + g(i' - l_{\min} + 1)). \]
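Hence, writing $i' = i - l_{\min} + 1$ and $j' = j - l_{\min} + 1$ (so that $i' < j'$), we have
\begin{align*}
b'_{f'(i)f'(j)} &= b'_{\hat{f}(l_{\min} - 1 + g(i - l_{\min} + 1))\, \hat{f}(l_{\min} - 1 + g(j - l_{\min} + 1))} \\
&= b'_{\hat{f}(l_{\min} - 1 + g(i'))\, \hat{f}(l_{\min} - 1 + g(j'))} \\
&= \hat{b}_{g(i')g(j')} \\
&\leq \hat{b}_{g(j')g(i')} \\
&= b'_{\hat{f}(l_{\min} - 1 + g(j'))\, \hat{f}(l_{\min} - 1 + g(i'))} \\
&= b'_{\hat{f}(l_{\min} - 1 + g(j - l_{\min} + 1))\, \hat{f}(l_{\min} - 1 + g(i - l_{\min} + 1))} \\
&= b'_{f'(j)f'(i)}.
\end{align*}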
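Statement 2 can be spot-checked numerically. The sketch below assumes the working definition used throughout the proof, namely that an ordering $f$ is lower-weighted for $B$ when $b_{f(i)f(j)} \leq b_{f(j)f(i)}$ for all $i < j$. The function name and the 0-based list-of-rows representation are our own illustrative choices.

```python
from typing import Sequence


def is_lower_weighted(B: Sequence[Sequence[float]], f: Sequence[int]) -> bool:
    """True if b[f(i)][f(j)] <= b[f(j)][f(i)] for every pair of positions i < j."""
    n = len(f)
    return all(
        B[f[i]][f[j]] <= B[f[j]][f[i]]
        for i in range(n)
        for j in range(i + 1, n)
    )


# Reversible chain with stationary weights proportional to (1, 2, 1).
A = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]

heavy_first = (1, 0, 2)   # the heaviest state (index 1) is listed first
natural = (0, 1, 2)       # a light state is listed before the heavy one
```

In the example, listing the heaviest state first gives a lower-weighted ordering, whereas the natural ordering does not, in line with the observation that $b_{f(i)f(j)} \leq b_{f(j)f(i)}$ with $b_{f(i)f(j)} \neq 0$ forces $\pi_{f(i)} \geq \pi_{f(j)}$.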
The procedure behind the lower-weighted algorithm is the following. Let $A$ be a nearly uncoupled stochastic matrix. Suppose that the algorithm has proceeded through some number of iterations of its internal while loop; let $B$, $G$, $m$, $f$ and $l$ be the current stored data and let $k$ be the index selected by the algorithm (supposing that the algorithm will proceed through at least one more iteration). We assume that $B$ is error-reducing; as well, $B(f(1), \ldots, f(l))$ is lower-weighted and $k$ is the largest index such that
1. $b_{f(k)f(k)} < \frac{(1-\epsilon)^2}{1+\epsilon(m_{f(k)}-2)}$, and
2. the state $f(k)$ can be associated with a state $j$ that has a higher relative frequency (in the associated Markov chain).
That is, the first condition leads us to suspect that $k$ is the largest index such that $f(k)$ is not the sole remaining member of an almost invariant aggregate. We insist upon the second condition, as well, because the property that if $i \to j$ in $G$ then $\pi(i) \leq \pi(j)$ is one of the base assumptions used to obtain the
\[ \frac{(1-\epsilon)^2}{1+\epsilon(m_{f(k)}-2)} \]
bound in Appendix B. Thus, we assume that the stochastic complement $B \setminus f(k)$ is error-reducing as well.
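The selection of the index $k$ is simple to express in code. The sketch below assumes that the bound reads $(1-\epsilon)^2 / (1 + \epsilon(m - 2))$, which is our reading of the threshold and should be checked against Appendix B. The function names are hypothetical, and the boolean list `can_choose` stands in for the test $\mathrm{Choose}(B, f(k)) \neq 0$.

```python
from typing import Optional, Sequence


def diagonal_threshold(eps: float, m: int) -> float:
    """Assumed bound (1 - eps)^2 / (1 + eps * (m - 2)) for an aggregate of m states."""
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must satisfy 0 <= eps < 1")
    if m < 1:
        raise ValueError("an aggregate contains at least one state")
    return (1.0 - eps) ** 2 / (1.0 + eps * (m - 2))


def pick_index(diag: Sequence[float], sizes: Sequence[int],
               can_choose: Sequence[bool], eps: float) -> Optional[int]:
    """Largest position k passing both conditions, or None (exit the while loop).

    diag[k] is b_{f(k)f(k)}, sizes[k] is m_{f(k)}, and can_choose[k] records
    whether Choose(B, f(k)) found a partner state.
    """
    for k in range(len(diag) - 1, -1, -1):
        if diag[k] < diagonal_threshold(eps, sizes[k]) and can_choose[k]:
            return k
    return None
```

Under this reading, a singleton ($m = 1$) is kept once its diagonal entry reaches $1 - \epsilon$, and the threshold decreases as the aggregate size $m$ grows.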
Within the lower-weighted algorithm, it is not necessary to identify the collection $L$ and then reorder the submatrix $B(f(l_{\min}), \ldots, f(l_{\max}))$. One could simply re-calculate $f := \mathrm{Reorder}(B)$ at every iteration. However, we have found that, in practice, this makes the algorithm much less efficient.
Suppose that the matrix $B$ is a lower-weighted reversible stochastic matrix on the ordered state space $S$ and let $\hat{B} = B \setminus i'$ be a stochastic complement. Let
\[ L = \{(i, j) : i \neq j,\ b_{ij} = 0 \text{ and } b_{ii'} b_{i'j} \neq 0\}. \]
As we saw in the above proposition, if $(i, j) \notin L$ and $i < j$, then $\hat{b}_{ij} \leq \hat{b}_{ji}$. Thus, only the submatrix that contains all of the $(i, j)$th entries where $(i, j) \in L$ needs to be reordered.
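The collection $L$ and the resulting window of positions can be computed directly from the zero pattern. The following is a minimal sketch, assuming that the matrix is already stored in its lower-weighted order (so positions and states coincide), that indices are 0-based, and that zeros are stored exactly. The names `fill_in_pairs` and `reorder_window` are ours.

```python
from typing import List, Optional, Set, Tuple


def fill_in_pairs(B: List[List[float]], k: int) -> Set[Tuple[int, int]]:
    """Pairs (i, j), i != j, that are zero in B but become nonzero in B \\ k."""
    n = len(B)
    return {
        (i, j)
        for i in range(n)
        for j in range(n)
        if i != j and B[i][j] == 0 and B[i][k] * B[k][j] != 0
    }


def reorder_window(L: Set[Tuple[int, int]], k: int) -> Optional[Tuple[int, int]]:
    """Positions (l_min, l_max) in the reduced ordering that need reordering."""
    if not L:
        return None                      # case one: the ordering is kept as it is
    firsts = [i for i, _ in L]
    lo, hi = min(firsts), max(firsts)
    # Positions after k shift down by one once state k has been removed.
    return (lo - 1 if lo > k else lo, hi - 1 if hi > k else hi)


A = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
```

For the example, removing the middle state creates the two new entries $(0, 2)$ and $(2, 0)$, so the window covers both remaining states; removing an end state creates no new entries and no reordering is needed.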
Moreover, successive stochastic complements tend to have significantly fewer 0-entries (the collection $L$ becomes smaller with successive complements). For example, suppose that the reversible stochastic matrix
\[ B = \begin{bmatrix} \tilde{B} & v \\ w^T & b \end{bmatrix} \]
has $x$ nonzero off-diagonal entries. Let $x_1$ be the number of nonzero off-diagonal entries in the matrix $\tilde{B}$ and let $x_2$ be the number of nonzero entries in the vector $v$.
Since $B$ is reversible, the vectors $v$ and $w$ have identical zero-nonzero patterns; so, there are also $x_2$ nonzero entries in $w$ and we have $x = x_1 + 2x_2$. The number of nonzero off-diagonal entries in the matrix
\[ \frac{1}{1-b} v w^T \]
is $x_2^2 - x_2$ (there is one nonzero entry for each pair of distinct $i$ and $j$ with $v_i, w_j \neq 0$). So, the number of nonzero off-diagonal entries in the stochastic complement
\[ \tilde{B} + \frac{1}{1-b} v w^T \]
is bounded above by $x_1 + x_2^2 - x_2 = x + x_2(x_2 - 3)$. The number of nonzero off-diagonal entries of $B$ can grow quite rapidly as we implement successive stochastic complements. Thus, the sizes of the submatrices that actually need to be reordered at each iteration can shrink equally rapidly.
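The bound $x + x_2(x_2 - 3)$ can be illustrated on a small example of our own choosing: a star-shaped reversible chain in which the last state (the hub) is adjacent to four leaves and the leaves are not adjacent to one another. The helper names below are hypothetical, and the complement is taken with respect to the last state, as in the block form above.

```python
from typing import List

Matrix = List[List[float]]


def offdiag_nonzeros(M: Matrix) -> int:
    """Number of nonzero off-diagonal entries of M."""
    return sum(1 for i, row in enumerate(M)
               for j, v in enumerate(row) if i != j and v != 0)


def complement_last(B: Matrix) -> Matrix:
    """Stochastic complement removing the last state: B~ + v w^T / (1 - b)."""
    k = len(B) - 1
    denom = 1.0 - B[k][k]
    return [[B[i][j] + B[i][k] * B[k][j] / denom for j in range(k)]
            for i in range(k)]


# Star chain: leaves 0..3 each move to the hub (state 4) with probability 1/2.
leaves = 4
B = [[0.5 if j == i else 0.0 for j in range(leaves)] + [0.5] for i in range(leaves)]
B.append([0.125] * leaves + [0.5])       # hub row; stationary weight of hub is 4x a leaf

x = offdiag_nonzeros(B)                   # 8: four leaf-hub pairs, both directions
x2 = sum(1 for i in range(leaves) if B[i][leaves] != 0)   # 4 nonzeros in v
bound = x + x2 * (x2 - 3)                 # 8 + 4 = 12
after = offdiag_nonzeros(complement_last(B))
```

Here $\tilde{B}$ is diagonal, so $x_1 = 0$, and after the hub is removed every pair of leaves becomes adjacent. The complement therefore has $4 \cdot 3 = 12$ nonzero off-diagonal entries, which attains the bound.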