2.4 Random Digraphs Characterised By Generations
2.4.1 Rank Chain/Path Notation And Definition
In Section 2.3.3 we counted the number of C-connected digraphs by using a recursive method, which involved reducing the desired digraph into sub-digraphs with fewer vertices and edges. We will consider stepping along the rank chain in a similar recursive
manner.
The term rank is used in Ludwig (1975) to couple the final size of the epidemic process to a Markov chain; the term generation will also be used as an equivalent name. The two are not, however, the same in an epidemiological sense; see Pellis et al. (2008) for a discussion of the two definitions. If the timing of the infectious contacts is considered, two individuals can be of the same rank but different generations. Since the actual times of infectious contacts are the missing data that we are trying to avoid imputing, rank and generation are equivalent in our case.
Edge information is no longer needed when considering the rank chain, since no explicit information about the edges is recorded. Instead, we consider knowing only the rank of each connected vertex. It is possible to construct another recursive method to calculate the probability of a digraph being C-connected; with the edges omitted there are fewer variables to account for, and hence fewer recursive steps. The results characterising the number of edges cannot be directly related to the rank method, as the latter does not store the edge information required to reconstruct the exact digraph. The rank method cannot produce output as in Table 2.2, but it can compute the rank chain probabilities.
Recall that a single rank chain corresponds to several bases in Theorem 2.10, covering all the possibilities from the minimal basis to the maximal basis in terms of the number of forward edges assigned at each rank. Under the rank chain method we no longer track the edges, so the rank chain is considered as all bases at once. We briefly restate the definitions given in Section 2.2.1. Let $r$ and $s$ denote the initial number of root and non-root vertices, with a total of $n = r + s$ vertices. Let $P_{r,s}[E]$ be the probability of event $E$ given $r$ roots and $s$ non-roots. Denote the rank chain as the vector $Z = (Z_0, Z_1, Z_2, \dots)$, where $Z_t = (X_t, Y_t)$. The number of vertices of rank $t$ is $X_t$, and the total number connected up to and including rank $t$ is $Y_t$, i.e. $Y_t = \sum_{k=0}^{t} X_k$. Since the cumulative totals $Y_t$ are a function of the size of each rank, we shall often write the rank chain as $Z = (X_0, X_1, X_2, \dots)$ for clarity. The number of vertices is finite, and for the chain to continue there must be at least one vertex in each rank, so it is sufficient to consider only ranks $0 \le t \le n - r + 1$. As $X_{n-r+1} = 0$, we have $X_t = 0$ for all $t > n - r + 1$. We shall use the term rank $t$ to denote a vector $Z_t$ or $X_t$, and (rank) chain to denote $Z$.
We will condition on the digraph being D-connected (to emphasise the difference between the edge and rank methods we shall use $D$ instead of $C$). Thus $D = d$ corresponds to the final component of $Z$ being $Y_{n-r+1} = r + d$, for $0 \le r \le n$ and $0 \le d \le n - r$.
Let $\tau$ denote the length of a chain, defined by $\tau = \min\{t : X_{t+1} = 0\}$, i.e. $\tau$ is the last rank of non-zero size.
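The notation above can be illustrated with a minimal sketch. It assumes a chain is supplied as a finite tuple of rank sizes that ends with a zero rank; the function names `cumulative_totals` and `chain_length` are hypothetical and do not appear elsewhere in this work, and the chain (1, 2, 2, 0) is used purely as an illustrative input.

```python
from itertools import accumulate


def cumulative_totals(x):
    """Return the list of Y_t = X_0 + ... + X_t for each rank t."""
    return list(accumulate(x))


def chain_length(x):
    """Return tau = min{t : X_{t+1} = 0}, the last rank of non-zero size.

    Assumes x = (X_0, X_1, ...) is finite and contains a terminating zero.
    """
    for t in range(len(x) - 1):
        if x[t + 1] == 0:
            return t
    raise ValueError("chain has no terminating zero rank")


x = (1, 2, 2, 0)                 # illustrative rank sizes X_0, ..., X_3
print(cumulative_totals(x))      # [1, 3, 5, 5]
print(chain_length(x))           # 2
```

Here the final cumulative total is 5 = r + d with r = 1 and d = 4, in line with the conditioning on D = d described above.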
For example, the two diagrams in Figure 2.5 (p. 63) show two possible rank chains that connect four vertices ($d = 4$) from among seven non-root vertices ($s = 7$), with a single root vertex ($r = 1$). By Lemma 2.8 there are $2^{d-1} = 2^3 = 8$ rank chains. Figures 2.5(a) and 2.5(b) show the rank chains $Z = (1, 2, 2, 0)$ and $Z = (1, 1, 3, 0)$ respectively, both having $\tau = 2$. The remaining six rank chains can be deduced from Figure 2.6.
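The count in this example can be checked by brute force. The sketch below assumes that a rank chain with $r$ roots and $d$ connected non-roots is exactly an ordered split of $d$ into non-empty ranks, followed by a terminating zero; the function name `rank_chains` is hypothetical and the enumeration is an illustration, not the recursive method developed in this chapter.

```python
def rank_chains(r, d):
    """Yield every chain (X_0, ..., X_tau, 0) with X_0 = r and
    X_1 + ... + X_tau = d, where each of X_1, ..., X_tau is positive."""

    def compositions(m):
        # Ordered ways of writing m as a sum of positive integers.
        if m == 0:
            yield ()
            return
        for first in range(1, m + 1):
            for rest in compositions(m - first):
                yield (first,) + rest

    for parts in compositions(d):
        yield (r,) + parts + (0,)


chains = list(rank_chains(r=1, d=4))
print(len(chains))               # 8
print((1, 2, 2, 0) in chains)    # True
print((1, 1, 3, 0) in chains)    # True
```

Note that the number of non-roots $s$ does not enter the enumeration: it affects the probability of each chain, not the number of chains.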
Relating the above to a Susceptible-Infective-Removed (SIR) epidemic process as defined in Section 1.2, the initial numbers of susceptibles and infectives are $S_0 = s$ and $I_0 = r$ respectively, in a fixed population of size $n = S_t + I_t + R_t$ (where $t$ denotes continuous time). The final size of an epidemic is the number of initial susceptibles that ultimately become infected, corresponding to the connectedness of the digraph, i.e. $S_0 - S_\infty = d$.
The space of all possible chains $Z$ is a subset of $\mathbb{Z}_+^{n-r+1}$. To differentiate an epidemic [...] and paths are interchangeable due to the equivalence stated in Section 2.2.2. The path $Z$ for the epidemic gives the sequence of infected individuals, terminating when there are no individuals subsequently infected. The following method will calculate the probability of a given path.
Each path consists of elements $Z_t = (x_t, y_t)$, the number of individuals infected in rank $t$ and the total number infected so far. In the following sections we shall calculate the probability of moving from one such state to another, which we shall term the step probability.
For a given final size $d$, it is possible to calculate the number of possible paths. By definition $Z_0 = (a, a)$ for all paths, and the $d$ individuals are assigned to the generations such that $y_\tau = a + d$. Then by Lemma 2.8 there are $2^{d-1}$ possible paths, which is the sum of the number of paths of each length $1, \dots, d$.
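The sum over lengths can be written out explicitly. The following is a sketch under the assumption that a path of length $\tau$ corresponds to an ordered split of the $d$ individuals into $\tau$ non-empty generations, of which there are $\binom{d-1}{\tau-1}$; this is the standard count of such splits and may differ in presentation from the argument used in Lemma 2.8.

```latex
\[
  \sum_{\tau=1}^{d} \binom{d-1}{\tau-1}
  \;=\; \sum_{j=0}^{d-1} \binom{d-1}{j}
  \;=\; 2^{d-1}.
\]
% For d = 4 the terms are 1 + 3 + 3 + 1 = 8.
```

For $d = 4$ this gives one path of length 1, three of length 2, three of length 3 and one of length 4.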
Note that we use $\tau$ as the length of the path, not the stopping time of the epidemic as is typical, since we are not interested in temporal data. Though not identical, the stopping time of an epidemic and the length of the corresponding rank chain are related: the latter can be used to give an approximate scale for the former.