Chapter 5 Linking the First and Second Tiers
6.3 Extended Duplication Model
As stated previously, the impact of externally imposed hierarchical rules on such models remains under-developed. Therefore, the model of graph evolution considered here differs from a duplication-divergence regime in two main ways. Firstly, graphs are directed and hierarchically structured, such that there are restrictions on node connections: a node of level l can only have outward edges to nodes of level strictly less than l. As such, all graphs formed by this process are acyclic. Secondly, there are three different possibilities for mutation steps, which change the average outgoing degree of a node in the system. The model is described more completely in Chapter 4. This process is intended to model regulatory networks and other, more abstractly hierarchical biological networks, such as organs, and to let us consider the impact this hierarchy has on the evolutionary process. Although the graphs we consider are theoretically biologically based, the findings could equally be applied to artificial hierarchical graphs.
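The level restriction above can be illustrated with a minimal sketch. The dictionary-based graph representation and the function name `respects_hierarchy` are assumptions made for illustration and are not taken from the model's implementation.

```python
from typing import Dict, Set

def respects_hierarchy(level: Dict[str, int], out_edges: Dict[str, Set[str]]) -> bool:
    """Return True if every edge u -> v points to a strictly lower level."""
    return all(
        level[v] < level[u]
        for u, targets in out_edges.items()
        for v in targets
    )

# Hypothetical four-node example: two nodes on level 0, one each on levels 1 and 2.
level = {"a": 0, "b": 0, "c": 1, "d": 2}
out_edges = {"c": {"a", "b"}, "d": {"c", "a"}}

ok = respects_hierarchy(level, out_edges)       # all edges point downwards
bad = respects_hierarchy(level, {"a": {"b"}})   # edge within level 0 is not allowed
```

Because every edge strictly decreases the level, no directed path can return to its starting node, which is why graphs satisfying this rule are acyclic.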
[Figure 6.1 diagram. Node labels: Mutation, Copy Vertex, New Vertex, Copy Subgraph, New Subgraph. Branch probabilities: p, 1 − p, q, 1 − q.]
Figure 6.1: The mutation process has three possible outcomes: a vertex and its associated subgraph are copied, a vertex is copied but a new subgraph is created, or a new vertex is created with a random subgraph, at a level higher than the highest level of the vertices in its subgraph.
Figure 6.1 is a schematic of the pathways and probabilities associated with the mutation process. The first possible mutation step is to create a new node, on a new level, with a completely new subgraph. This happens in the model with probability p. A subgraph is generated by applying a fixed probability of edge creation, β, between the new node and all other nodes in the species. This generates subgraphs
with particular properties – for example β = 0.5 corresponds to the creation of a
following way
\[
\langle k_{n+1} \rangle = \frac{n \langle k_n \rangle + \beta n}{n + 1}. \tag{6.2}
\]
This assumes that a new node created in this way is expected to attach to βn other nodes in the species. In fact, this is an oversimplification, both because nodes must attach to at least one other node and because of the way the algorithm is implemented. The implementation first chooses a node on level l−1 to connect to, and then connects to each of the remaining nodes with probability β. Thus Equation 6.2 becomes
\[
\langle k_{n+1} \rangle = \frac{n \langle k_n \rangle + 1 + \beta (n - 1)}{n + 1}, \tag{6.3}
\]
following the expectation of a binomial distribution. The analytic equation considers the case where there is no selection and mutations progress sequentially (i.e. a continuous mutation process in which each mutant instantaneously replaces its parent, and there are no selection pressures).
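Equation 6.3 can be checked numerically for the case p = 1. The sketch below is an illustration under stated assumptions rather than the thesis implementation: the graph is assumed to start from a single node with no outgoing edges, only edge counts are tracked (with p = 1 every new node sits on its own new level, so all existing nodes are viable targets), and the function names are hypothetical. The closed form follows from summing the numerator increments of Equation 6.3 under that same assumed starting condition.

```python
import random

def recurrence_mean_degree(n_final: int, beta: float) -> float:
    """Iterate Eq. 6.3 from one node with no outgoing edges (assumed start)."""
    k = 0.0
    for n in range(1, n_final):
        k = (n * k + 1 + beta * (n - 1)) / (n + 1)
    return k

def closed_form_mean_degree(n: int, beta: float) -> float:
    """Sum of the increments 1 + beta*(m - 1) for m = 1..n-1, divided by n."""
    return (n - 1) * (1 + beta * (n - 2) / 2) / n

def simulate_mean_degree(n_final: int, beta: float, rng: random.Random) -> float:
    """Grow a graph by new-node mutations only (p = 1), counting edges."""
    total_edges = 0
    for n in range(1, n_final):
        # One mandatory edge to the node on the level below, then an
        # independent edge to each of the other n - 1 nodes with probability beta.
        extra = sum(rng.random() < beta for _ in range(n - 1))
        total_edges += 1 + extra
    return total_edges / n_final

rng = random.Random(0)  # fixed seed so the sketch is reproducible
runs = [simulate_mean_degree(50, 0.5, rng) for _ in range(200)]
simulated = sum(runs) / len(runs)
analytic = recurrence_mean_degree(50, 0.5)
closed = closed_form_mean_degree(50, 0.5)
```

Averaging over many runs should bring `simulated` close to `analytic`, since no selection is applied and each mutant simply extends the graph.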
Duplication and divergence mutations are also implemented, but these are treated as separate stages rather than as the combined process described in Vázquez et al. [2003]. In a mutation event, if a new node is not created, a node in the species is copied instead. This happens with probability 1−p. Given this node duplication, the model then either duplicates the subgraph or creates a new subgraph (divergence) using the same rules as described for new node creation. These outcomes occur with probability q and 1−q respectively. Copying both the subgraph and node
changes the average outgoing degree of a node in the graph as
\[
\langle k_{n+1} \rangle = \frac{n \langle k_n \rangle + \langle k_n \rangle}{n + 1}, \tag{6.4}
\]
which simplifies to $\langle k_{n+1} \rangle = \langle k_n \rangle$. Thus copying a node and its subgraph does not change the average outgoing degree when averaged over long times. This makes intuitive sense: there is an equal chance of any node being chosen, so over long times all nodes and subgraphs will be duplicated, resulting in no overall change in the average outgoing degree of a node. An extension considered later is to weight the mutation probabilities, such that nodes of lower outgoing degree are more likely to mutate than nodes of high outgoing degree (mimicking the evolutionary process of module fixation, where some parts of genetic regulatory networks are easily modified but phylogenetically historical components less so [Erwin and Davidson, 2009]).
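The invariance expressed by Equation 6.4 can be verified exactly for a small example. The sketch below assumes that each node is chosen for duplication with equal probability and that only the copy's outgoing edges are added (incoming edges are not duplicated); the function name and the example degree list are hypothetical.

```python
from fractions import Fraction
from typing import List

def expected_mean_after_duplication(out_degrees: List[int]) -> Fraction:
    """Expected mean out-degree after copying one uniformly chosen node
    together with its outgoing edges."""
    n = len(out_degrees)
    total = sum(out_degrees)
    # Copying node i adds a node with out_degrees[i] outgoing edges.
    means = [Fraction(total + k, n + 1) for k in out_degrees]
    return sum(means) / n

degrees = [0, 1, 1, 3]                       # hypothetical out-degrees
before = Fraction(sum(degrees), len(degrees))
after = expected_mean_after_duplication(degrees)
```

Any single duplication can raise or lower the mean, depending on which node is copied; it is only the expectation over the uniform choice that is unchanged.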
[Figure 6.2 plots. Two panels, titled "p = 1" and "p = 0.5, q = 1". Horizontal axis: time / number of mutations / number of nodes (0 to 100). Vertical axis: $\langle k_n \rangle$. Each panel shows model data and analytic curves for β = 0.1, β = 0.5, and β = 0.9.]
Figure 6.2: Agreement between analytic results and model data for varying values of p and q. Solid lines indicate analytic results; dots indicate model data. Model data are averaged over 100 runs.
The remaining mutation step copies a node but creates a new random subgraph. This mutation step sets up a recurrence relation, since modelling it would require complete knowledge of the distribution of nodes within the graph at the time of mutation, which depends on all previous mutation steps. This distribution maps to a fraction of “viable” nodes, that is, nodes which the copied node can connect to. We use λ as a proxy for this distribution: it models the proportion of nodes in the graph which are able to be connected to the mutated node (and thus have a level lower than that of the mutated node) at any one time. As such, the average outgoing degree changes as
\[
\langle k_{n+1} \rangle = \frac{n \langle k_n \rangle + 1 + \beta (n \lambda - 1)}{n + 1}. \tag{6.5}
\]
Combining Equations 6.3, 6.4, and 6.5 with the mutation probabilities results in
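\[
\langle k_{n+1} \rangle = \frac{p \bigl( n \langle k_n \rangle + 1 + \beta (n - 1) \bigr) + (1 - p) \Bigl( q (n + 1) \langle k_n \rangle + (1 - q) \bigl( n \langle k_n \rangle + 1 + \beta (\lambda n - 1) \bigr) \Bigr)}{n + 1}. \tag{6.6}
\]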
The ratio between p and β determines the change in average outgoing degree, whilst λ (which is dependent on p and n) determines the long-time response of the system.
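The combined recurrence in Equation 6.6 can be iterated directly. The sketch below is illustrative only: it treats λ as a fixed constant `lam`, which is a simplifying assumption (the text notes that λ depends on p and n), it assumes a starting graph of one node with mean outgoing degree `k0`, and the function names are hypothetical.

```python
def step_eq_6_6(n: int, k: float, p: float, q: float, beta: float, lam: float) -> float:
    """One application of Eq. 6.6: mean out-degree after the (n+1)-th node is added."""
    new_node = n * k + 1 + beta * (n - 1)             # numerator of Eq. 6.3
    copy_with_subgraph = (n + 1) * k                   # numerator of Eq. 6.4
    copy_new_subgraph = n * k + 1 + beta * (lam * n - 1)  # numerator of Eq. 6.5
    numerator = p * new_node + (1 - p) * (
        q * copy_with_subgraph + (1 - q) * copy_new_subgraph
    )
    return numerator / (n + 1)

def iterate_eq_6_6(n_final: int, p: float, q: float, beta: float,
                   lam: float, k0: float = 0.0) -> float:
    """Iterate Eq. 6.6 from a single node (assumed start) up to n_final nodes."""
    k = k0
    for n in range(1, n_final):
        k = step_eq_6_6(n, k, p, q, beta, lam)
    return k

only_new_nodes = iterate_eq_6_6(50, p=1.0, q=0.5, beta=0.5, lam=0.5)
only_full_copies = iterate_eq_6_6(50, p=0.0, q=1.0, beta=0.5, lam=0.5, k0=2.0)
mixed = iterate_eq_6_6(100, p=0.5, q=1.0, beta=0.5, lam=0.5)
```

Setting p = 1 recovers Equation 6.3, and setting p = 0 with q = 1 recovers the invariance of Equation 6.4, which gives two simple sanity checks on the combined expression.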