9.6 Density classification problem
In this section we take on the problem of training BNs to solve a classification problem. We know that cell behaviour can change in response to variations in the concentration of nutrients or other chemical substances: the cell adapts its own internal dynamics in reaction to different conditions in the environment. From an abstract point of view, we can say that a cell is able to solve a classification problem, where the environmental conditions represent the example to classify and the resulting dynamical behaviour of the cell is the response. Since BNs are used to describe GRN behaviour, it is natural to ask whether BNs are able to learn how to perform an analogous function. In this section we focus on a particular classification problem (the Density Classification Problem, described below) and we ask whether it is possible to design a BN that, when subject to different initial conditions (corresponding to external stimuli in the case of real cells), responds with a specific dynamical behaviour, i.e., a specific attractor.
Table 9.4: Summary of network features for N > 20.

N    Measure      | Chaotic RBNs                       | Critical RBNs                      | Optimised networks
                  | min    µ      median σ      max    | min    µ      median σ      max    | min    µ      median σ      max
30   Period       | 1      18     6      36     335    | 1      3      2      3      20     | 1      1      1      1      9
     N. of Attr.  | 1      4      4      3      15     | 1      3      2      4      37     | 2      6      6      3      22
     Sensitivity  | 1.29   1.50   1.49   0.09   1.75   | 0.78   1.01   1.01   0.10   1.27   | 1.31   1.46   1.46   0.07   1.63
70   Period       | 1      1072   51     4426   79050  | 1      7      4      10     134    | 1      3      2      4      51
     N. of Attr.  | 1      6      6      2      14     | 1      4      2      4      28     | 2      9      9      5      31
     Sensitivity  | 1.38   1.50   1.50   0.05   1.61   | 0.87   1.01   1.01   0.06   1.14   | 1.06   1.33   1.34   0.11   1.57
200  Period       | 1      6.8E5  1.4E5  1.2E6  7.6E6  | 1      15     8      26     726    | 1      13     6      110    4445
     N. of Attr.  | 2      4      4      1      9      | 1      23     3      96     878    | 2      19     12     21     130
     Sensitivity  | 1.39   1.47   1.47   0.03   1.53   | 0.90   1.00   1.00   0.04   1.09   | 1.00   1.19   1.19   0.06   1.38

The Density Classification Problem (DCP), also known as the Density Classification Task, first introduced by Packard, is a simple counting problem [171] born within the area of cellular automata (CA) as a paradigmatic example of a problem that is hard to solve for decentralised systems. Informally, it requires that a binary CA (or, more generally, a discrete dynamical system, DDS) recognise whether an initial binary string contains more 0s or more 1s. In its original formulation, the nodes (or cells) are arranged in a one-dimensional torus and can interact only with their neighbours. The problem is that of designing simple rules governing the dynamics of each node so that the system is driven to a uniform state of all 1s if the initial configuration contains more 1s, and of all 0s otherwise. In other words, the convergence of the DDS should decide whether the initial density of 1s is greater or lower than 1/2. Although the assignment might look trivial, it is a challenging problem, known to have no exact solution for deterministic one-dimensional CA [138]. The origin of this difficulty is intriguing: it comes from the impossibility of centralising the information or of using counting techniques. Convergence to a global uniform state must be obtained through local decisions only, i.e., by using just the information available at each time step among the close neighbours of a node.

Given these difficulties, various modifications of the classical problem have been proposed, including stochastic CA, CA with memory, and CA with different rules succeeding one another in time (see [78] and references cited therein). Interestingly, some authors directly investigated the dichotomy between the local nature of CA and the global requirements of the DCP by allowing long-range connections among the links of the otherwise local neighbourhood [154, 197, 226, 234]. In particular, it can be shown that the simple majority rule applied on random topologies outperforms all human-designed or artificially evolved rules running on an ordered lattice [154, 197], a performance gap that increases with the number of nodes [154]. The majority rule states that the value of a CA cell at time t + 1 is 0 (resp. 1) if the majority of its neighbours has value 0 (resp. 1) at time t.
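The majority rule and the DCP target can be sketched in a few lines of Python. This is only an illustrative sketch, not the setup used in the cited studies: the function names, the choice of including a cell in its own lattice neighbourhood, the use of odd neighbourhood sizes to avoid ties, and the way the random topology is drawn are all assumptions made here.

```python
import random

def density_target(state):
    """Return 1 if the state holds more 1s than 0s, else 0 (an odd length avoids ties)."""
    return 1 if 2 * sum(state) > len(state) else 0

def ring_neighbours(n, radius):
    """Ordered lattice (assumption): on a one-dimensional torus, each cell reads
    itself plus `radius` cells on each side."""
    return [[(i + d) % n for d in range(-radius, radius + 1)] for i in range(n)]

def random_neighbours(n, k, rng):
    """Random topology (assumption): each node reads k distinct nodes chosen at random."""
    return [rng.sample(range(n), k) for _ in range(n)]

def majority_step(state, neighbours):
    """Synchronous update: each node takes the majority value of its neighbours at time t."""
    return [1 if 2 * sum(state[j] for j in nbrs) > len(nbrs) else 0
            for nbrs in neighbours]

def run(state, neighbours, max_steps=200):
    """Iterate the majority rule until a fixed point is reached or max_steps elapse."""
    for _ in range(max_steps):
        nxt = majority_step(state, neighbours)
        if nxt == state:
            break
        state = nxt
    return state
```

With these definitions, the state 1 1 0 0 0 on a ring with radius 1 is already a fixed point of the local majority rule: `run([1, 1, 0, 0, 0], ring_neighbours(5, 1))` returns the same state, although `density_target` asks for all 0s. This small case illustrates why purely local decisions on an ordered lattice can fail to reach the uniform answer.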
These last two cited studies demonstrate that RBNs can effectively deal with the DCP. Our aim in this section is to demonstrate that learning RBNs are flexible objects, able to attain a performance comparable to a hard-to-match benchmark such as the majority rule. Therefore, we will not use extremely large neighbourhoods or network sizes; rather, we focus on the learning process itself, leaving scaling issues to further work.
In order to define the learning process, we divide the nodes of a BN into three (possibly overlapping) groups: input nodes, output nodes and hidden nodes (see footnote 13). Of course, this separation is not sufficient to completely specify the overall learning scheme, since there are many details regarding topology and node dynamics to be addressed. For instance, input nodes could maintain their initial values (this is the typical case in neural networks) or could evolve in time according to the typical BN dynamics; output nodes could or could not feed back on the hidden and input nodes; moreover (see [6] and below), it is not clear how the initial conditions of hidden and output nodes influence the final attractors. A possibility, explored in previous studies [6, 186], consists in partitioning network nodes into input, hidden and output nodes. In this setting, the value of input nodes is externally imposed and does not change during network evolution, whereas hidden and output nodes are driven, as usual, by their transition functions. Nevertheless, in [6] it is also shown that different initial settings of hidden and output nodes typically lead to different attractors, making the analysis of the network's answer difficult. For the DCP we opt for an easier choice; we establish that:
1. all network nodes are input nodes;
2. all nodes are also output nodes (i.e., the state of each node contributes to the final answer);
3. there are no hidden nodes (this follows from the previous two conditions).
This way none of the nodes requires a special characterisation and the initial conditions are well defined. The correct answers can also be uniquely identified with the two state vectors composed of all zeroes and all ones.
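The scheme above, in which the whole initial state is the example to classify and the whole attractor is the answer, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the networks studied here: `make_rbn`, `step`, `find_attractor` and `target_vector` are hypothetical names, and the network is assumed to be a synchronous, deterministic RBN with k random inputs per node and random truth tables.

```python
import random

def make_rbn(n, k, rng):
    """Hypothetical RBN: each node has k distinct random inputs and a random truth table."""
    inputs = [rng.sample(range(n), k) for _ in range(n)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables

def step(state, inputs, tables):
    """Synchronous update: every node looks up its next value in its truth table."""
    nxt = []
    for ins, table in zip(inputs, tables):
        index = 0
        for j in ins:
            index = (index << 1) | state[j]  # pack the input values into a table index
        nxt.append(table[index])
    return tuple(nxt)

def find_attractor(initial, inputs, tables):
    """Iterate from `initial` until a state repeats; return the states on the cycle.
    The initial state is the example to classify, since all nodes are input nodes."""
    seen = {}
    trajectory = []
    state = tuple(initial)
    while state not in seen:
        seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state, inputs, tables)
    return trajectory[seen[state]:]

def target_vector(initial):
    """Correct answer: all ones if the initial condition has more 1s, all zeroes otherwise."""
    bit = 1 if 2 * sum(initial) > len(initial) else 0
    return (bit,) * len(initial)
```

Because the state space is finite and the update is deterministic, `find_attractor` always terminates. A network solves an example exactly when its attractor is the single state returned by `target_vector`.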
Finally, and coherently with the Boolean nature of BNs, in order to correctly interpret oscillating asymptotic states it is enough to compute the time averages for each node, assigning “0” to the averages lower than 0.5 and “1” otherwise.
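The time-average readout just described can be written down directly. The sketch below assumes the attractor is given as a list of states covering one full period; `read_answer` and `is_correct` are hypothetical helper names introduced only for illustration.

```python
def read_answer(attractor):
    """Per-node time average over one period of the attractor, thresholded at 0.5:
    averages lower than 0.5 give 0, all others give 1."""
    period = len(attractor)
    n = len(attractor[0])
    averages = [sum(state[i] for state in attractor) / period for i in range(n)]
    return [0 if avg < 0.5 else 1 for avg in averages]

def is_correct(attractor, initial):
    """Compare the readout with the uniform vector required by the initial density."""
    bit = 1 if 2 * sum(initial) > len(initial) else 0
    return read_answer(attractor) == [bit] * len(initial)
```

For a fixed point the averages coincide with the state itself, so the readout reduces to reading the node values directly. For a cyclic attractor, a node that spends exactly half of the period at 1 is read as 1, in line with the rule above.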
In this paper we use two groups of RBNs having respectively 11 and 21 nodes
Footnote 13: Although this categorisation is reminiscent of the distinction between input, hidden and output layers in neural networks, we have to remember that the topology of a Boolean network could in principle be any graph, without any clear separation of nodes into “layers”.