Graph-Class of Bayesian Networks - Classes of Graphs and Their Number

B.2 Classes of Graphs and Their Number

B.2.3 Graph-Class of Bayesian Networks

Non-dynamic or static BNs have been introduced as a simple illustration of probabilistic graphical models (Section 1.2.1). BNs are furthermore used by the BD scores: when learning DBNs with these scores, they are implicitly converted to BNs (Section 2.2.6). Here their associated graph-class is discussed not only for completeness, but also because BNs give a good example of how a relatively simple restriction on the structure of a graph can render counting its elements highly complicated.

A BN's structure will always be acyclic,² i.e. it is a directed acyclic graph (DAG) (Section B.1). In order to determine whether a given graph is acyclic, it is not sufficient to inspect each node's parents separately. Instead, the full network, i.e. its entire adjacency matrix, must be analysed at once. This is because a cycle can be up to $n$ links long when each node is visited once. Inspecting parent configurations can only reveal loops, i.e. cycles of a single link, but no cycles involving 2 or more links. In order to identify the latter, all possible paths in the graph must be analysed, for which the full adjacency matrix needs to be considered. As a consequence, the class of DAGs contrasts with those of the SSS and DBNs, for which the number of each node's parent configurations was sufficient to derive the number of graphs; DAGs cannot be counted in such a simple way. However, in 1973, Robinson [1973] and Stanley [2006] independently derived the corresponding formula: the number of DAGs $a_n$ with $n$ vertices is given by the recursion

$$a_0 = 1, \qquad a_n = \sum_{k=1}^{n} (-1)^{k-1} \binom{n}{k} 2^{k(n-k)} a_{n-k} \quad \text{for } n \in \mathbb{N}. \tag{B.8}$$

This formula does not convey an intuitive understanding of the actual number of DAGs. However, theoretical investigations of its asymptotic behaviour are easier to comprehend [Bender et al., 1986, Bender and Robinson, 1988]:

$$a_n \sim \frac{n!\, 2^{\binom{n}{2}}}{M p^n} \quad \text{for } p \approx 1.488\ldots,\; M \approx 0.574\ldots \tag{B.9}$$

The value of the fraction increases rapidly, as can be seen by re-writing the power of two in the numerator as $2^{(n^2-n)/2}$: this super-exponential growth exceeds the exponential growth of the denominator. In total, the number of DAGs thus grows quickly with the number of nodes. The complications involved in counting DAGs give an idea of how difficult it is to count graphs that are subject to constraints in addition to acyclicity, e.g. a maximum on the number of links or parents per node. Corresponding results can probably be found in the mathematical literature, but no attempt is made here to summarise them.
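To give a feeling for the numbers involved, the recursion (B.8) can be evaluated directly. The following is a minimal sketch rather than a reference implementation; the function names `count_dags` and `asymptotic_ratio` are hypothetical, and the default value of `p` is simply the rounded constant quoted in (B.9).

```python
from functools import lru_cache
from math import comb, factorial


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of labelled DAGs on n vertices, following recursion (B.8)."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k - 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


def asymptotic_ratio(n: int, p: float = 1.488) -> float:
    """n! * 2^C(n,2) / (p^n * a_n); by (B.9) this should approach M as n grows."""
    return factorial(n) * 2 ** comb(n, 2) / (p ** n * count_dags(n))
```

Calling `count_dags(n)` for n = 1, ..., 5 yields 1, 3, 25, 543 and 29281, which illustrates how quickly the class grows; `asymptotic_ratio(n)` can be used to inspect how closely small n already follow the asymptotic form.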

In addition to the size of the DAG-class, it would be interesting to understand the structure of its elements. As already mentioned, a simple and specific characterisation is not possible; however, the adjacency matrix of any DAG has interesting features, which at least afford a glimpse of the structure of this class. This is outlined in the following sections, in which two approaches to identify an acyclic graph are presented. The considerations have practical importance because, as motivated in the introduction to Section B.2, networks whose structure is inconsistent with the model should not be scored. For BNs this means that graphs need to be acyclic, which can be ensured with the following two concepts.

² This follows directly from the factorisation via the chain rule in equation (1.10) (page 11).

Acyclicity-Check Using Node-re-ordering

In general it is not possible to determine whether a graph is cyclic by checking whether its adjacency matrix has a particular form. Conversely, it is possible to tell that certain adjacency matrices correspond to acyclic graphs. These matrices are strictly triangular, i.e. all non-zero elements lie on one side of the diagonal. For example, a strictly upper triangular matrix has the following form

$$A = \begin{pmatrix} 0 & \ast & \ast \\ 0 & \ddots & \ast \\ 0 & 0 & 0 \end{pmatrix} \tag{B.10}$$

where each $\ast$ indicates an arbitrary value. Any adjacency matrix of this form can be verified to describe an acyclic graph by applying its definition. DAGs can have non-triangular adjacency matrices as well; however, their nodes can always be re-ordered such that the adjacency matrix becomes strictly triangular. In contrast, for cyclic graphs no arrangement of nodes exists that yields a triangular matrix.
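The re-ordering argument can be made concrete with a short sketch. It assumes that an entry `adj[i][j] = 1` denotes a link from node i to node j. The search for a suitable ordering uses Kahn's algorithm, a standard topological-sort procedure that is swapped in here for illustration and is not part of the original discussion; all function names are hypothetical.

```python
from typing import List, Optional

Matrix = List[List[int]]


def topological_order(adj: Matrix) -> Optional[List[int]]:
    """Return a node ordering with all links pointing forwards, or None if cyclic."""
    n = len(adj)
    # Number of parents of each node (column sums of the adjacency matrix).
    in_degree = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    ready = [j for j in range(n) if in_degree[j] == 0]
    order: List[int] = []
    while ready:
        i = ready.pop()
        order.append(i)
        # Removing node i releases those children whose parents are all placed.
        for j in range(n):
            if adj[i][j]:
                in_degree[j] -= 1
                if in_degree[j] == 0:
                    ready.append(j)
    # Nodes on a cycle never become parent-free, so they are never placed.
    return order if len(order) == n else None


def reorder(adj: Matrix, order: List[int]) -> Matrix:
    """Permute rows and columns of adj according to the given node ordering."""
    return [[adj[i][j] for j in order] for i in order]


def is_strictly_upper_triangular(adj: Matrix) -> bool:
    """True if all entries on and below the diagonal are zero."""
    return all(adj[i][j] == 0 for i in range(len(adj)) for j in range(i + 1))
```

For the DAG with links 2→0 and 0→1, the adjacency matrix is not triangular in the original node order, but `reorder(adj, topological_order(adj))` is strictly upper triangular. For a cyclic graph, `topological_order` returns `None`, reflecting that no such arrangement exists.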

It is questionable whether this concept of node re-ordering can yield an efficient implementation of an acyclicity test; however, note that the form of the adjacency matrix (B.10) implies that there must be at least one node without any links starting from it: $A$'s last row has zero entries only. Likewise, $A$'s first column implies that there must be at least one node without any parents. These are two necessary conditions for acyclicity, which can serve as a computationally cheap preliminary check.
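The two necessary conditions translate into a few lines of code. The sketch below assumes that rows of the adjacency matrix index the start node of a link and columns its end node; the function name `passes_preliminary_check` is hypothetical. Since the conditions are necessary but not sufficient, a positive result only means that a full acyclicity test is still required.

```python
from typing import List


def passes_preliminary_check(adj: List[List[int]]) -> bool:
    """Cheap necessary (not sufficient) test for acyclicity.

    A DAG must contain a node without outgoing links (an all-zero row)
    and a node without parents (an all-zero column).
    """
    n = len(adj)
    if n == 0:
        return True  # the empty graph is trivially acyclic
    has_childless_node = any(not any(adj[i]) for i in range(n))
    has_parentless_node = any(
        not any(adj[i][j] for i in range(n)) for j in range(n)
    )
    return has_childless_node and has_parentless_node
```

A cycle through all nodes fails the check immediately. A graph such as 0→1, 1→2, 2→1, 2→3 passes it although it contains the cycle 1→2→1, which shows why the check can only rule graphs out and never confirm acyclicity.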

Another, more practicable approach to test the acyclicity of a graph is presented next. Instead of re-ordering nodes, this approach uses matrix multiplication and can easily be implemented. The outlined procedure is known as the Floyd-Warshall algorithm ([Floyd, 1962, Warshall, 1962] or [Cormen et al., 2001, pp. 629]).

Acyclicity-Check Using Matrix-Multiplication

This section discusses the theoretical foundation of the Floyd-Warshall algorithm [Warshall, 1962]. Its implementation according to a dynamic programming approach can be found in the literature (e.g. [Floyd, 1962] or [Cormen et al., 2001, pp. 629]). Warshall's essential idea lies in the understanding of the adjacency matrix: it describes direct connections between nodes, i.e. paths of length 1. He then showed that, if a boolean product³ of matrices is used, the powers of the adjacency matrix $A^r$ express the existence or non-existence of a path of length $r$ between any two nodes. Floyd utilised this result to formulate an algorithm which determines the shortest path between any two nodes [Floyd, 1962]. Here, the interest is only in whether a path from a node to itself exists at all, which would mean that the graph is cyclic. (Otherwise, if no cycle exists for any node, the graph is acyclic.) A DAG thus fulfils the necessary and sufficient condition

$$\forall r \in \{1, \ldots, n\}: \quad \operatorname{diag}(A^r) = \left(a^{(r)}_{1,1}, \ldots, a^{(r)}_{n,n}\right) \overset{!}{=} (0, \ldots, 0), \qquad \text{where } A^r = \left(a^{(r)}_{i,j}\right), \tag{B.11}$$

which simply states that for none of the nodes a cycle of any length $r$ exists.⁴ The range of $r$ extends to $n$ because a cycle that visits every node exactly once has $n$ links.
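The diagonal condition on the boolean matrix powers can be transcribed almost literally. The following sketch is a direct, unoptimised reading of that condition and not the dynamic-programming formulation found in the cited literature; the names `boolean_product` and `is_acyclic` are hypothetical. The loop runs up to r = n, since a cycle passing through all n nodes (e.g. 0→1→2→0 for n = 3) only shows up on the diagonal of the n-th power.

```python
from typing import List

Matrix = List[List[int]]


def boolean_product(a: Matrix, b: Matrix) -> Matrix:
    """Boolean matrix product: logical-and replaces *, logical-or replaces +."""
    n = len(a)
    return [
        [int(any(a[i][k] and b[k][j] for k in range(n))) for j in range(n)]
        for i in range(n)
    ]


def is_acyclic(adj: Matrix) -> bool:
    """True if no boolean power A^r, r = 1..n, has a non-zero diagonal entry."""
    n = len(adj)
    power = adj  # A^1: paths of length 1
    for r in range(1, n + 1):
        if r > 1:
            power = boolean_product(power, adj)  # A^r: paths of length r
        if any(power[i][i] for i in range(n)):
            return False  # some node reaches itself in exactly r steps
    return True
```

Each boolean product costs on the order of n³ elementary operations, so this literal version needs about n⁴ operations in the worst case; it is meant to illustrate the condition, not to compete with the implementations referenced above.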

This completes the discussion of different graph-classes. A problem that is shared between all graphical models is considered next: the joint representation of multiple networks as one. This is an important practical aspect of network inference, which can result in a whole set of networks rather than a single structure that is superior to all others. Different kinds of equivalence have been discussed (Section 1.4.2), which can cause ambiguities and hence multiple solutions. Multiple solutions can also arise deliberately when sampling methods are used for network learning (Appendix D). Regardless of the reason, situations in which too many results exist to inspect them separately require adequate techniques to understand them. This is why the following section introduces suitable methods to represent results compactly. Such post-processing methods might reveal commonalities between recovered networks, for example, and present shared features such that they can be perceived.

³ In ordinary matrix multiplication, elements are multiplied and added. The boolean product of two binary matrices $A = (a_{i,j})$ and $B = (b_{i,j})$ uses a logical-and $\wedge$ and logical-or $\vee$ instead, i.e. $(AB)_{i,j} = \bigvee_{k=1}^{n} a_{i,k} \wedge b_{k,j}$. The resulting matrix is hence a binary matrix as well.

⁴ Note that paths of length $n$ or greater must contain a cycle: in such a path, at least one node is visited (at least) twice. Any such path thus contains a sub-cycle of length at most $n$, which causes a violation of condition (B.11).

B.3 Compact Representation

In document Causal pattern inference from neural spike train data (Page 179-183)