In this thesis we investigated and developed connections between the field of probabilistic Graphical Models (GMs) and Sum-Product Networks (SPNs).
First, we investigated SPNs that employ GMs as leaf models, obtaining a derivation of EM that learns the SPN parameters jointly with the parameters of the GM leaves.
We introduced Sum-Product Graphical Models (SPGMs), a new architecture bridging Sum-Product Networks (SPNs) and Graphical Models (GMs) by inheriting the expressivity of SPNs and the high-level probabilistic semantics of GMs. The new connections between the two fields were exploited in a structure learning algorithm that extends the Chow-Liu tree approach.
Finally, we presented several applications of the new tools and architectures, with a particular focus on SPGMs. We found that the structure learning algorithm LearnSPGM is competitive with state-of-the-art methods in density estimation, despite using a novel approach and being the first algorithm to directly obtain DAG-structured SPNs. Furthermore, we showed some application settings of SPGMs that indicate the promise of this architecture in real-world scenarios that are typically outside the domain of SPNs.
6.2. Future Work
As a concluding note to this thesis, we present several extensions to the framework of Sum-Product Graphical Models that should be the subject of future work.
6.3. Generalizing SPGMs
The definition of SPGMs can be generalized in several ways to adapt it to different application settings:
• An undirected representation of SPGMs can be obtained straightforwardly by replacing the conditional probabilities in Definition 4.1.4 with non-normalized potentials. Namely, the terms Ps(Xs) and Pst(Xt|Xs) in the SPGM definition can be replaced by factors ϕs(Xs) and ϕst(Xs, Xt), whose elements are constrained to be non-negative but are not constrained to be normalized. Since message passing in directed and undirected tree graphical models takes the same form, all the subsequent propositions on SPGMs (which are based on message passing for trees) still maintain
validity.
In contrast to the directed GM case, however, the distribution must be normalized by dividing by the partition function. By Proposition 4.1.7, the partition function can be computed efficiently by evaluating the SPGM with no observed variables.
• Continuous variables can be used at leaf nodes, but not at internal nodes. This is possible because summation can be replaced by integration in the leaf messages while the leaf messages still range over a discrete number of states (cf. Definition 4.1.5). Thus, leaf messages can be passed to internal nodes exactly as in discrete SPGMs. This makes it possible to apply SPGMs in situations where the leaf variables should be continuous, which arise often, e.g. in the field of image processing.
• The expressive power of SPGMs can be extended by modeling mixtures of Junction Trees rather than modeling mixtures of trees. In this way, each Junction Tree corresponds to inference in a graphical model with cycles, but with tractable treewidth (Section 2.3.3).
This can be done by making three changes: firstly, each Vnode t ∈ V should be associated with a set of variables Xt, like nodes in Junction Trees, as opposed to a single variable as in SPGMs. Secondly, the running intersection property must be enforced between t and each s ∈ vpa(t) (cf. Section 2.3.3). Finally, the Vnode message must be modified to match the message sent between nodes in Junction Trees (Eq. 2.3.7).
With these changes, it is possible to show that SPGMs encode a mixture of Junction Trees in exactly the same way as standard SPGMs encode a mixture of trees.
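The first generalization above (undirected potentials, normalized by the partition function) can be illustrated with a minimal sketch. It uses a plain three-node tree rather than a full SPGM, so it only shows the underlying message-passing idea: one upward pass with no observed variables returns the partition function Z. The tree, the potential values and all function names are hypothetical and chosen purely for illustration.

```python
from itertools import product

# Hypothetical tree: root 0 with children 1 and 2, all variables binary.
N_STATES = 2
ROOT = 0
children = {0: [1, 2], 1: [], 2: []}

# Non-negative, non-normalized potentials (illustrative values only).
phi_node = {0: [1.0, 2.0], 1: [0.5, 1.5], 2: [3.0, 1.0]}
phi_edge = {
    (0, 1): [[2.0, 1.0], [1.0, 3.0]],
    (0, 2): [[1.0, 0.5], [0.5, 2.0]],
}


def subtree_belief(node, evidence):
    """Unnormalized belief over the states of `node`, summing out its subtree."""
    child_beliefs = {c: subtree_belief(c, evidence) for c in children[node]}
    belief = []
    for xs in range(N_STATES):
        # States that contradict the evidence contribute zero.
        if evidence.get(node, xs) != xs:
            belief.append(0.0)
            continue
        value = phi_node[node][xs]
        for c, cb in child_beliefs.items():
            # Message from child c, evaluated at parent state xs.
            value *= sum(phi_edge[(node, c)][xs][xc] * cb[xc]
                         for xc in range(N_STATES))
        belief.append(value)
    return belief


def evaluate(evidence):
    """Unnormalized value of the model for the given (possibly empty) evidence."""
    return sum(subtree_belief(ROOT, evidence))


def brute_force_partition():
    """Reference value of Z obtained by enumerating all joint states."""
    total = 0.0
    for x in product(range(N_STATES), repeat=len(children)):
        weight = 1.0
        for n, table in phi_node.items():
            weight *= table[x[n]]
        for (s, t), table in phi_edge.items():
            weight *= table[x[s]][x[t]]
        total += weight
    return total


# Evaluating with no observed variables yields the partition function.
Z = evaluate({})
# A normalized probability is an evidence evaluation divided by Z.
p_example = evaluate({0: 1, 1: 0, 2: 1}) / Z
```

In this sketch the same routine serves both for computing Z (empty evidence) and for scoring a full assignment, which mirrors the statement that normalization only costs one additional evaluation.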
6.4. Tree-Reweighted Message Passing with SPGMs
An interesting use of tractable probabilistic models is approximate inference in intractable models, and a particularly relevant application of this concept lies in the approximation of intractable graphical models by a convex combination of trees. This line of research was first introduced for the maximization of the energy of intractable GMs in Wainwright et al. [2002] (Tree-Reweighted Message Passing). Then, Wainwright et al. [2003] extended the approach to the approximation of the log partition function, which makes it possible to compute approximate marginals. Later, Kolmogorov [2006] presented a convergent procedure, called sequential, in which a lower bound on the MAP energy is guaranteed not to decrease from one iteration to the next. Finally, Meltzer et al. [2009] unified these approaches by showing that the convergent sequential procedure described in Kolmogorov [2006] also yields an approximation of the log partition function by simply substituting max with summations. The resulting algorithm provides an efficient, sequential and convergent approximation of the partition function.
At a high level, Tree-Reweighted Message Passing aims to approximate a certain probability distribution $P(X)$ governed by a graph $G = (V, E)$ with a mixture model $\sum_{k=1}^{K} \rho_k T_k(X)$, where $\rho_k \ge 0$ are the mixture weights and $\{T_k\}_{k=1}^{K}$ are tree graphical models. It requires taking each edge $(s, t) \in E$ in a given order and re-estimating its associated pairwise probability $P_{st}$ with the following rules:
1. Reparametrization Step: compute the marginal probability $T_k(X_s, X_t)$ for each tree in $\{T_k\}_{k=1}^{K}$ in the mixture model (which requires passing messages in the tree).
2. Averaging Step: substitute $P_{st}$ with the weighted average $P_{st} = \sum_{k=1}^{K} \rho_k T_k(X_s, X_t)$.
By iterating the procedure above and choosing a particular order of edges, Kolmogorov [2006] showed that the algorithm converges to a local minimum of the Kullback-Leibler divergence with respect to the original distribution.
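The two steps above can be sketched for a single edge as follows. This is a toy illustration under stated assumptions, not the procedure of Kolmogorov [2006]: the pairwise marginals are obtained by brute-force enumeration instead of message passing, the mixture weights are assumed to sum to one, and the two trees, their tables and all function names are hypothetical.

```python
from itertools import product

STATES = (0, 1)
N_VARS = 3


def tree_joint(tables):
    """Normalized joint T_k(X) of a tree given pairwise tables on its edges."""
    weights = {}
    for x in product(STATES, repeat=N_VARS):
        w = 1.0
        for (s, t), table in tables.items():
            w *= table[x[s]][x[t]]
        weights[x] = w
    z = sum(weights.values())
    return {x: w / z for x, w in weights.items()}


def pairwise_marginal(joint, s, t):
    """Marginal T_k(X_s, X_t), here by enumeration rather than messages."""
    marg = [[0.0 for _ in STATES] for _ in STATES]
    for x, p in joint.items():
        marg[x[s]][x[t]] += p
    return marg


def averaging_step(trees, rho, s, t):
    """Step 1 (marginals per tree) followed by step 2 (weighted average)."""
    marginals = [pairwise_marginal(joint, s, t) for joint in trees]
    return [[sum(r * m[a][b] for r, m in zip(rho, marginals))
             for b in STATES] for a in STATES]


# Two hypothetical chains over three binary variables: 0-1-2 and 0-2-1.
shared_12 = [[3.0, 1.0], [1.0, 3.0]]
tree_a = tree_joint({(0, 1): [[2.0, 1.0], [1.0, 2.0]], (1, 2): shared_12})
tree_b = tree_joint({(0, 2): [[1.0, 2.0], [2.0, 1.0]], (1, 2): shared_12})
rho = [0.5, 0.5]  # assumed to be non-negative and to sum to one

# Re-estimated pairwise probability for the edge (0, 1).
p_01 = averaging_step([tree_a, tree_b], rho, 0, 1)
```

A full sweep would repeat `averaging_step` for every edge in the chosen order and write the result back into the model; that bookkeeping is omitted here.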
A crucial step to make this procedure tractable is to use a mixture in which several sections of the trees are shared, so that messages are computed only once and parameters can be updated simultaneously for a large number of trees in the mixture. Kolmogorov [2006] obtains this by using a mixture model in which each element is a chain graph (i.e. each node is connected to at most two nodes), and in which the nodes in all chains are consistent with a given ordering: since messages are shared between several chains, inference in the mixture can be greatly simplified. Furthermore, the marginals can be computed with only local operations due to the message structure. The author notes that using longer chains and a large number of mixture components empirically helps to obtain better maxima and faster convergence.
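The benefit of shared sections can be illustrated with a small sketch in which two chains share a common prefix, so the forward message of that prefix is computed once and reused. The caching scheme, the potentials and the names below are assumptions made for illustration and do not reproduce the implementation of Kolmogorov [2006].

```python
from functools import lru_cache

STATES = (0, 1)

# Hypothetical pairwise potentials on edges between ordered nodes.
PHI = {
    (0, 1): ((1.0, 2.0), (3.0, 1.0)),
    (1, 2): ((1.0, 1.0), (2.0, 0.5)),
    (1, 3): ((0.5, 1.0), (1.0, 1.0)),
}


@lru_cache(maxsize=None)
def forward(prefix):
    """Forward sum-product message at the last node of `prefix`.

    `prefix` is a tuple of node indices; results are cached, so a prefix
    shared by several chains is processed only once.
    """
    if len(prefix) == 1:
        return tuple(1.0 for _ in STATES)
    prev = forward(prefix[:-1])
    table = PHI[(prefix[-2], prefix[-1])]
    return tuple(sum(prev[a] * table[a][b] for a in STATES) for b in STATES)


# Two chains consistent with the ordering 0 < 1 < 2 < 3, sharing the prefix (0, 1).
chain_a = (0, 1, 2)
chain_b = (0, 1, 3)

# Unnormalized mass of each chain; the message for (0, 1) is computed once.
z_a = sum(forward(chain_a))
z_b = sum(forward(chain_b))
cache_stats = forward.cache_info()
```

After both evaluations, `cache_stats.hits` is nonzero because the second chain reuses the message of the shared prefix instead of recomputing it.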
These observations make the use of SPGMs for Tree-Reweighted Message Passing an appealing direction for future research, due to their ability to compactly model very large mixtures of trees by naturally sharing subsections of the trees and reusing the related messages. In particular, a preliminary analysis of this application showed that steps 1 and 2 above can be performed with local operations in SPGMs, similarly to what happens in the mixture of chain graphs in Kolmogorov [2006]. Thus, Tree-Reweighted Message Passing can potentially be applied to the very large, hierarchical mixture of trees encoded by SPGMs. This opens an interesting research area in learning SPGMs to approximate an intractable graphical model, which should be the subject of future work.