
2.3 Probabilistic Relational Models

2.3.1 Markov Logic Networks

One of the most well-established frameworks in the area of probabilistic relational models is that of MLNs [188].³ MLNs are probabilistic relational models defined by weighted first-order formulas. More precisely, an MLN is a set of pairs (Fi, wi), where Fi is a first-order formula and wi ∈ ℝ is the formula's weight. In addition, we have a set of typed constants C = {C1, . . . , C|C|}. We can now use the MLN to define an MRF. We add one binary variable to the MRF for every ground predicate in the MLN, e.g., smokes(Anna). The value of the random variable models the truth state of the ground predicate. Because the set of constants is finite, this results in a finite set of random variables. Furthermore, we add one potential φi to the MRF for each possible grounding of a formula, e.g., smokes(Anna) ⇒ cancer(Anna). The potential is defined as exp(wi · gi), where gi is a feature function for the formula Fi. Hence, gi is defined over all ground predicates appearing in the ground formula and evaluates to 1 if the formula is satisfied and to 0 otherwise. Essentially, MLNs serve as a template engine for MRFs that makes it easy to model independencies based on logical rules. The resulting joint probability of the MRF is a log-linear model and looks as follows:

$$p(x) = \frac{1}{Z} \exp\left( \sum_i w_i \cdot n_i(x) \right),$$

³ A reference implementation of MLNs, including various inference and learning algorithms, is available online: alchemy.cs.washington.edu

λ1 : smokes(x, t) ⇒ cancer(x, t)

λ2 : friends(x, y, t) ⇒ (smokes(x, t) ⇔ smokes(y, t))

λ3 : cancer(x, t) ⇔ cancer(x, succ(t))

λ4 : smokes(x, t) ⇔ smokes(x, succ(t))

λ5 : friends(x, y, t) ⇔ friends(x, y, succ(t))

Table 2.2: Formulas used in the Smokers-and-Friends DMLN.

where Z is the normalizing partition function and ni(x) is the number of true groundings of formula Fi in the state x. We will now give a small toy example of an MLN that will be used below for experiments on synthetic datasets.

Example 2.1. One of the most prominent toy examples is the Smokers-and-Friends MLN. Intuitively, it models a social network containing friendships between persons and their smoking habits. It also describes the influence of friendships on smoking habits and the rule that smoking causes cancer. The minimal rules that define this MLN are depicted in Table 2.1. The MRF that is obtained by grounding this MLN in the presence of only two persons is shown in Figure 2.3. Given a partial observation of the predicates, MAP inference asks for the most probable joint assignment to the remaining variables. A marginal query such as p(cancer(Anna) = 1) returns the probability that "Anna" has cancer given a specific network.
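The grounding and the marginal query of Example 2.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Table 2.1 is not reproduced here, so the sketch takes the static rules to be λ1 and λ2 without the time argument; the weights are made-up values; the second person "Bob" is a hypothetical name; and the brute-force enumeration of all worlds stands in for a real inference algorithm.

```python
from itertools import product
from math import exp

PEOPLE = ["Anna", "Bob"]          # "Bob" is a hypothetical second person

# Hypothetical weights; the text does not give numeric values.
W_SMOKING_CANCER = 1.5            # smokes(x) => cancer(x)
W_FRIENDS_SMOKE = 1.1             # friends(x, y) => (smokes(x) <=> smokes(y))

# One binary random variable per ground predicate.
ATOMS = ([("smokes", p) for p in PEOPLE]
         + [("cancer", p) for p in PEOPLE]
         + [("friends", p, q) for p in PEOPLE for q in PEOPLE])


def implies(a, b):
    return (not a) or b


def weighted_counts(world):
    """Return sum_i w_i * n_i(x) for a world given as a dict atom -> bool."""
    n1 = sum(implies(world[("smokes", p)], world[("cancer", p)])
             for p in PEOPLE)
    n2 = sum(implies(world[("friends", p, q)],
                     world[("smokes", p)] == world[("smokes", q)])
             for p in PEOPLE for q in PEOPLE)
    return W_SMOKING_CANCER * n1 + W_FRIENDS_SMOKE * n2


def all_worlds():
    """Enumerate every truth assignment to the ground predicates."""
    for values in product([False, True], repeat=len(ATOMS)):
        yield dict(zip(ATOMS, values))


def marginal(query_atom):
    """Compute p(query_atom = 1) by summing the log-linear model exactly."""
    z = 0.0          # partition function Z
    numerator = 0.0  # mass of all worlds in which the query atom is true
    for world in all_worlds():
        score = exp(weighted_counts(world))
        z += score
        if world[query_atom]:
            numerator += score
    return numerator / z


p_cancer_anna = marginal(("cancer", "Anna"))
```

With two persons there are 8 ground predicates and hence 2^8 = 256 worlds, so exact enumeration is still feasible; the number of worlds doubles with every additional ground predicate, which is why the inference algorithms discussed below are needed in practice.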

The truth state of a predicate may also depend on time. For example, a person is usually not born with cancer but instead develops it while aging. Similarly, a smoker can quit smoking in the future. Both examples require the predicates to be extended by a discrete time step, e.g., smokes(x, t). Although the additional time step t allows us to specify the predicates for specific points in time, we still need another mechanism that allows us to connect predicates of two subsequent time steps. To achieve this, a successor function succ(t) is added, which maps a time point t to t + 1. The resulting framework is called Dynamic MLNs (DMLNs), and the Smokers-and-Friends MLN from above can be extended as follows.

Example 2.2. The introduction of a time step and the successor function allows us to formulate the Smokers-and-Friends MLN over time. The new rules are depicted in Table 2.2. Besides the rules from the static MLN extended by a time step (λ1 and λ2), we have three new rules based on the succ function (λ3 to λ5). These rules define the behavior of the predicates over subsequent time steps.
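To make the role of succ(t) concrete, the following Python sketch grounds the three temporal rules λ3 to λ5 over a short time horizon and counts how many groundings a given world satisfies. The horizon of three time steps, the person "Bob", and the decision to skip groundings whose successor falls outside the horizon are assumptions made for illustration only.

```python
from itertools import product

PEOPLE = ["Anna", "Bob"]   # "Bob" is a hypothetical second person
TIME_STEPS = [0, 1, 2]     # hypothetical finite horizon


def succ(t):
    """Successor function: maps time point t to t + 1."""
    return t + 1


def ground_temporal_rules(people, time_steps):
    """Yield (rule, left_atom, right_atom) for every grounding of λ3-λ5."""
    last = time_steps[-1]
    for t in time_steps:
        if t == last:      # assumption: succ(t) must stay inside the horizon
            continue
        for x in people:
            yield ("λ3", ("cancer", x, t), ("cancer", x, succ(t)))
            yield ("λ4", ("smokes", x, t), ("smokes", x, succ(t)))
        for x, y in product(people, repeat=2):
            yield ("λ5", ("friends", x, y, t), ("friends", x, y, succ(t)))


def count_satisfied(groundings, world):
    """Count groundings whose two sides share a truth value (the ⇔ rules)."""
    return sum(world[left] == world[right] for _, left, right in groundings)


groundings = list(ground_temporal_rules(PEOPLE, TIME_STEPS))

# A world in which nothing holds at any time satisfies every equivalence.
world = {atom: False for _, a, b in groundings for atom in (a, b)}
all_false_count = count_satisfied(groundings, world)

# If Anna smokes only at t = 1, both λ4 groundings touching that atom break.
world[("smokes", "Anna", 1)] = True
one_change_count = count_satisfied(groundings, world)
```

Each transition between two time steps contributes 2 + 2 + 4 = 8 ground formulas here, which illustrates how quickly the ground network grows with the number of time steps.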

In Chapter 3, several experiments will be based on MLNs. Additionally, we will introduce a graph construction algorithm for Label Propagation in Section 5.1 that is based on weighted logics similar to MLNs. We will now briefly discuss inference and learning in MLNs.

Inference and Learning in MLNs

The most straightforward way to run inference in MLNs is to ground the network and then run a standard marginal or MAP inference algorithm from the MRF literature, such as BP or Gibbs sampling. Interestingly, MAP inference in MLNs can equally be achieved by solving a weighted MAX-SAT problem. However, several other algorithms specifically designed for MLNs have been introduced in recent years as well. One of the best known is MC-SAT [181], which combines ideas from satisfiability solving with MCMC techniques. Technically speaking, MC-SAT makes use of slice sampling and samples a new state with the help of SampleSAT [243]. An interesting MAP inference approach is the cutting-plane-based algorithm introduced by Riedel [190]. Inspired by cutting plane methods from Operations Research, the algorithm avoids instantiating the full ground MLN and instead only adds ground formulas as long as the current solution can be further improved. By doing so, the corresponding MRF often remains much smaller and inference is more efficient.

This cutting plane algorithm is one approach that avoids instantiating the entire MRF. Ground MLNs quickly become very large, and hence staying on the first-order level is desirable. Lifted inference, which will be explained in the next section, tries to avoid shattering the model as much as possible in order to remain on the lifted level and is therefore a popular technique to speed up inference algorithms for MLNs. Viewed from a different angle, lifted inference tries to exploit symmetries by not grounding identical propositional parts separately and by grouping together indistinguishable variables and formulas. Evidence, however, can lead to models where lifted inference reduces completely to ground inference because evidence destroys symmetries in many situations. Lifted inference and handling evidence in PGMs will be of major interest throughout different chapters of this thesis.

A different approach to speeding up inference in MLNs is the Frog algorithm presented by Shavlik and Natarajan [200]. Potentially, Frog could be combined with lifted inference, and it is particularly useful when subsequent queries for the same MLN are necessary. The algorithm specifically looks at the evidence and, by doing so, reduces the size of the grounded MLN. It exploits the fact that the count of formulas satisfied by the evidence remains unchanged for subsequent queries. Compared to lifting, a key advantage of this approach is its independence from the inference algorithm; it can be seen as a general preprocessing step.
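The equivalence between MAP inference and weighted MAX-SAT mentioned above can be illustrated with a small Python sketch: because exp is monotone, the most probable world is the one that maximizes the total weight of satisfied ground formulas. The sketch is neither MC-SAT nor the cutting plane algorithm; it uses plain exhaustive search in place of a real MAX-SAT solver, and the weights, the evidence, and the person "Bob" are hypothetical.

```python
from itertools import product

# Hypothetical weights; the text does not give numeric values.
W1, W2 = 1.5, 1.1


def implies(a, b):
    return (not a) or b


# A hand-picked subset of ground formulas as (weight, test on a world) pairs.
GROUND_FORMULAS = [
    (W1, lambda w: implies(w["smokes(Anna)"], w["cancer(Anna)"])),
    (W1, lambda w: implies(w["smokes(Bob)"], w["cancer(Bob)"])),
    (W2, lambda w: implies(w["friends(Anna,Bob)"],
                           w["smokes(Anna)"] == w["smokes(Bob)"])),
]

# Hypothetical partial observation (evidence) and the remaining variables.
EVIDENCE = {"friends(Anna,Bob)": True, "smokes(Anna)": True}
QUERY_ATOMS = ["smokes(Bob)", "cancer(Anna)", "cancer(Bob)"]


def satisfied_weight(world):
    """Weighted MAX-SAT objective: total weight of satisfied ground formulas."""
    return sum(weight for weight, formula in GROUND_FORMULAS if formula(world))


def map_state(evidence, query_atoms):
    """Exhaustively search for the assignment maximizing the objective."""
    best_world, best_weight = None, float("-inf")
    for values in product([False, True], repeat=len(query_atoms)):
        world = {**evidence, **dict(zip(query_atoms, values))}
        weight = satisfied_weight(world)
        if weight > best_weight:
            best_world, best_weight = world, weight
    return best_world, best_weight


best_world, best_weight = map_state(EVIDENCE, QUERY_ATOMS)
```

In this toy setting the search runs over only 2^3 = 8 candidate assignments. The evidence fixes two variables up front, which hints at why evidence-aware preprocessing can shrink the ground model before inference starts.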

Running inference assumes that the structure and parameters are given. However, in many situations it is not trivial to choose appropriate weights. In the extreme case, it is even impossible to specify the structure, i.e., the logical rules, of an MLN by hand. Instead, we are only given a relational database representing a single mega training example. Based on this training database, we can use classical ILP techniques to infer rules for our MLN. For example, in early work on MLNs, Richardson and Domingos [188] used Claudien [45] to learn a set of first-order clauses. However, Kok and Domingos [123] showed that directly optimizing a pseudo log-likelihood score often gives better results than a purely logic-based approach. Their approach starts with all unit clauses and then adds literals to the clauses as long as the score improves. This is a very common scheme in MLN structure learning: a search algorithm proposes new candidate formulas by altering the current ones, and a score function determines which of the new formulas to add to the MLN. Recently, there has also been a surge in structure learning approaches based on decision and regression trees. Decision trees [118, 140] have the advantage that learning is very fast and that each path to a leaf can be seen as a feature combining different variables and values. In a similar line of work, Khot et al. [119] use gradient tree boosting to learn the parameters and the structure of MLNs. Gradient tree boosting elegantly combines parameter and structure learning with the help of regression trees. We will follow the idea of gradient tree boosting in Section 7.3, where we will show that pseudo log-likelihood optimization and tree-based learning also work well for learning probabilistic count models based on Dependency Networks.

We already touched upon the idea of lifted inference and indicated how lifting can reduce run time in PGMs. We will now review several different lifted inference approaches and present Belief Propagation in detail.
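As a brief illustration of the pseudo log-likelihood score used in the learning approaches above, the following Python sketch evaluates it for a single training database. It covers only the score, not the clause search of Kok and Domingos [123]. The two static Smokers-and-Friends rules, the weights, the training world, and the person "Bob" are assumptions chosen for illustration. The score sums, over all ground predicates, the log-probability of the observed value given all other ground predicates, so the partition function Z never has to be computed.

```python
from math import exp, log

PEOPLE = ["Anna", "Bob"]   # "Bob" is a hypothetical second person


def implies(a, b):
    return (not a) or b


def counts(world):
    """Return (n1, n2): numbers of true groundings of the two static rules."""
    n1 = sum(implies(world[("smokes", p)], world[("cancer", p)])
             for p in PEOPLE)
    n2 = sum(implies(world[("friends", p, q)],
                     world[("smokes", p)] == world[("smokes", q)])
             for p in PEOPLE for q in PEOPLE)
    return n1, n2


def score(world, weights):
    """Unnormalized log-probability sum_i w_i * n_i(x)."""
    return sum(w * n for w, n in zip(weights, counts(world)))


def pseudo_log_likelihood(world, weights):
    """Sum over ground atoms of log p(x_l | all other ground atoms)."""
    total = 0.0
    s_obs = score(world, weights)
    for atom in world:
        flipped = dict(world)
        flipped[atom] = not world[atom]
        s_flip = score(flipped, weights)
        # log p(x_l | rest) = s_obs - log(exp(s_obs) + exp(s_flip))
        total += s_obs - log(exp(s_obs) + exp(s_flip))
    return total


# Hypothetical training database: two friends who both smoke and have cancer.
training_world = {
    ("smokes", "Anna"): True, ("smokes", "Bob"): True,
    ("cancer", "Anna"): True, ("cancer", "Bob"): True,
    ("friends", "Anna", "Bob"): True, ("friends", "Bob", "Anna"): True,
    ("friends", "Anna", "Anna"): False, ("friends", "Bob", "Bob"): False,
}

pll_zero = pseudo_log_likelihood(training_world, (0.0, 0.0))
pll_fitted = pseudo_log_likelihood(training_world, (1.5, 1.1))
```

With all weights at zero, every conditional is 0.5 and the score equals 8 · log 0.5. Positive weights on rules that agree with the data raise the score, which is the signal a weight or structure learner would follow.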