Simple Monadic LFP and Conditional Independence

In this section, we exploit the limitations described in the previous section to build conceptual bridges from least fixed point logic to the Markov-Gibbs picture developed earlier. At first, this may seem to be an unlikely union. But we will establish that there are fundamental conceptual relationships between the directed Markovian picture and least fixed point computations. The key is to see the constructions underlying least fixed point computations through the lens of influence propagation and conditional independence. We will demonstrate this relationship for the case of simple monadic least fixed points, namely, FO(LFP) formulas without any nesting or simultaneous induction, in which the LFP relation being constructed is monadic. In later sections, we show how to deal with complex fixed points as well.

We wish to build a view of fixed point computation as an information propagation algorithm. In order to do so, let us examine the geometry of information flow during an LFP computation. At stage zero of the fixed point computation, none of the elements of the structure are in the relation being computed. At the first stage, some subset of elements enters the relation. This changes the local neighborhoods of these elements, and the vertices that lie in these local neighborhoods change their local type. Due to the global changes in the multiset of local types, more elements in the structure become eligible for inclusion into the relation at the next stage. This process continues, and the changes “propagate” through the structure. Thus, the fundamental vehicle of this information propagation is that a fixed point computation ϕ(R, x) changes local neighborhoods of elements at each stage of the computation. This propagation is

1. directed, and

2. relies on a bounded number of local neighborhoods at each stage.

In other words, we observe that

The influence of an element during LFP computation propagates in a similar manner to the influence of a random variable in a directed Markov field.

This correspondence is important to us. Let us try to uncover the underlying principles that cause it. The directed property comes from the positivity of the first order formula that is being iterated. This ensures that once an element is inserted into the relation that is being computed, it is never removed. Thus, influence flows in the direction of the stages of the LFP computation. Furthermore, this influence flow is local in the following sense: the influence of an element can propagate throughout the structure, but only through its influence on various local neighborhoods.
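The stage-wise picture above can be made concrete with a small sketch. The following Python fragment is only an illustration under stated assumptions, not a construction from the text: the function `lfp_stages`, the path graph `edges`, the set `seed`, and the particular formula `phi` are all hypothetical. The formula is assumed to be monotone in R (as a positive formula is), which is why taking the union with the previous stage does not alter the computed fixed point.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, List

Relation = FrozenSet[Hashable]


def lfp_stages(
    universe: Iterable[Hashable],
    phi: Callable[[Relation, Hashable], bool],
) -> List[Relation]:
    """Return the stages R_0 = {} <= R_1 <= ... obtained by iterating phi.

    phi(R, x) is assumed to be monotone in R, so each stage contains the
    previous one and the iteration stops at the least fixed point.
    """
    elements = list(universe)
    stages: List[Relation] = [frozenset()]  # stage zero: nothing is in the relation
    while True:
        current = stages[-1]
        nxt = current | frozenset(x for x in elements if phi(current, x))
        if nxt == current:  # no new element became eligible: fixed point reached
            return stages
        stages.append(nxt)


# Hypothetical example: a path 0 - 1 - 2 - 3 - 4 with a single seed element.
edges = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
seed = {0}


def phi(R: Relation, x: Hashable) -> bool:
    # x is a seed, or some neighbor of x is already in R (R occurs only positively).
    return x in seed or any(y in R for y in edges[x])


stages = lfp_stages(edges, phi)
# Stage at which each element first enters the relation.
entry_stage = {x: next(k for k, R in enumerate(stages) if x in R) for x in edges}
```

In this toy instance the stages form an increasing chain, and each element enters exactly one stage after its neighbor on the path, which is the directed, neighborhood-by-neighborhood flow of influence described above.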

This correspondence is most striking in the case of bounded degree structures. In that case, we have only O(1) local types.

Lemma 4.10. On a graph of bounded degree, there is a fixed number of non-isomorphic neighborhoods with radius r. Consequently, there are only a fixed number of local r-types.

In order to determine whether an element in a structure satisfies a first order formula, we need (a) the multiset of local r-types in the structure (also known as its global type) for some value of r, and (b) the local type of the element. Furthermore, by the threshold version of Hanf's theorem, we only need to know the multiset of local types up to a certain threshold.
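As a rough illustration of what "the multiset of local types up to a certain threshold" means, the sketch below counts types but caps every count at a threshold. It assumes the local types have already been computed and are given as abstract labels; the function name `thresholded_type_counts`, the labels, and the threshold values are hypothetical, and the actual threshold supplied by the theorem (which depends on the formula and the degree bound) is not computed here.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def thresholded_type_counts(
    local_types: Iterable[Hashable], threshold: int
) -> Dict[Hashable, int]:
    """Count occurrences of each local type, capping every count at `threshold`."""
    counts = Counter(local_types)
    return {t: min(c, threshold) for t, c in counts.items()}


# Hypothetical type labels for the elements of two structures of different size.
small = ["A"] * 3 + ["B"] * 5 + ["C"]
large = ["A"] * 40 + ["B"] * 7 + ["C"]

# With threshold 3 the two structures have the same thresholded global type.
same_at_3 = thresholded_type_counts(small, 3) == thresholded_type_counts(large, 3)
# With threshold 5 the count of type "A" (3 versus 40) tells them apart.
same_at_5 = thresholded_type_counts(small, 5) == thresholded_type_counts(large, 5)
```

Two structures that agree on these capped counts cannot be distinguished by any count that exceeds the cap, which is the sense in which only bounded information about the global type is needed.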

For large enough structures, we will cross the Hanf threshold for the multiset of r-types. At this point, we will be making a decision of whether an element enters the relation based solely on its local r-type. This type potentially changes with each stage of the LFP. At the time when this change renders the element eligible for entering the relation, it will do so. Once it enters the relation, it changes the local r-type of all those elements which lie within an r-neighborhood of it, and such changes render them eligible, and so on. This is how the computation proceeds, in a purely stage-wise local manner. This is a Markov property: the influence of an element upon another must factor entirely through the local neighborhood of the latter.
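The locality of a single stage can be sketched as follows. Assuming an undirected graph given as an adjacency dictionary, the hypothetical helper `ball` collects the vertices within distance r of a center by breadth-first search, and `possibly_affected` (also a hypothetical name) collects every element whose r-neighborhood contains a newly entered element. Under the assumption that eligibility is decided by local r-type alone, these are the only elements whose eligibility can change at the next stage.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Set

Graph = Dict[Hashable, Set[Hashable]]


def ball(graph: Graph, center: Hashable, r: int) -> Set[Hashable]:
    """Return all vertices at distance at most r from `center` (breadth-first search)."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        v = queue.popleft()
        if dist[v] == r:
            continue  # do not expand beyond radius r
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return set(dist)


def possibly_affected(
    graph: Graph, newly_entered: Iterable[Hashable], r: int
) -> Set[Hashable]:
    """Vertices whose r-neighborhood contains at least one newly entered element."""
    affected: Set[Hashable] = set()
    for v in newly_entered:
        affected |= ball(graph, v, r)
    return affected


# Hypothetical example: a path 0 - 1 - ... - 6; element 3 has just entered the relation.
path = {i: {j for j in (i - 1, i + 1) if 0 <= j <= 6} for i in range(7)}
affected = possibly_affected(path, [3], r=2)
```

Here only the elements within distance 2 of element 3 can see their local 2-type change; elements 0 and 6 are untouched at this stage and can be influenced only indirectly, through later stages.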

In the more general case where degrees are not bounded, we still have factoring through local neighborhoods, except that we have to consider all the local neighborhoods in the structure. However, here the bounded nature of FO comes in. The FO formula that is being iterated can only express a property about some bounded number of such local neighborhoods. For example, in the Gaifman form, there are s distinguished disjoint neighborhoods that must satisfy some local condition.

Remark 4.11. The same concept can be expressed in the language of sufficient statistics. Namely, knowing some information about certain local neighborhoods renders the rest of the information about variable values that have entered the relation in previous stages of the graph superfluous. In particular, Gaifman’s theorem says that for first order properties, there exists a sufficient statistic that is gathered locally at a bounded number of elements. Knowing this statistic gives us conditional independence from the values of other elements that have already entered the relation previously, but not from elements that will enter the relation subsequently. This is similar to the directed Markov picture where there is conditional independence of any variable from non-descendants given the value of the parents.

At this point, we have exhibited a correspondence between two apparently very different formalisms. This correspondence is illustrated in Fig. 4.1.

[Figure 4.1. Labels recoverable from the figure: interacting variables X_1, X_2, ..., X_{n-1}, ..., highly constrained by one another; a bounded number of local statistics Φ_1, Φ_2, ..., Φ_{s-1}, ... at each stage; LFP assumes conditional independence after statistics are obtained; conditional independence and factorization over a larger directed model called the ENSP (developed in Chapter 7).]
