Related Work - Combining Representation Learning with Logic for Language Processing

Embeddings for Knowledge Base Completion. Many methods for embedding predicates and constants (or pairs of constants) based on training facts for knowledge base completion have been proposed in the past (see Section 2.3.2). Our work goes further in that we learn embeddings that follow not only factual but also first-order logic knowledge. Note that the method of regularizing symbol embeddings by rules described in this chapter is generally compatible with any existing neural link prediction model that provides per-atom scores between 0.0 and 1.0. In our experiments we only worked with matrix factorization as the neural link prediction model, but based on our work Guo et al. [2016] were able to incorporate transitivity rules into TransE [Bordes et al., 2013], which models entities separately instead of learning a representation for every entity pair.
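To make the compatibility claim concrete, here is a minimal sketch, not the chapter's implementation. It assumes a matrix factorization model in which an atom's score is the sigmoid of the dot product between a relation embedding and an entity-pair embedding, and it assumes product-t-norm semantics for an implication, [p => q] = p*q - p + 1. The function names (`atom_score`, `implication_score`, `rule_loss`) are hypothetical. Because the rule term only consumes per-atom scores in (0, 1), any other link prediction model producing such scores could stand in for `atom_score`.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def atom_score(rel_emb, pair_emb):
    """Per-atom score in (0, 1): sigmoid of the dot product between a relation
    embedding and an entity-pair embedding (assumed matrix factorization model)."""
    return sigmoid(np.dot(rel_emb, pair_emb))


def implication_score(p, q):
    """Assumed product-t-norm semantics for p => q, i.e. 1 - p * (1 - q)."""
    return p * q - p + 1.0


def rule_loss(body_rel, head_rel, pair_emb):
    """Negative log-likelihood that one grounding of body(x) => head(x) holds."""
    p = atom_score(body_rel, pair_emb)
    q = atom_score(head_rel, pair_emb)
    return -np.log(implication_score(p, q))


# Toy embeddings (hypothetical): the body relation scores high for this pair.
pair = np.array([1.0, 0.0])
r_body = np.array([5.0, 0.0])
r_head_consistent = np.array([5.0, 0.0])   # head also scores high: rule satisfied
r_head_violating = np.array([-5.0, 0.0])   # head scores low: rule violated

loss_ok = rule_loss(r_body, r_head_consistent, pair)
loss_bad = rule_loss(r_body, r_head_violating, pair)
```

A violated grounding (body scored high, head scored low) yields a much larger loss than a satisfied one, so adding such terms to the factual training loss pushes the embeddings toward satisfying the rule.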

Logical Inference. A common alternative, where adding first-order logic knowledge is trivial, is to perform symbolic logical inference [Bos and Markert, 2005, Baader et al., 2007, Bos, 2008]. However, such purely symbolic approaches cannot deal with the uncertainty inherent in natural language, and they generalize poorly.

Probabilistic Inference. To ameliorate some of the drawbacks of symbolic logical inference, probabilistic logic-based approaches have been proposed [Schoenmackers et al., 2008, Garrette et al., 2011, Beltagy et al., 2013, 2014]. Since logical connections between relations are modeled explicitly, such approaches are generally hard to scale to large KBs. Specifically, approaches based on Markov Logic Networks (MLNs) [Richardson and Domingos, 2006] encode logical knowledge in dense, loopy graphical models, making structure learning, parameter estimation, and inference hard at the scale of our data. In contrast, in our model the logical knowledge is captured directly in symbol representations, which leads to efficient inference at test time, as we only have to compute the forward pass of a neural link prediction model. Furthermore, as symbols are embedded in a low-dimensional vector space, we have a natural way of dealing with linguistic ambiguities and label errors that appear once OpenIE textual patterns are included as predicates for automated KB completion [Riedel et al., 2013].
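The efficiency argument can be illustrated with a small sketch. Assuming a matrix factorization model (sigmoid of the dot product between relation and entity-pair embeddings), answering "which entity pairs satisfy relation r?" is a single matrix-vector product followed by a sigmoid; no rule is consulted at test time because the rules have already shaped the embeddings during training. The function name `score_all_pairs`, the embedding sizes, and the random values are placeholders, not trained parameters.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def score_all_pairs(rel_emb, pair_embs):
    """Score one relation against every entity pair in a single forward pass.

    rel_emb:   (k,) embedding of the relation
    pair_embs: (n, k) embeddings of n entity pairs
    returns:   (n,) probabilities that the relation holds for each pair
    """
    return sigmoid(pair_embs @ rel_emb)


rng = np.random.default_rng(1)
pairs = rng.normal(size=(1000, 8))   # hypothetical learned pair embeddings
relation = rng.normal(size=8)        # hypothetical learned relation embedding

scores = score_all_pairs(relation, pairs)
top5 = np.argsort(-scores)[:5]       # indices of the highest-scoring candidate facts
```

By contrast, inference in an MLN would require reasoning over the ground graphical model for each query.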

Stochastic grounding is related to locally grounding a query in Programming with Personalized PageRank (ProPPR) [Wang et al., 2013]. One difference is that we use stochastically grounded rules as differentiable terms in a representation learning training objective, whereas in ProPPR such grounded rules are used for stochastic inference without learning symbol representations.
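A minimal sketch of stochastic grounding follows, under stated assumptions: atoms are scored by a matrix factorization model (sigmoid of relation and entity-pair embedding dot products), an implication p => q is scored as p*q - p + 1, and entity pairs are sampled uniformly. Rather than grounding the rule for all x: body(x) => head(x) over every entity pair, a small random subset of pairs is drawn at each training step and the averaged per-grounding loss is added to the objective. Every operation is differentiable in the embeddings, so in a framework with automatic differentiation the term can be optimized jointly with the factual loss. The name `stochastic_rule_loss` is hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def stochastic_rule_loss(body_rel, head_rel, pair_embs, num_samples, rng):
    """Monte Carlo estimate of the loss for the rule  for all x: body(x) => head(x).

    Samples `num_samples` entity pairs uniformly (an assumption) and averages
    the per-grounding loss -log(p*q - p + 1), where p and q are the model's
    scores for body(x) and head(x).
    """
    idx = rng.integers(0, len(pair_embs), size=num_samples)
    sampled = pair_embs[idx]                 # (num_samples, k)
    p = sigmoid(sampled @ body_rel)          # scores of the body atoms
    q = sigmoid(sampled @ head_rel)          # scores of the head atoms
    return float(np.mean(-np.log(p * q - p + 1.0)))


rng = np.random.default_rng(2)
pairs = rng.normal(size=(10_000, 16))    # hypothetical entity-pair embeddings
body, head = rng.normal(size=(2, 16))    # hypothetical relation embeddings
loss = stochastic_rule_loss(body, head, pairs, num_samples=64, rng=rng)
```

The sampled groundings here only produce a training signal for the embeddings; unlike in ProPPR, they are not used to carry out inference.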

Weakly Supervised Learning. Our work is also inspired by weakly supervised approaches [Ganchev et al., 2010] that use structural constraints as a source of indirect supervision. These methods have been used for several NLP tasks [Chang et al., 2007, Mann and McCallum, 2008, Druck et al., 2009, Singh et al., 2010]. The semi-supervised information extraction work by Carlson et al. [2010] is similar in spirit to our goal, as they use commonsense constraints to jointly train multiple information extractors. A main difference is that we learn symbol representations and allow arbitrarily complex logical rules to be used as regularizers for these representations.

Combining Symbolic and Distributed Representations. There have been a number of recent approaches that combine trainable subsymbolic representations with symbolic knowledge. Grefenstette [2013] describes an isomorphism between first-order logic and tensor calculus, using full-rank matrices to exactly memorize facts. Based on this isomorphism, Rocktäschel et al. [2014] combine logic with matrix factorization for learning low-dimensional symbol embeddings that approximately satisfy given rules and generalize to unobserved facts on toy data. Our work extends this workshop paper by proposing a simpler formalism without tensor-based logical connectives, presenting results on a large real-world task, and demonstrating the utility of this approach for learning relations with no or few textual alignments.
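Exact memorization with full-rank representations can be seen in a toy setting. This sketch is only in the spirit of Grefenstette's construction and omits its tensor-based connectives: entities are one-hot vectors (a full-rank basis), a binary relation is a 0/1 matrix R, and the truth value of rel(s, o) is the bilinear form e_s^T R e_o. The entities, the relation, and its facts are made up for illustration.

```python
import numpy as np

entities = ["alice", "bob", "carol"]          # hypothetical toy domain
one_hot = np.eye(len(entities))               # full-rank: one basis vector per entity

# A binary relation as a matrix: R[i, j] = 1 iff rel(entities[i], entities[j]) is a fact.
facts = {("alice", "bob"), ("bob", "carol")}  # hypothetical facts for a relation "parentOf"
R = np.zeros((len(entities), len(entities)))
for subj, obj in facts:
    R[entities.index(subj), entities.index(obj)] = 1.0


def truth(subj, obj):
    """Truth value of rel(subj, obj) as the bilinear form e_s^T R e_o."""
    e_s = one_hot[entities.index(subj)]
    e_o = one_hot[entities.index(obj)]
    return float(e_s @ R @ e_o)
```

Every observed fact scores 1 and every unobserved atom scores 0, so the encoding memorizes perfectly but cannot generalize. Replacing one-hot vectors with learned low-dimensional embeddings trades this exactness for the ability to generalize to unobserved facts.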

Chang et al. [2014] use Freebase entity types as hard constraints in a tensor factorization objective for universal schema relation extraction. In contrast, our approach imposes soft constraints that are formulated as universally quantified first-order rules.

de Lacalle and Lapata [2013] combine first-order logic knowledge with a topic model to improve surface pattern clustering for relation extraction. Since these rules only specify which relations can be clustered and which cannot, they do not capture the variety of dependencies embeddings can model, such as asymmetry. Lewis and Steedman [2013] use distributed representations to cluster predicates before logical inference. Again, this approach is not as expressive as learning subsymbolic representations for predicates, as clustering does not deal with asymmetric logical relationships between predicates.

Several studies have investigated the use of symbolic representations (such as dependency trees) to guide the composition of symbol representations [Clark and Pulman, 2007, Mitchell and Lapata, 2008, Coecke et al., 2010, Hermann and Blunsom, 2013]. Instead of guiding composition, we use first-order logic rules as prior domain knowledge in the form of regularizers to directly learn better symbol representations.

Combining symbolic information with neural networks has a long tradition. Towell and Shavlik [1994] introduce Knowledge-Based Artificial Neural Networks whose topology is isomorphic to a KB of facts and inference rules. There, facts are input units, intermediate conclusions are hidden units, and final conclusions (inferred facts) are output units. Unlike in our work, there are no learned symbol representations. Hölldobler et al. [1999] and Hitzler et al. [2004] prove that for every logic program there exists a recurrent neural network that approximates the semantics of that program. This is a theoretical insight that unfortunately does not provide a way of constructing such a neural network. Recently, Bowman [2013] demonstrated that Neural Tensor Networks (NTNs) [Socher et al., 2013] can accurately learn natural logic reasoning.
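To contrast Knowledge-Based Artificial Neural Networks (commonly abbreviated KBANN) with learned symbol representations, here is a sketch of how a single conjunctive rule can be turned into a sigmoid unit, following the translation as it is commonly described for KBANN. Positive antecedents get weight omega, negated antecedents get -omega, and the bias is -(P - 1/2) * omega, where P is the number of positive antecedents. The value omega = 4 and the helper names (`conjunctive_unit`, `activate`) are illustrative assumptions. Each input is the truth value of one fact, so the network contains no embedding of symbols at all.

```python
import numpy as np

OMEGA = 4.0  # weight magnitude for rule-derived links (illustrative assumption)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def conjunctive_unit(num_pos, num_neg, omega=OMEGA):
    """Weights and bias for a unit encoding  head :- a1, ..., aP, not b1, ..., not bN.

    Inputs are ordered as the P positive antecedents followed by the N negated
    ones. The bias -(P - 0.5) * omega places the threshold between
    "all conditions met" and "one condition missing".
    """
    weights = np.concatenate([np.full(num_pos, omega), np.full(num_neg, -omega)])
    bias = -(num_pos - 0.5) * omega
    return weights, bias


def activate(weights, bias, inputs):
    """Sigmoid activation of the unit for 0/1 truth values of its input facts."""
    return float(sigmoid(weights @ np.asarray(inputs, dtype=float) + bias))


# Hypothetical rule: grandparentOf(x, z) :- parentOf(x, y), parentOf(y, z)
w, b = conjunctive_unit(num_pos=2, num_neg=0)
fires = activate(w, b, [1, 1])      # both antecedent facts true
silent = activate(w, b, [1, 0])     # one antecedent fact missing
```

The unit's activation exceeds 0.5 only when every positive antecedent is true and every negated antecedent is false; the weights encode the rule, while the symbols themselves remain opaque input units.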

The method presented in this chapter is also related to the recently introduced Neural Equivalence Networks (EqNets) [Allamanis et al., 2017]. EqNets recursively construct neural representations of symbolic expressions to learn about equivalence classes. In our approach, we recursively construct neural networks for evaluating Boolean expressions and use them as regularizers to learn better symbol representations for automated KB completion.
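The recursive construction can be sketched as an evaluator that walks an expression tree and builds a score from atom scores. This is a minimal illustration, not the chapter's code: the class names are hypothetical, only negation, conjunction, and implication are covered, and the connectives use assumed product-t-norm semantics (1 - a, a*b, and a*b - a + 1). In a differentiable framework the atom scores would come from the link prediction model, and the resulting value would enter the training loss as a regularizer.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Atom:
    name: str


@dataclass
class Not:
    arg: "Expr"


@dataclass
class And:
    left: "Expr"
    right: "Expr"


@dataclass
class Implies:
    left: "Expr"
    right: "Expr"


Expr = Union[Atom, Not, And, Implies]


def evaluate(expr: Expr, scores: Dict[str, float]) -> float:
    """Recursively map a Boolean expression to a score in [0, 1].

    Atoms are looked up in `scores` (e.g. outputs of a link prediction model);
    connectives use assumed product-t-norm semantics, so the result is a
    differentiable function of the atom scores.
    """
    if isinstance(expr, Atom):
        return scores[expr.name]
    if isinstance(expr, Not):
        return 1.0 - evaluate(expr.arg, scores)
    if isinstance(expr, And):
        return evaluate(expr.left, scores) * evaluate(expr.right, scores)
    if isinstance(expr, Implies):
        a = evaluate(expr.left, scores)
        b = evaluate(expr.right, scores)
        return a * b - a + 1.0
    raise TypeError(f"unknown expression: {expr!r}")


# One grounding of a hypothetical rule: parentOf(x,y) & parentOf(y,z) => grandparentOf(x,z)
rule = Implies(And(Atom("parentOf(x,y)"), Atom("parentOf(y,z)")), Atom("grandparentOf(x,z)"))
value = evaluate(rule, {"parentOf(x,y)": 0.9, "parentOf(y,z)": 0.8, "grandparentOf(x,z)": 0.3})
```

Where EqNets learn a representation for an expression, this evaluator uses the expression's structure only to decide how atom scores are combined.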
