Embedding Propositional Logic - Combining Representation Learning with Logic for Language Proce

We know from propositional logic that with the negation and conjunction operators we can model any other Boolean operator and propositional rule. In Eq. 3.3, we effectively turned a symbolic logical operation (negation) into a differentiable operation that can be used to learn subsymbolic representations for automated KB completion. If we can find such a differentiable operation for conjunction, then we could backpropagate through any propositional logical expression, and learn vector representations of symbols that encode given background knowledge in propositional logic.

Conjunction. In Product Fuzzy Logic, conjunction is modeled using a Product t-Norm [Lukasiewicz, 1920]. Let $F = A \wedge B$ be the conjunction of two propositional expressions $A$ and $B$. The probability of $F$ is then defined as follows:

$$\llbracket F \rrbracket = \llbracket A \wedge B \rrbracket = \llbracket A \rrbracket \, \llbracket B \rrbracket. \tag{3.4}$$

In other words, we replaced conjunction, a symbolic logical operation, with multiplication, a differentiable operation. Note that alternatives for modeling conjunction exist. For instance, one could take the min of $\llbracket A \rrbracket$ and $\llbracket B \rrbracket$ (Gödel t-Norm [Gödel, 1932]).
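As a minimal sketch of the two options for conjunction mentioned above, the following Python snippet compares the Product t-Norm of Eq. 3.4 with the Gödel t-Norm. The function names and the example probabilities are assumptions made for illustration; they do not come from the original text.

```python
def and_product(p_a: float, p_b: float) -> float:
    """Product t-Norm (Eq. 3.4): conjunction as multiplication."""
    return p_a * p_b


def and_goedel(p_a: float, p_b: float) -> float:
    """Goedel t-Norm: conjunction as the minimum of both arguments."""
    return min(p_a, p_b)


# Hypothetical probabilities of two propositional expressions A and B.
p_a, p_b = 0.9, 0.4
print("product:", and_product(p_a, p_b))
print("goedel: ", and_goedel(p_a, p_b))

# On Boolean inputs (0 or 1) both t-norms agree with classical conjunction.
for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        print(a, b, and_product(a, b), and_goedel(a, b))
```

Both functions reduce to classical conjunction on truth values 0 and 1; they differ only for intermediate probabilities.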

Given the probability of ground atoms, we can use Product Fuzzy Logic to calculate the probability of the conjunction of these atoms. However, we will go a step further and assume that we know the ground truth probability of the conjunction of two atoms. We can then use the negative log-likelihood loss to measure the discrepancy between the predicted probability of the conjunction and the ground truth. Our contribution is backpropagating this discrepancy through the propositional rule and a neural link prediction model that scores ground atoms to calculate a gradient with respect to vector representations of symbols. Subsequently, we update these representations using gradient descent, thereby encoding the ground truth of a propositional rule directly in the vector representations of symbols. At test time, predicting a score for any unobserved ground atom $r_s(e_i, e_j)$ is done efficiently by calculating $\llbracket r_s(e_i, e_j) \rrbracket$.
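The training procedure described above can be sketched in plain Python. This is only an illustration under several assumptions that are not in the original text: atoms are scored by the sigmoid of a dot product between a relation vector and an entity-pair vector, the ground truth probability of the conjunction is taken to be 1 (so the negative log-likelihood reduces to $-\log$ of the predicted probability), and the vectors, learning rate, and number of steps are arbitrary. Gradients are derived by hand here rather than by an automatic differentiation library.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def atom_prob(relation_vec, pair_vec):
    # Hypothetical link predictor: sigmoid of a dot product.
    return sigmoid(dot(relation_vec, pair_vec))


def conjunction_loss(r_a, r_b, pair):
    # -log([[A]] * [[B]]): NLL for an assumed target probability of 1.
    return -math.log(atom_prob(r_a, pair) * atom_prob(r_b, pair))


def conjunction_grads(r_a, r_b, pair):
    # The loss splits into -log sigmoid(s_a) - log sigmoid(s_b), so
    # d loss / d s_a = -(1 - sigmoid(s_a)), and likewise for s_b.
    g_a = -(1.0 - atom_prob(r_a, pair))
    g_b = -(1.0 - atom_prob(r_b, pair))
    grad_r_a = [g_a * x for x in pair]
    grad_r_b = [g_b * x for x in pair]
    grad_pair = [g_a * a + g_b * b for a, b in zip(r_a, r_b)]
    return grad_r_a, grad_r_b, grad_pair


def sgd_step(vec, grad, lr):
    return [v - lr * g for v, g in zip(vec, grad)]


# Hypothetical two-dimensional representations of two relations and one pair.
r_a, r_b, pair = [0.1, -0.2], [-0.3, 0.4], [0.5, 0.2]
loss_before = conjunction_loss(r_a, r_b, pair)
for _ in range(200):
    g_ra, g_rb, g_pair = conjunction_grads(r_a, r_b, pair)
    r_a = sgd_step(r_a, g_ra, 0.1)
    r_b = sgd_step(r_b, g_rb, 0.1)
    pair = sgd_step(pair, g_pair, 0.1)
loss_after = conjunction_loss(r_a, r_b, pair)
print(loss_before, loss_after)
```

The updates touch only the vector representations, which is the sense in which the ground truth of the conjunction ends up encoded in the embeddings.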

Disjunction. Let $F = A \vee B$ be the disjunction of two propositional expressions $A$ and $B$. Using De Morgan's law and Eqs. 3.3 and 3.4, we can model the probability of $F$ as follows:

$$\begin{aligned}
\llbracket F \rrbracket = \llbracket A \vee B \rrbracket &= \llbracket \neg(\neg(A \vee B)) \rrbracket \\
&= \llbracket \neg(\neg A \wedge \neg B) \rrbracket \\
&= 1 - (1 - \llbracket A \rrbracket)(1 - \llbracket B \rrbracket) \\
&= \llbracket A \rrbracket + \llbracket B \rrbracket - \llbracket A \rrbracket \llbracket B \rrbracket.
\end{aligned} \tag{3.5}$$

Note that Eq. 3.3 not only holds for ground atoms, but also for any propositional logical expression. Furthermore, any propositional logical expression can be normalized to Conjunctive Normal Form. Thus, with Eqs. 3.3 to 3.5 we now have a way to construct a differentiable computation graph, and thus a real-valued loss term, for any symbolic expression in propositional logic.
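To illustrate how negation and conjunction suffice to score arbitrary propositional expressions, here is a small recursive evaluator in Python. It is a sketch only: the nested-tuple encoding of expressions, the function name `prob`, and the example atom probabilities are assumptions chosen for illustration. Disjunction is deliberately computed through De Morgan's law, mirroring the derivation of Eq. 3.5.

```python
def prob(expr, atoms):
    """Probability of a propositional expression under Eqs. 3.3 to 3.5.

    expr is a nested tuple: ("atom", name), ("not", e),
    ("and", e1, e2) or ("or", e1, e2).
    atoms maps atom names to probabilities in [0, 1].
    """
    op = expr[0]
    if op == "atom":
        return atoms[expr[1]]
    if op == "not":
        # Negation: 1 - [[e]]
        return 1.0 - prob(expr[1], atoms)
    if op == "and":
        # Conjunction: [[e1]] * [[e2]]
        return prob(expr[1], atoms) * prob(expr[2], atoms)
    if op == "or":
        # Disjunction via De Morgan: not(not e1 and not e2)
        rewritten = ("not", ("and", ("not", expr[1]), ("not", expr[2])))
        return prob(rewritten, atoms)
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical atom probabilities.
atoms = {"A": 0.9, "B": 0.4}
a_or_b = ("or", ("atom", "A"), ("atom", "B"))
print(prob(a_or_b, atoms))                       # via De Morgan
print(atoms["A"] + atoms["B"] - atoms["A"] * atoms["B"])  # closed form of Eq. 3.5
```

Because every branch uses only subtraction and multiplication, the same recursion could be traced by an automatic differentiation framework to obtain gradients with respect to the atom probabilities.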

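[Figure 3.1: Computation graph for rule in Eq. 3.7 where • denotes a placeholder for the output of the connected node. The graph has three parts. Atoms: inputs labeled ⟦parentOf⟧, ⟦homer, bart⟧, ⟦motherOf⟧ and ⟦fatherOf⟧ feed dot-product nodes u1 to u3, followed by sigmoid nodes u4 to u6. Rule: nodes u7 to u11 with the operations 1 − •, • − 1, ∗, ∗ and • + 1. Loss: −log.]

Implication. A particular class of logical expressions that we care about in practice are propositional implication rules of the form $H \mathrel{\text{:–}} B$, where the body $B$ is a possibly empty conjunction of atoms represented as a list, and the head $H$ is an atom. Let $F = H \mathrel{\text{:–}} B$. The probability of $F$ is then modeled as:

$$\begin{aligned}
\llbracket F \rrbracket = \llbracket H \mathrel{\text{:–}} B \rrbracket &= \llbracket \neg B \vee H \rrbracket \\
&= \llbracket \neg B \rrbracket + \llbracket H \rrbracket - \llbracket \neg B \rrbracket \llbracket H \rrbracket \\
&= 1 - \llbracket B \rrbracket + \llbracket H \rrbracket - (1 - \llbracket B \rrbracket) \llbracket H \rrbracket \\
&= 1 - \llbracket B \rrbracket + \llbracket H \rrbracket - \llbracket H \rrbracket + \llbracket B \rrbracket \llbracket H \rrbracket \\
&= 1 - \llbracket B \rrbracket + \llbracket B \rrbracket \llbracket H \rrbracket \\
&= \llbracket B \rrbracket (\llbracket H \rrbracket - 1) + 1.
\end{aligned} \tag{3.6}$$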
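As a quick numerical check of Eq. 3.6, the following Python sketch evaluates the implication both in its simplified form and via the disjunction $\neg B \vee H$. The function names and the example probabilities are assumptions for illustration only.

```python
def implication(p_body: float, p_head: float) -> float:
    """Simplified form, last line of Eq. 3.6: [[B]] ([[H]] - 1) + 1."""
    return p_body * (p_head - 1.0) + 1.0


def implication_via_disjunction(p_body: float, p_head: float) -> float:
    """[[not B or H]], expanded with negation (Eq. 3.3) and Eq. 3.5."""
    p_not_body = 1.0 - p_body
    return p_not_body + p_head - p_not_body * p_head


# Hypothetical body and head probabilities, including the boundary cases.
for p_body, p_head in [(0.0, 0.3), (1.0, 0.3), (0.8, 0.3)]:
    print(p_body, p_head,
          implication(p_body, p_head),
          implication_via_disjunction(p_body, p_head))
```

The boundary cases show the intended behavior: if the body has probability 0, the rule is satisfied regardless of the head, and if the body has probability 1, the rule's probability equals that of the head.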

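Say we want to ensure that

$$\text{fatherOf}(\text{HOMER}, \text{BART}) \mathrel{\text{:–}} \text{parentOf}(\text{HOMER}, \text{BART}) \wedge \neg\, \text{motherOf}(\text{HOMER}, \text{BART}). \tag{3.7}$$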

We now have a way to map this rule to a differentiable expression that we can use alongside facts in Eq. 3.2 and optimize the symbol representations using gradient descent as we did previously for matrix factorization. The computation graph that allows us to calculate the gradient of this rule with respect to symbol representations is shown in Fig. 3.1. While the structure of the bottom part (Atoms) of this computation graph is determined by the neural link prediction model, the middle part (Rule) is determined by the propositional rule. Note that we can use any neural link predictor (see Section 2.3.2) instead of matrix factorization for obtaining a probability of ground atoms. The only requirement is that ground atom scores need to lie in the interval $[0, 1]$. However, for models where this is not the case, we can always apply a transformation such as the sigmoid.
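The forward pass of such a computation graph for the rule in Eq. 3.7 might look as follows in Python. This is a sketch under assumptions: the two-dimensional embeddings are made up, the dictionary keys are hypothetical, atoms are scored with a sigmoid over a dot product, and the grouping into Atoms, Rule, and Loss follows the description of Fig. 3.1 without claiming to reproduce its exact node numbering.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def rule_loss(emb):
    pair = emb["homer,bart"]
    # Atoms: dot product of relation and pair vectors, then sigmoid.
    p_parent = sigmoid(dot(emb["parentOf"], pair))
    p_mother = sigmoid(dot(emb["motherOf"], pair))
    p_father = sigmoid(dot(emb["fatherOf"], pair))
    # Rule: body is parentOf AND NOT motherOf, head is fatherOf.
    p_not_mother = 1.0 - p_mother               # negation
    p_body = p_parent * p_not_mother            # conjunction (Eq. 3.4)
    p_rule = p_body * (p_father - 1.0) + 1.0    # implication (Eq. 3.6)
    # Loss: negative log of the rule probability.
    return -math.log(p_rule)


# Hypothetical embeddings: the body is likely true in both settings.
base = {
    "homer,bart": [1.0, 0.0],
    "parentOf": [3.0, 0.0],
    "motherOf": [-3.0, 0.0],
}
loss_low_head = rule_loss({**base, "fatherOf": [-2.0, 0.0]})
loss_high_head = rule_loss({**base, "fatherOf": [2.0, 0.0]})
print(loss_low_head, loss_high_head)
```

With a likely body, the loss is smaller when the head fatherOf(HOMER, BART) is scored highly, so minimizing it by gradient descent pushes the embeddings toward satisfying the rule.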

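Independence Assumption. Equation 3.4 rests on a strong assumption, namely that the probabilities of the arguments of the conjunction are conditionally independent given the symbol embeddings. This assumption is already violated in the simple case $\llbracket F \wedge F \rrbracket$ with $0 < \llbracket F \rrbracket < 1$, which results in $\llbracket F \wedge F \rrbracket = \llbracket F \rrbracket \llbracket F \rrbracket < \llbracket F \rrbracket$, even though $F \wedge F$ is logically equivalent to $F$.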

However, for dependent arguments we get an approximation to the probability of the conjunction that can still be used for gradient updates of the symbol representations, and we demonstrate empirically that conjunction as modeled in Eq. 3.4 is useful for improving automated KB completion. In Chapter 4, we will present a way to avoid this independence assumption for implications.
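The violation of the independence assumption for $F \wedge F$ can be made concrete with a few numbers. In the Python sketch below, the function name and the sampled probabilities are illustrative assumptions; the Gödel t-Norm is included only as a point of comparison and is not the model used in Eq. 3.4.

```python
def product_and(p_a: float, p_b: float) -> float:
    """Conjunction as modeled in Eq. 3.4."""
    return p_a * p_b


# Hypothetical probabilities for an expression F.
for p_f in (0.0, 0.1, 0.5, 0.9, 1.0):
    modeled = product_and(p_f, p_f)  # [[F and F]] under Eq. 3.4
    goedel = min(p_f, p_f)           # Goedel t-Norm recovers [[F]] here
    print(f"[[F]]={p_f:.1f}  product={modeled:.2f}  min={goedel:.1f}  "
          f"gap={p_f - modeled:.2f}")
```

The product underestimates the probability for every value strictly between 0 and 1, with the largest gap at 0.5, while it is exact at the Boolean endpoints.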

In document Combining Representation Learning with Logic for Language Processing (Page 49-52)