Chapter 3 Knowledge Acquisition for Verb-Verb Pairs
3.2 Extraction of Background Knowledge
3.2.1 Explicit Causal Association (ECA)
To estimate the likelihood that a verb-verb pair encodes a causal relation, we introduce a novel metric, Explicit Causal Association (ECA), defined as follows:
$$ ECA(v_i\text{-}v_j) = \frac{1}{|VP|} \sum_{I_{v_i\text{-}v_j} \in VP} \left( CD(v_i\text{-}v_j) \times C_I \right) \tag{3.8} $$
where $VP$ is the set of intra- and inter-sentential instances of verb-verb pairs, and an instance of the $v_i$-$v_j$ pair is denoted by $I_{v_i\text{-}v_j}$. $CD$ determines the causal dependency of a verb-verb pair in an unsupervised fashion (equation 3.2), and $C_I$ measures the tendency of instance $I$ of the $v_i$-$v_j$ pair to belong to the cause class rather than the non-cause class, using the training corpus of event-event pairs. The goal of ECA is to combine the unsupervised causal dependency score (i.e., $CD$) with the supervised score of how much more strongly instance $I$ belongs to the cause class than to the non-cause class (i.e., $C_I$). Here, $CD$ represents the prior knowledge about the causal association based on the co-occurrence probabilities and idf scores (equation 3.2). On its own, it can produce many false positives because co-occurrence probabilities can fail to differentiate causality from other types of correlation. We improve the prior knowledge obtained from $CD$ with supervision from the training corpus of both $C$ and $\neg C$ relations. The global decision about the causal association of a verb-verb pair is made by averaging the scores over all instances of that pair. Notice that, because $CD$ is a function of the pair and not of the individual instance, it can also be moved outside the summation in equation 3.8.
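The averaging in equation 3.8 can be illustrated with a minimal Python sketch. The function name `eca_score` and the numeric values are hypothetical, and the CD value and per-instance scores are assumed to have been computed already. The sketch also checks that factoring CD out of the sum gives the same result.

```python
from typing import Sequence


def eca_score(cd: float, instance_scores: Sequence[float]) -> float:
    """Average CD * C_I over all instances of one verb-verb pair (equation 3.8)."""
    if not instance_scores:
        raise ValueError("a verb-verb pair needs at least one instance")
    return sum(cd * c_i for c_i in instance_scores) / len(instance_scores)


# Hypothetical values for illustration only.
cd = 0.4                    # unsupervised causal dependency of the pair
c_scores = [0.9, 0.7, 0.2]  # supervised score of each instance of the pair

eca = eca_score(cd, c_scores)

# CD is constant for the pair, so it can be factored out of the sum.
factored = cd * (sum(c_scores) / len(c_scores))
assert abs(eca - factored) < 1e-12
```

Written this way, ECA is the pair-level prior CD scaled by the mean supervised score of the pair's instances.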
We define the score $C_I$ as follows:
$$ C_I = \frac{P(I, C)}{P(I, \neg C)} \tag{3.9} $$
We can obtain these probabilities using both the Naive Bayes and Maximum Entropy classifiers introduced in section 3.1. However, in our model we do not employ the Maximum Entropy classifier for the calculation of $C_I$ because it is very slow on the massive development set. Therefore, we employ the following function for fast computation:
$$ C_I = \sum_{k=1}^{n} \log\left( \frac{P(f_k \mid C)}{P(f_k \mid \neg C)} \right) \tag{3.10} $$
The notation $f_k$ represents a feature of an instance $I$. In section 3.1.2, we introduced the set of linguistic features we employ to predict the labels $C$ and $\neg C$. $P(f_k \mid C)$ and $P(f_k \mid \neg C)$ are the smoothed probabilities of a feature $f_k$ given the cause and non-cause training instances, respectively. The value of $C_I$ is positive only when the instance $I$ has a stronger tendency to encode a cause relation than a non-cause one. To avoid negative values, we map the scores of $C_I$ to the range [0, 1] using $\frac{C_I - C_{min}}{C_{max} - C_{min}}$, where $C_{min}$ ($C_{max}$) is the minimum (maximum) value of $C_I$ obtained on the development set. We also add a small value to $C_I$ to avoid a value of 0. Similarly, to avoid negative PMI scores in equation 3.2, we can map them to the range [0, 1].
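The computation of $C_I$ in equation 3.10 and the subsequent rescaling can be sketched as follows. This is only an illustration under stated assumptions: the text says the probabilities are smoothed but does not name the scheme, so add-one smoothing is assumed here, and the feature names, counts, and the value of `epsilon` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def smoothed_prob(feature: str, counts: Counter, vocab_size: int) -> float:
    """Add-one smoothed estimate of P(feature | class). The scheme is an assumption."""
    return (counts[feature] + 1) / (sum(counts.values()) + vocab_size)


def raw_c_i(features: Sequence[str], cause: Counter, non_cause: Counter,
            vocab_size: int) -> float:
    """Sum of log ratios P(f_k | C) / P(f_k | not C) over the features (equation 3.10)."""
    return sum(
        math.log(smoothed_prob(f, cause, vocab_size)
                 / smoothed_prob(f, non_cause, vocab_size))
        for f in features
    )


def normalize(scores: Sequence[float], epsilon: float = 1e-6) -> List[float]:
    """Min-max rescale the scores, then add a small value so that none is exactly 0."""
    c_min, c_max = min(scores), max(scores)
    if c_max == c_min:
        raise ValueError("scores must not all be equal")
    return [(s - c_min) / (c_max - c_min) + epsilon for s in scores]


# Hypothetical feature counts from cause and non-cause training instances.
cause_counts = Counter({"marker=because": 8, "tense=past": 4})
non_cause_counts = Counter({"marker=because": 1, "tense=past": 5, "marker=and": 6})
vocab_size = 3  # number of distinct features in this toy example

instances = [["marker=because"], ["tense=past"], ["marker=and"]]
raw = [raw_c_i(feats, cause_counts, non_cause_counts, vocab_size) for feats in instances]
scaled = normalize(raw)
```

In this toy example the first instance gets a positive raw score and the last a negative one. After rescaling, all scores are strictly positive.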
We employed both training corpora, $Explicit_{ev_i\text{-}ev_j}$ and $PDTB_{ev_i\text{-}ev_j}$ (see section 3.1.1), to calculate the scores of the metric ECA. Our empirical evaluation revealed that the ECA scores acquired using the $Explicit_{ev_i\text{-}ev_j}$ corpus provide a better source of background knowledge than the scores acquired using $PDTB_{ev_i\text{-}ev_j}$. This makes sense because we acquire causal associations by considering the scores of ECA on the massive development set, and the training corpus $PDTB_{ev_i\text{-}ev_j}$ is very small for this purpose.
We selected the top 500 verb-verb pairs ranked by the metric ECA. Some examples of causal verb-verb pairs among these top 500 are destroy-rebuild, convict-arrest, receive-download, ask-reply, and score-win. We also observed some false positives in the top 500, i.e., pairs that do not seem to encode a cause-effect relation. Examples include jump-rise, hit-strike, drop-fall, climb-gain, and meet-discuss. Notice that some of these pairs contain nearly synonymous verbs (e.g., jump-rise, hit-strike), while others contain verbs in a temporal-only relation (e.g., drop-fall, climb-gain, meet-discuss). In the next chapter, we empirically evaluate the performance of the metric ECA by using the causal associations in verb-verb pairs derived from this metric in our model for identifying causality.
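Selecting the highest-scoring pairs can be sketched as below. The function name `top_k_pairs` is hypothetical, and the scores are invented for illustration; they are not values from our evaluation.

```python
import heapq
from typing import Dict, List, Tuple


def top_k_pairs(eca_scores: Dict[str, float], k: int = 500) -> List[Tuple[str, float]]:
    """Return the k highest-scoring verb-verb pairs, best first."""
    return heapq.nlargest(k, eca_scores.items(), key=lambda item: item[1])


# Hypothetical scores for illustration only.
scores = {"destroy-rebuild": 0.31, "jump-rise": 0.27, "eat-walk": 0.02}
top_two = top_k_pairs(scores, k=2)
```

As the example suggests, a high ECA score alone does not exclude near-synonymous pairs such as jump-rise from the selection.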
Natural language allows the expression of semantic relations in both ambiguous and implicit contexts, which considerably increases the complexity of the current task. Sporleder and Lascarides (2008) observed that people tend to avoid unnecessary redundancy when expressing semantic relations. For example, they prefer not to use a discourse marker when a semantic relation can be inferred from other elements of the context. Taking this observation forward, we assume that when a verb-verb pair is strongly causal in nature (e.g., kill-arrest), people rarely use an explicit and unambiguous discourse marker to express the causation encoded by this pair. The strong causal link of this pair is obvious to readers even when only an ambiguous discourse marker, or none at all, is available to signal it. Therefore, the causality of such verb-verb pairs can remain undiscovered by the metric ECA, because this metric relies on supervision from the $Explicit_{ev_i\text{-}ev_j}$ corpus, in which the two events of each training instance appear in an explicit and unambiguous context. We use the term training data sparseness for this problem, in which strongly causal verb-verb pairs rarely appear in the $Explicit_{ev_i\text{-}ev_j}$ training corpus. Due to this problem, we can mistakenly consider strongly causal verb-verb pairs to be non-causal. In the next section, we introduce a metric that addresses this problem in order to derive better scores of causal associations in verb-verb pairs.