Explicit Causal Association (ECA) - Extraction of Background Knowledge

Chapter 3 Knowledge Acquisition for Verb-Verb Pairs

3.2 Extraction of Background Knowledge

3.2.1 Explicit Causal Association (ECA)

To estimate how likely a verb-verb pair is to encode a causal relation, we introduce a novel metric, Explicit Causal Association (ECA), defined as follows:

\[
\mathrm{ECA}(v_i\text{-}v_j) = \frac{1}{|VP|} \sum_{I_{v_i\text{-}v_j} \in VP} \bigl( \mathrm{CD}(v_i\text{-}v_j) \times \mathrm{CI} \bigr) \qquad (3.8)
\]

where $VP$ is the set of intra- and inter-sentential instances of verb-verb pairs. An instance of the pair $v_i$-$v_j$ is denoted by $I_{v_i\text{-}v_j}$. CD determines the causal dependency of a verb-verb pair in an unsupervised fashion (equation 3.2), and CI measures the tendency of an instance I of the pair to belong to the cause class rather than the non-cause class, using the training corpus of event-event pairs. The goal of ECA is to combine the unsupervised causal dependency score (CD) with the supervised score of how much more strongly instance I belongs to the cause class than to the non-cause class (CI). Here, CD represents the prior knowledge about the causal association, based on co-occurrence probabilities and idf scores (equation 3.2). On its own, CD can produce many false positives because co-occurrence probabilities can fail to differentiate causality from other types of correlation. We improve the prior knowledge obtained from CD with supervision from the training corpus of both C and ¬C relations. The global decision about the causal association of a verb-verb pair is made by averaging the scores over all instances of that pair. Notice that CD can also be moved out of the summation in equation 3.8.
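The averaging in equation 3.8 can be illustrated with a minimal Python sketch. It assumes that the CD score of the pair and the CI score of each instance have already been computed; the function names `eca_score` and `eca_score_factored` and the sample numbers are hypothetical and are not taken from the thesis. The second function shows the factored form in which CD is moved out of the summation.

```python
from typing import Sequence


def eca_score(cd: float, ci_scores: Sequence[float]) -> float:
    """Average of CD * CI over all instances of one verb-verb pair (eq. 3.8)."""
    if not ci_scores:
        raise ValueError("a verb-verb pair needs at least one instance")
    return sum(cd * ci for ci in ci_scores) / len(ci_scores)


def eca_score_factored(cd: float, ci_scores: Sequence[float]) -> float:
    """Same quantity, with the pair-level CD moved outside the summation."""
    if not ci_scores:
        raise ValueError("a verb-verb pair needs at least one instance")
    return cd * sum(ci_scores) / len(ci_scores)


# Hypothetical values: one CD score for the pair, one CI score per instance.
example_cd = 0.5
example_ci_scores = [0.25, 0.75, 0.5]
print(eca_score(example_cd, example_ci_scores))
print(eca_score_factored(example_cd, example_ci_scores))
```

Both functions return the same value up to floating-point rounding, since CD is constant across the instances of a pair.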

We define the score CI as follows:

\[
\mathrm{CI} = \frac{P(I, C)}{P(I, \neg C)} \qquad (3.9)
\]

These probabilities can be obtained using both the Naive Bayes and Maximum Entropy classifiers introduced in section 3.1. However, our model does not employ the Maximum Entropy classifier to calculate CI because it runs very slowly on the massive development set. Therefore, we employ the following function for fast computation:

\[
\mathrm{CI} = \sum_{k=1}^{n} \log \frac{P(f_k \mid C)}{P(f_k \mid \neg C)} \qquad (3.10)
\]
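A minimal Python sketch of equation 3.10 follows. The thesis states only that the feature probabilities are smoothed; the add-one (Laplace) smoothing used here, the count-based representation, and all names (`smoothed_prob`, `ci_score`, the features `f1` and `f2`) are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Iterable


def smoothed_prob(count: int, total: int, vocab_size: int, alpha: float = 1.0) -> float:
    """Additive smoothing (an assumed choice; the source does not name one)."""
    return (count + alpha) / (total + alpha * vocab_size)


def ci_score(
    features: Iterable[str],
    cause_counts: Counter,
    noncause_counts: Counter,
    vocab_size: int,
    alpha: float = 1.0,
) -> float:
    """Sum of log-likelihood ratios over the features of one instance (eq. 3.10)."""
    cause_total = sum(cause_counts.values())
    noncause_total = sum(noncause_counts.values())
    score = 0.0
    for f in features:
        # Counter returns 0 for features unseen in a class.
        p_cause = smoothed_prob(cause_counts[f], cause_total, vocab_size, alpha)
        p_noncause = smoothed_prob(noncause_counts[f], noncause_total, vocab_size, alpha)
        score += math.log(p_cause / p_noncause)
    return score


# Hypothetical feature counts from cause (C) and non-cause (not C) training instances.
toy_cause_counts = Counter({"f1": 8, "f2": 2})
toy_noncause_counts = Counter({"f1": 1, "f2": 9})
print(ci_score(["f1"], toy_cause_counts, toy_noncause_counts, vocab_size=2))
print(ci_score(["f2"], toy_cause_counts, toy_noncause_counts, vocab_size=2))
```

In this toy setup, an instance carrying only `f1` receives a positive score and one carrying only `f2` receives a negative score, since each feature is more frequent in one class than in the other.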

The notation $f_k$ represents a feature of an instance I. Section 3.1.2 introduced the set of linguistic features we employ to predict the labels C and ¬C. $P(f_k \mid C)$ and $P(f_k \mid \neg C)$ are the smoothed probabilities of feature $f_k$ given the cause and non-cause training instances, respectively. The value of CI is positive only when instance I has a stronger tendency to encode a cause relation than a non-cause one. To avoid negative values, we map the scores of CI to the range [0, 1] using $(\mathrm{CI} - C_{\min}) / (C_{\max} - C_{\min})$, where $C_{\min}$ ($C_{\max}$) is the minimum (maximum) value of CI obtained on the development set. We also add a small value to CI to avoid a score of 0. Similarly, to avoid negative PMI scores in equation 3.2, we can map them to the range [0, 1].
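The rescaling step can be sketched in Python as follows. This is a minimal illustration, assuming the minimum and maximum CI values over the development set are known; the function name `rescale_ci` and the default `epsilon` are hypothetical, since the thesis does not state how small the added value is.

```python
from typing import List, Sequence


def rescale_ci(
    ci_values: Sequence[float],
    c_min: float,
    c_max: float,
    epsilon: float = 1e-6,
) -> List[float]:
    """Min-max rescale CI scores, then add a small constant to avoid zeros.

    epsilon is an assumed value; the source only says "a small value".
    """
    if c_max <= c_min:
        raise ValueError("c_max must be greater than c_min")
    span = c_max - c_min
    return [(ci - c_min) / span + epsilon for ci in ci_values]


# Hypothetical raw CI scores (sums of log ratios, so they may be negative).
raw_scores = [-2.0, 0.0, 2.0]
print(rescale_ci(raw_scores, c_min=min(raw_scores), c_max=max(raw_scores)))
```

Note that, with a non-zero `epsilon`, the largest rescaled score lies marginally above 1; whether the original implementation clips it is not stated in the text.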

We employed both training corpora, $\mathrm{Explicit}_{e_{v_i}\text{-}e_{v_j}}$ and $\mathrm{PDTB}_{e_{v_i}\text{-}e_{v_j}}$ (see section 3.1.1), to calculate the ECA scores. Our empirical evaluation revealed that the ECA scores acquired using the $\mathrm{Explicit}_{e_{v_i}\text{-}e_{v_j}}$ corpus provide a better source of background knowledge than the scores acquired using $\mathrm{PDTB}_{e_{v_i}\text{-}e_{v_j}}$. This makes sense because we acquire causal associations by computing ECA scores on the massive development set, and the training corpus $\mathrm{PDTB}_{e_{v_i}\text{-}e_{v_j}}$ is very small for this purpose.

We selected the 500 top-scoring verb-verb pairs according to the ECA metric. Examples of causal verb-verb pairs among these top 500 are destroy-rebuild, convict-arrest, receive-download, ask-reply, and score-win. We also observed some false positives in the top 500, i.e., pairs that do not seem to encode a cause-effect relation, such as jump-rise, hit-strike, drop-fall, climb-gain, and meet-discuss. Notice that some of these pairs contain nearly synonymous verbs (e.g., jump-rise, hit-strike), while others contain verbs in a temporal-only relation (e.g., drop-fall, climb-gain, meet-discuss). In the next chapter, we empirically evaluate the ECA metric by using the causal associations of verb-verb pairs derived from it in our model for identifying causality.
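Selecting the top-scoring pairs amounts to a simple ranking by ECA score. The sketch below assumes the scores are held in a dictionary keyed by verb pair; the function name `top_k_pairs` and the scores shown are hypothetical and chosen only for illustration, not taken from the evaluation.

```python
import heapq
from typing import Dict, List, Tuple

VerbPair = Tuple[str, str]


def top_k_pairs(eca_scores: Dict[VerbPair, float], k: int = 500) -> List[Tuple[VerbPair, float]]:
    """Return the k verb-verb pairs with the highest ECA scores, best first."""
    return heapq.nlargest(k, eca_scores.items(), key=lambda item: item[1])


# Hypothetical ECA scores for a handful of pairs.
toy_scores = {
    ("destroy", "rebuild"): 0.9,
    ("ask", "reply"): 0.7,
    ("jump", "rise"): 0.6,
    ("walk", "sing"): 0.1,
}
for pair, score in top_k_pairs(toy_scores, k=2):
    print("-".join(pair), score)
```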

Natural language allows semantic relations to be expressed in both ambiguous and implicit contexts, which considerably increases the complexity of the current task. Sporleder and Lascarides (2008) made the important observation that people tend to avoid unnecessary redundancy when expressing semantic relations. For example, they prefer not to use a discourse marker when a semantic relation can be inferred from other elements of the context. Taking this observation forward, we assume that when a verb-verb pair is strongly causal in nature (e.g., kill-arrest), people may rarely use an explicit and unambiguous discourse marker to express the causation encoded by this pair. The strong causal link of such a pair is obvious to readers even when only an ambiguous discourse marker, or none at all, is available to signal it. Therefore, the causality of such verb-verb pairs can remain undiscovered by the ECA metric, because this metric relies on supervision from the $\mathrm{Explicit}_{e_{v_i}\text{-}e_{v_j}}$ corpus, in which the two events of each training instance appear in an explicit and unambiguous context. We use the term training data sparseness for this problem, in which strongly causal verb-verb pairs rarely appear in the $\mathrm{Explicit}_{e_{v_i}\text{-}e_{v_j}}$ training corpus. Due to this problem, we can mistakenly consider strongly causal verb-verb pairs to be non-causal. In the next section, we introduce a metric that addresses this problem in order to derive better scores of causal association for verb-verb pairs.

In document Mining novel sources of knowledge to identify causal information in text (Page 42-44)