
Chapter 4 Identifying Causality in Verb-Verb Pairs

4.2 Empirical Study

4.2.5 Error Analysis and Discussion

To perform the error analysis, we chose the best-performing model, PDTB_{e_vi-e_vj} + KB1BCA + LD + ¬Cev = {R}, which achieves 30.41% precision, 68.09% recall and 42.04% F-score. We randomly selected 100 false positives and 50 false negatives from this model's predictions. In the rest of this section, we describe the most frequent types of errors that lead this model to false positive and false negative predictions.
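As a quick consistency check, the reported F-score is the harmonic mean of the reported precision and recall (the balanced F1 score). The Python sketch below recomputes it from the two reported percentages; the function name `f_score` is our own and is not part of the original system.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when the model finds nothing
    return 2 * precision * recall / (precision + recall)


# Scores reported for the model analyzed in this section (in percent).
print(f"{f_score(30.41, 68.09):.2f}")  # 42.04
```

The low precision relative to recall is what makes the false positives (100 sampled) the larger part of the analysis.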

False Positives

After analyzing the false positives, we observed the following frequent types of errors:

• About 47% of the false positives encode expansion, continuation or list relations but are mistakenly identified as causal by the best-performing model. Consider the following two examples:

1. But sales in the oil-patch state of Texas surged 12.9% and sales in South Carolina jumped 10.6% in the period, the New York trade group said.

2. Taiwan’s USI Far East Corp., a petrochemical company, initialed the agreement with an unidentified Japanese contractor to build a naphtha cracker, according to Alson Lee, who heads the Philippine company set up to build and operate the complex.

Notice that in example (1) the two events “sales surged 12.9%” and “sales jumped 10.6%” are near-synonymous, i.e., both describe a similar state of affairs. The background knowledge (i.e., KB1BCA) currently predicts that the pair surge-jump has a higher tendency to encode a cause relation than a non-cause one. To handle such mistakes, we need to add a mechanism on top of our model that identifies synonymous events; for example, our model should know that surge and jump are near-synonyms. In addition, example (1) contains further evidence that can help filter it out of the false positives: both verbs surge and jump appear in a similar context, i.e., {sales verb PERCENTAGE}. This provides additional evidence for assigning the label ¬C to example (1).

Example (1) is an easy case of a verb-verb pair encoding a non-cause relation. In example (2), by contrast, the pair “build-operate” does not consist of synonymous verbs and can encode either a cause or a non-cause relation depending on the context, so a model needs to process this example more deeply to identify the non-cause relation between the two events. Note that the pairs “set up-build” and “set up-operate” encode purpose (or cause) relations, as signaled by “to”. The events e_build and e_operate, conjoined by the marker “and”, are both effects of the same event e_set-up and encode a non-cause relation. A model could simply predict the label ¬C for any events e_vi and e_vj that follow the structure e_vi ← e_vk → e_vj, where e_vk causes both e_vi and e_vj. This type of reasoning is not always correct, however, because e_vi and e_vj can influence each other even if they are effects of the same event e_vk. In this situation, we can learn and incorporate the general tendencies of events. For example, two events e_vi and e_vj that are effects of the same event e_vk may have a high tendency to encode non-causality if they are conjoined by the marker “and”. These examples show that a model for identifying causality needs to process natural language instances deeply to distinguish expansion, continuation or list relations from causal relations.

• In about 28% of the false positives, the two events are not even directly relevant to each other, yet they are mistakenly identified as causal. Consider the following two examples:

3. US Airways is using a similar system on its Airbus aircraft to eliminate all non-precision approaches at every runway of every airport where the airline’s Airbus aircraft flies.

4. Traders expected a rise of only 50 billion to 85 billion cubic feet because cold weather in the U.S. was thought to have boosted demand for heating fuels more. The increase put the nation’s gas storage within 8 percent of where it was a year ago at this time, when inventories were considered sufficient.

Note that in example (3) the events e_use and e_fly do not encode causality. In fact, these events are not directly relevant to each other in the given context: the event e_fly merely states a fact about some airports. Therefore, to identify causality, our model first needs to determine whether two events are directly relevant in the current context; if they are not, they have a high tendency to encode non-causality. Similarly, in example (4) the event e_think is directly relevant to the event e_boost but has no direct relation with the event e_put. Instead, the event e_boost encodes a cause relation with the event e_put. Our current model has no mechanism to determine whether two events are directly relevant, and this leads to many false positives.

• In about 16% of the false positives, our model fails to identify REPORTING events. In the following example, a REPORTING event merely describes another event instead of encoding causation with it:

5. And on the West Coast, evidence today shows that a monster quake in 1700 ruptured 500 miles of the ground from Puget Sound south and sent a huge tsunami that flooded coastal Japan.

In example (5), e_show (a REPORTING event) merely describes the event e_send instead of encoding a cause relation with it. Our model fails to identify e_show as a REPORTING event because the TimeBank corpus of events contains only 9 instances of the verb “show”, of which only 2 belong to the REPORTING class. In the future, we therefore need more training data for the semantic classes of events to avoid mistakes in identifying these classes. The TimeBank corpus contains only 7924 instances of verbal events, which can result in wrong predictions of semantic classes, as example (5) demonstrates. In this situation, we can use other resources that provide semantic classes of verbs. For example, WordNet provides 15 semantic classes of verbs, such as VERB.Body, VERB.Communication and VERB.Cognition, and we can acquire a large number of instances of these classes from WordNet’s glosses and examples. Of these, VERB.Communication closely maps to REPORTING events. For the current task, we can acquire verb instances from WordNet and organize the 15 semantic classes into the categories Cev and ¬Cev. These fine-grained classes and their many instances may help achieve better results on the current task.

• In the remaining 9% of the false positives, the two events encode either comparison or temporal-only relations and are mistakenly identified as causal. As discussed above, a model for identifying causality needs to process natural language instances deeply to distinguish non-cause relations (i.e., expansion, continuation, list, comparison and temporal-only) from causal relations.
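Several of the remedies proposed above can be pictured as post-hoc filters that override a predicted C. The Python sketch below is illustrative only and is not the method used in this work: the `Event` frames, the set of (cause, effect) edges, and the toy `VERB_LEXNAME` lexicon are hypothetical stand-ins for a shared-context detector, a purpose-relation extractor, and a WordNet lexicographer-file lookup (in NLTK, a synset's lexicographer file is available through `Synset.lexname()`). The lexicon entries are assumptions chosen for illustration, not actual WordNet output.

```python
from dataclasses import dataclass
from typing import Optional

CAUSE, NON_CAUSE = "C", "¬C"

# Toy stand-in for a WordNet lookup (lemma -> lexicographer file). These entries are
# illustrative assumptions, not real WordNet output. Only verb.communication is treated
# as non-causal here, mirroring the configuration ¬Cev = {R}.
VERB_LEXNAME = {"say": "verb.communication", "report": "verb.communication",
                "show": "verb.communication"}
NON_CAUSAL_LEXNAMES = {"verb.communication"}


@dataclass
class Event:
    verb: str                     # lemma of the verbal event, e.g. "build"
    frame: tuple[str, ...] = ()   # hypothetical abstract context, e.g. ("sales", "VERB", "PERCENTAGE")


def shares_context(e_i: Event, e_j: Event) -> bool:
    """Two verbs filling the same non-empty frame likely describe parallel events."""
    return bool(e_i.frame) and e_i.frame == e_j.frame


def common_cause(v_i: str, v_j: str, causes: set[tuple[str, str]]) -> Optional[str]:
    """Return an e_k with e_k -> e_i and e_k -> e_j among (cause, effect) edges, else None."""
    parents_i = {c for c, effect in causes if effect == v_i}
    parents_j = {c for c, effect in causes if effect == v_j}
    shared = parents_i & parents_j
    return min(shared) if shared else None


def is_reporting(e: Event) -> bool:
    """True if the toy lexicon places the verb in a non-causal (REPORTING-like) class."""
    return VERB_LEXNAME.get(e.verb) in NON_CAUSAL_LEXNAMES


def filter_label(e_i: Event, e_j: Event, connective: str,
                 causes: set[tuple[str, str]], model_label: str) -> str:
    """Override a predicted C with ¬C when one of the heuristic cues fires."""
    if model_label != CAUSE:
        return model_label
    if shares_context(e_i, e_j):                                          # cf. example (1)
        return NON_CAUSE
    if connective == "and" and common_cause(e_i.verb, e_j.verb, causes):  # cf. example (2)
        return NON_CAUSE
    if is_reporting(e_i) or is_reporting(e_j):                            # cf. example (5)
        return NON_CAUSE
    return model_label


frame = ("sales", "VERB", "PERCENTAGE")
print(filter_label(Event("surge", frame), Event("jump", frame), "and", set(), CAUSE))  # ¬C
edges = {("set up", "build"), ("set up", "operate")}
print(filter_label(Event("build"), Event("operate"), "and", edges, CAUSE))            # ¬C
```

Because effects of a common cause can still influence each other, hard overrides like these are a simplification; learning each cue as a soft tendency, as suggested above, would be the more faithful design.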

False Negatives

After analyzing the false negatives, we observed the following frequent types of errors:

• In about 88% of the false negatives, the verb-verb pairs appear to encode a non-cause relation, but specific facts about these pairs allow them to encode causality in context. Consider the following example of such a pair:

6. Leave it alone and unlocked and you might return to find a stranger sitting in the driver’s seat just getting the feel of the car.

The pair “leave-sit” appears to be non-causal: our test set contains 5 instances of this pair with the label ¬C, and example (6) is its only causal instance. The background knowledge KB1BCA predicts that this pair has a high tendency to encode non-causality. Using KB1BCA, our model therefore predicts the correct label (i.e., ¬C) on 83.33% (5 of 6) of the instances of the pair “leave-sit”. To predict the label C for example (6), our model needs more specific knowledge about the pair “leave-sit”. For example, it should know that “if you leave some space X, then somebody can sit in the space X”. It must also know that “it” and “the driver’s seat” refer to more or less the same entity in example (6). After acquiring this information, we need to verify whether the above rule (i.e., leave some space X → somebody sits in X) is satisfied in the context of example (6).

• In about 12% of the false negatives, the verb-verb pairs appear to have a high tendency to encode causality, but the background knowledge KB1BCA fails to identify the causal connections. For example, the pair “want-push” seems to encode a causal association, but KB1BCA treats it as non-causal. The test set contains 2 cause and 4 non-cause instances of this pair, so treating “want-push” as non-causal costs our model only a small amount of recall. In the future, however, we need to incorporate more sources of knowledge to determine each pair's tendency to encode causation more accurately.
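The figures quoted for the false negatives follow from simple counts. The Python sketch below (helper names are hypothetical) recomputes the 83.33% pair-level accuracy for “leave-sit”, the number of causal instances lost by labeling “want-push” as non-causal, and the instance counts behind the reported error percentages, assuming the sample sizes of 100 false positives and 50 false negatives stated at the start of this section.

```python
from collections import Counter


def constant_label_accuracy(gold: list[str], predicted: str) -> float:
    """Fraction of a pair's test instances that match one pair-level label."""
    return sum(label == predicted for label in gold) / len(gold)


def counts(sample_size: int, shares: dict[str, float]) -> dict[str, int]:
    """Convert percentage shares of an error sample into instance counts."""
    return {name: round(sample_size * pct / 100) for name, pct in shares.items()}


# Test-set labels of two pairs, as counted in the text.
leave_sit = ["¬C"] * 5 + ["C"]        # five non-causal instances plus example (6)
want_push = ["C"] * 2 + ["¬C"] * 4

# Following KB1BCA, the model labels every instance of both pairs ¬C.
print(f"leave-sit accuracy: {constant_label_accuracy(leave_sit, '¬C'):.2%}")  # leave-sit accuracy: 83.33%
print("want-push causal instances lost:", Counter(want_push)["C"])             # want-push causal instances lost: 2

fp = counts(100, {"expansion/continuation/list": 47, "not directly relevant": 28,
                  "missed REPORTING event": 16, "comparison/temporal only": 9})
fn = counts(50, {"apparently non-causal pair": 88, "causal pair missed by KB1BCA": 12})
print(fn)  # {'apparently non-causal pair': 44, 'causal pair missed by KB1BCA': 6}
```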

In document Mining novel sources of knowledge to identify causal information in text (Page 80-83)