The DIRT algorithm provides a method for computing the similarity between phrases represented as patterns extracted from dependency parses. As discussed in Section 7.1, similarity computations using DIRT are accurate enough to allow paraphrase induction: given a target pattern, an entire collection of patterns is ranked by similarity to the target, and the highest ranked patterns are selected as paraphrases of the original phrase. An example is given in Table 7.6, which lists the most confident paraphrases for the phrase X ←subj− acquire −obj→ Y.
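The ranking step described above can be sketched as follows. This is a minimal illustration, not the implementation used here: it assumes the commonly cited formulation of DIRT similarity (a Lin-style overlap of mutual-information weights per slot, combined across the X and Y slots by a geometric mean), and it assumes non-negative, precomputed MI weights. The names `slot_similarity`, `dirt_similarity`, `rank_paraphrases` and all values in `TOY_PATTERNS` are hypothetical.

```python
from math import sqrt

def slot_similarity(mi_a, mi_b):
    """Lin-style overlap between two {filler: MI weight} maps for one slot."""
    shared = mi_a.keys() & mi_b.keys()
    numerator = sum(mi_a[w] + mi_b[w] for w in shared)
    denominator = sum(mi_a.values()) + sum(mi_b.values())
    return numerator / denominator if denominator > 0 else 0.0

def dirt_similarity(p, q):
    """Geometric mean of the X-slot and Y-slot similarities."""
    return sqrt(slot_similarity(p["X"], q["X"]) * slot_similarity(p["Y"], q["Y"]))

def rank_paraphrases(target, patterns, top_n=10):
    """Rank every other pattern by its similarity to the target pattern."""
    scored = [(name, dirt_similarity(patterns[target], slots))
              for name, slots in patterns.items() if name != target]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]

# Made-up MI weights, for illustration only.
TOY_PATTERNS = {
    "X acquire Y": {"X": {"company": 2.0, "firm": 1.5},
                    "Y": {"stake": 2.0, "share": 1.0}},
    "X buy Y":     {"X": {"company": 1.8, "investor": 1.0},
                    "Y": {"stake": 1.5, "share": 1.2}},
    "X shed Y":    {"X": {"index": 2.5},
                    "Y": {"point": 2.0}},
}

for name, score in rank_paraphrases("X acquire Y", TOY_PATTERNS):
    print(f"{name}\t{score:.3f}")
```

In this toy setting the pattern sharing slot fillers with the target is ranked first, while a pattern with no shared fillers receives a score of zero. This is the out-of-context behaviour that the rest of this section contrasts with context-sensitive ranking.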
Although determining the context-appropriateness of an inference rule is a task that has been approached in the past, the question of performing context-sensitive paraphrase induction within the DIRT representation framework has
Towards context-sensitive paraphrase induction 95
DIRT          Context1      Context2       Context3
(no context)  (you, blood)  (dog, pound)   (study, light)
drop          give(1)       drop           reveal(2)
lose          throw         lose(5)        give
give          lose(3)       give           emit
reveal        discard       disperse       throw(3)
relinquish    reveal        emit           disperse
throw         transmit      throw          discard
transmit      spread        reveal         spread
pass          emit          discard(1)     transmit
emit          relinquish    relinquish(1)  relinquish
spread        drop          transmit       pass
discard       pass          spread         drop
disperse      disperse      pass           lose

Table 7.5: Ranked substitutes for the pattern X ←subj− shed −obj→ Y. Correct substitutes are marked in bold.
X ←subj− acquire −obj→ Y
X ←subj− get −obj→ Y
X ←subj− purchase −obj→ Y
X ←subj− buy −obj→ Y
X ←subj− use −obj→ Y
X ←pobj− to ←prep− sell −obj→ Y
X ←pobj− by ←prep− own −obj→ Y
X ←obj− provide ←prep− with −obj→ Y
X ←pobj− to ←prep− provide −obj→ Y

Table 7.6: DIRT paraphrases for X ←subj− acquire −obj→ Y
not yet been addressed. This task can be defined as follows:
Context-sensitive paraphrase generation: Given a target pattern $p_i$ and an instance $\langle w_X, p_i, w_Y \rangle$, return patterns $p_j$ such that $p_i$ and $p_j$ form an inference rule in context $w_X$, $w_Y$.
To exemplify this, consider the pattern X ←subj− shed −obj→ Y. In a newspaper domain this phrase is strongly biased towards the meaning to fall, to drop, referring to market indexes. However, in contexts such as shed tear, shed blood or shed light, the phrase has completely different meanings. The ISP system of Pantel et al. [2007] identifies adequate semantic classes for the most frequent (financial) sense of shed (see Table 7.1); however, we observe that no paraphrase reflecting any of the other senses is obtained among the top 100 paraphrases.
This points to a property shared by all the methods developed to learn contextual preferences for DIRT rules: they are based on, and therefore tuned to, the assumption that we are given a high-confidence DIRT rule. It is, however, not clear how these methods can be adapted to discover pairs of patterns which are almost distributionally distinct in an out-of-context scenario.
The method we have proposed can be straightforwardly used for context-sensitive paraphrase induction. In this section we perform a second experiment, in which we use it to paraphrase a set of patterns extracted from QA data. An evaluation for this task is left for future work; at the moment, we only summarize a series of observations made when manually inspecting the generated paraphrases. Chapter 9 uses this method for paraphrasing questions for an answer extraction module.
The task of context-sensitive paraphrasing can be expressed naturally in the framework we have proposed: given a pattern $p_i$ in context $w_X$, we return the top $N$ patterns $p_j$ which maximize $\mathrm{sim}(\mathrm{vec}(p_i, w_X), \mathrm{vec}(p_j, w_X))$. For the cases in which we are given two context words $w_X$ and $w_Y$, we maximize

$$\mathrm{sim}(\mathrm{vec}(p_i, w_X), \mathrm{vec}(p_j, w_X)) \cdot \mathrm{sim}(\mathrm{vec}(p_i, w_Y), \mathrm{vec}(p_j, w_Y)).$$
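The ranking criterion above can be sketched in a few lines. The sketch assumes that a function `vec(pattern, word)` returning a context-specific vector is available from the model; here it is replaced by a hypothetical lookup table `TOY_VECTORS` with made-up values, and cosine stands in for `sim`. The function name `rank_in_context` is likewise an assumption made for illustration.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def rank_in_context(target, candidates, context_words, vec, sim=cosine, top_n=5):
    """Score each candidate p_j by the product, over the given context words w,
    of sim(vec(p_i, w), vec(p_j, w)) and return the top_n candidates."""
    scored = []
    for cand in candidates:
        score = 1.0
        for w in context_words:
            score *= sim(vec(target, w), vec(cand, w))
        scored.append((cand, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]

# Hypothetical context-specific vectors (made-up values, not model output).
TOY_VECTORS = {
    ("shed", "you"): [0.6, 0.4], ("shed", "blood"): [0.9, 0.1],
    ("give", "you"): [0.5, 0.5], ("give", "blood"): [0.8, 0.2],
    ("drop", "you"): [0.4, 0.6], ("drop", "blood"): [0.1, 0.9],
}

def toy_vec(pattern, word):
    return TOY_VECTORS[(pattern, word)]

# Two context words: product of the two similarities.
print(rank_in_context("shed", ["drop", "give"], ["you", "blood"], toy_vec))
# One context word: a single similarity term.
print(rank_in_context("shed", ["drop", "give"], ["blood"], toy_vec))
```

Passing a single context word reduces the score to one similarity term, which matches the first case of the definition; passing two words yields the product.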
We use the LDA mixture model, which we apply to paraphrase a set of patterns extracted from questions found in TREC QA data. Only one context word is available for most patterns occurring in questions, as the second argument is often an uninformative question word (e.g. who, which). To this set of patterns we add the patterns encountered in the LST data set.
One of the main observations we make is that the paraphrases generated often convey appropriate, context-aware lexical variation, but are not substitutable in the provided context. Consider for example the pattern X ←subj− appear −prep→ on −pobj→ Y, extracted from the question:
(8) When did Led Zeppelin appear on BBC?
This is paraphrased by DIRT as in Table 7.7 and by the context-sensitive method (JS and cosine similarities) as in Table 7.8.
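For readers unfamiliar with the JS-based measure, the following sketch shows one way to turn the Jensen-Shannon divergence between two probability distributions into a similarity score. It assumes that "inverse JS" denotes the reciprocal of the divergence and that the compared vectors are probability distributions (for example over LDA topics); both are assumptions, and the small constant `eps`, added to avoid division by zero for identical distributions, is not taken from the text.

```python
from math import log

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q); terms with p_k = 0 contribute 0."""
    return sum(pk * log(pk / qk) for pk, qk in zip(p, q) if pk > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their midpoint m."""
    m = [(pk + qk) / 2 for pk, qk in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def inverse_js(p, q, eps=1e-12):
    """Similarity as the reciprocal of the JS divergence (eps is an assumption)."""
    return 1.0 / (js_divergence(p, q) + eps)

# Made-up topic distributions, for illustration only.
target = [0.7, 0.2, 0.1]
close = [0.6, 0.3, 0.1]
far = [0.1, 0.1, 0.8]

print(inverse_js(target, close) > inverse_js(target, far))
```

Unlike cosine, this score is unbounded above, so scores produced by the two measures are not directly comparable; only the rankings they induce are.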
The context-insensitive substitutes extracted with DIRT are very accurate, with commonly returned patterns such as be on, see on or find on. The LDA method returns television-related paraphrases such as broadcast on bbc, broadcast by bbc, announced by bbc or tell bbc station. However, while the variation in lexical items is meaning-appropriate, there is a significant drop in accuracy compared to DIRT, as many of the generated patterns cannot be used as substitutes.
X ←subj− appear −prep→ on −pobj→ Y
X ←obj− be −prep→ on −obj→ Y
X ←obj− appear −prep→ in −obj→ Y
X ←obj− release −prep→ on −obj→ Y
X ←obj− see −prep→ on −obj→ Y
X ←subjpass− be −prep→ on −obj→ Y
X ←obj− go −prep→ on −obj→ Y
X ←subjpass− find −prep→ on −obj→ Y
X ←obj− come −prep→ on −obj→ Y

Table 7.7: DIRT paraphrases for X ←subj− appear −prep→ on −pobj→ Y

Instance: zeppelin, X ←subj− appear −prep→ on −pobj→ Y, bbc

LDA - inverse JS                             LDA - cosine
X ←subjpass− broadcast −prep→ by −pobj→ Y    X ←pobj− in ←prep− channel −nn→ Y
X ←partmod− broadcast −prep→ by −pobj→ Y     X ←poss− tv −nn→ Y
X ←subj− quote −obj→ international −nn→ Y    X ←subjpass− announce −prep→ by −pobj→ Y
X ←subj− tell −obj→ station −nn→ Y           X ←poss− tva −nn→ Y
X ←subj− tell −obj→ program −nn→ Y           X −prep→ for −pobj→ television −nn→ Y
X ←partmod− broadcast −prep→ on −pobj→ Y     X ←partmod− broadcast −prep→ by −pobj→ Y
X ←pobj− of ←prep− voice −nn→ Y              X ←partmod− broadcast −prep→ on −pobj→ Y
X ←pobj− in ←prep− channel −nn→ Y            X −prep→ of −pobj→ tv −nn→ Y

Table 7.8: Context-sensitive paraphrases for X ←subj− appear −prep→ on −pobj→ Y

We observe another class of errors, in which the returned patterns are clearly associated with the given context but do not have the same meaning as the original pattern. Consider the pattern X ←subj− appear −prep→ on −pobj→ Y, this time occurring in the context feminist-X, dollar-Y in the following question:
(9) What American feminist appeared on a silver dollar?
The top paraphrases are dominated by the dollar context, for example amount in dollar or side of dollar, none of which has the same meaning as appear on. Another example is the phrase shed blood, for which the highest ranked patterns are expressions which often occur with blood, such as donate blood, vessel of blood or blood test; none of these, however, is similar in meaning to shed blood.
Finally, we also notice that the method is not robust to parsing errors. For example, the path you ←subj− fly −obj→ pregnancy is obtained from an incorrect parse of a QA question. When paraphrasing this, the model returns, with average confidence scores, phrases such as collect from pregnancy, receive in
pregnancy or receive for pregnancy. In the future we plan to investigate the use of our model for detecting such erroneous parse paths. One way of approaching this is to use selectional preferences, which can be easily induced from our framework; these can flag highly unlikely contexts as most probably generated by parsing errors.
Another aspect to be investigated concerns the amount of input corpus data used by these methods. Previous work shows that the performance of distributional methods can increase significantly with the size of the input data. Our method in particular might benefit from this, as we attempt to learn a more complex model than traditional vector space methods. However, it is still an open question whether it is possible to obtain an accurate model for context-sensitive paraphrase generation which uses solely distributional information, and DIRT-like representations in particular.