Semi-supervised learning and related work

6. Semi-supervised training set expansion with structural alignment

6.2. Semi-supervised learning and related work

In supervised machine learning (classification), we assume the existence of some training data, a set of examples X with assigned classes or labels Y, from which we can learn a function f : X → Y (a model) that maps the examples to the correct labels. Conversely, in unsupervised machine learning, we do not have label information Y for the examples. Instead of learning a mapping from X to Y, we can only group the examples into categories according to the internal structure of the data. Semi-supervised or weakly supervised learning lies between supervised and unsupervised learning. In this setting, we have some information about the labels of the training data, but other information may be missing. Semi-supervised learning attempts to leverage unlabeled data in combination with the labeled data to find the missing information. In the basic case of semi-supervised learning, only part of the training data is labeled, but there are other forms that, for example, have only partial or noisy label information or use a set of constraints as the basis for labels. For a discussion of semi-supervised learning in computational linguistics, see Abney (2007).

A frequently used method of semi-supervised learning is self-training or bootstrapping. Typically, self-training starts with a small set of labeled training examples on which a supervised classifier (the base learner) is trained. This classifier is then applied to a larger set of unlabeled examples and assigns a label and a confidence score to each example. The examples for which the classifier is most confident are then added to the training data. A new classifier is trained on the expanded training data and applied to the unlabeled data, and again the most confident examples are added to the training data. The process is iterated until some stopping criterion is reached, for example a fixed number of iterations or convergence, i.e., the classifier or the labels of the data do not change from one iteration to the next. One well-known example of a bootstrapping approach used in NLP is the Yarowsky algorithm (Yarowsky, 1995). There are many variants of this basic method, depending on the exact implementation of each step.
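The self-training loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of any cited work: the base learner is a toy nearest-centroid classifier, the confidence score is the relative margin between the two closest centroids, and the names `train_centroids`, `self_train`, `per_round`, and `max_rounds` are hypothetical. The sketch stops after a fixed number of iterations or when the unlabeled pool is empty; it does not check for convergence.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[float, ...]
Labeled = Tuple[Example, str]
# A model maps an example to a (label, confidence) pair.
Model = Callable[[Example], Tuple[str, float]]


def train_centroids(data: Sequence[Labeled]) -> Model:
    """Toy base learner (an assumption): nearest-centroid classification."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, y in data:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(x: Example) -> Tuple[str, float]:
        # Euclidean distance from x to every class centroid, closest first.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5, y)
            for y, c in centroids.items()
        )
        best_dist, best_label = dists[0]
        if len(dists) == 1:
            return best_label, 1.0
        # Confidence: relative margin between the two closest centroids.
        total = best_dist + dists[1][0]
        confidence = (dists[1][0] - best_dist) / total if total else 0.0
        return best_label, confidence

    return predict


def self_train(labeled, unlabeled, per_round=2, max_rounds=10):
    """Repeatedly add the most confidently labeled examples to the training set."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):      # stop after a fixed number of iterations
        if not pool:                 # or when no unlabeled examples remain
            break
        model = train_centroids(labeled)
        ranked = sorted(pool, key=lambda x: model(x)[1], reverse=True)
        for x in ranked[:per_round]:
            labeled.append((x, model(x)[0]))
            pool.remove(x)
    return train_centroids(labeled), labeled
```

Calling `self_train([((0.0,), "neg"), ((10.0,), "pos")], [(1.0,), (9.0,), (4.0,), (6.0,)])` first adds the two points closest to the seed examples and only then the two less certain ones, which mirrors the "most confident first" selection of the basic method.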

Semi-supervised approaches have been used in many fields, and Semantic Role Labeling (SRL) is no exception. Gildea and Jurafsky (2002), the first work to tackle SRL as an independent task, already use bootstrapping to enlarge their training data. Their final expanded training set is about six times the size of the originally annotated training data, and they report a small improvement when training their system on the expanded rather than the non-expanded training data.

Other approaches use the extensive resources with information about predicates and possible arguments that exist for SRL as a basis for bootstrapping. The work of Swier and Stevenson (2004, 2005) leverages VerbNet as the basis for a bootstrapping approach to classify argument roles. VerbNet lists the argument structures allowed for each predicate. For a given argument, they determine the set of possible roles from VerbNet. After initially making all unambiguous role assignments, their system learns from these assignments and iteratively proceeds to label all arguments.
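The initial step of such a lexicon-driven approach, assigning a role only where the lexicon leaves a single candidate, can be illustrated as follows. This is a sketch under strong simplifying assumptions: `LEXICON` is a hypothetical toy dictionary and not actual VerbNet data, arguments are reduced to (predicate, slot) pairs, and `initial_assignments` is a name introduced here for illustration. The later learning and iteration steps of Swier and Stevenson are not modeled.

```python
from typing import Dict, List, Set, Tuple

Argument = Tuple[str, str]  # (predicate, argument slot)

# Hypothetical toy lexicon (NOT real VerbNet entries):
# maps each argument to the set of roles it may take.
LEXICON: Dict[Argument, Set[str]] = {
    ("give", "subject"): {"Agent"},
    ("give", "object"): {"Theme", "Recipient"},
    ("break", "subject"): {"Agent", "Instrument"},
}


def initial_assignments(arguments: List[Argument]):
    """Assign a role where exactly one candidate exists; defer the rest."""
    assigned: Dict[Argument, str] = {}
    ambiguous: List[Argument] = []
    for arg in arguments:
        roles = LEXICON.get(arg, set())
        if len(roles) == 1:
            assigned[arg] = next(iter(roles))
        else:
            ambiguous.append(arg)
    return assigned, ambiguous
```

The unambiguous assignments would then serve as the first training data, while the arguments in `ambiguous` are left for later iterations.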

Apart from bootstrapping, label projection is another common method to assign labels to unlabeled data. Label projection starts with an individual labeled seed example, looks for similar examples, and then projects the labels over some alignment between the two examples. Label projection has been used in multilingual contexts for various tasks, from projecting parts of speech and chunks (Yarowsky et al., 2001) to Named Entities (Ehrmann et al., 2011) and SRL information from one language to another (Padó and Lapata, 2009). In multilingual contexts, the alignments that are used for the projection are derived from the translation alignment between the words of the original sentence and its translation. For monolingual contexts, a different sort of alignment with different measures of similarity is necessary. Such monolingual alignments are studied in textual entailment recognition and paraphrase identification, although for these tasks there is usually no transfer of labels along the alignments (Yao et al., 2013).

The approach we adopt for our work in this chapter is the structural alignment method proposed by Fürstenau and Lapata (2009, 2012). While they also use a bootstrapping approach in their evaluation, their proposed method is a label projection algorithm over sentences. The process is based on the assumption that sentences which are similar in syntactic structure and in the semantics of their arguments also have similar predicate-argument relations. For each labeled seed sentence, the k most similar sentences are extracted from a large set of unlabeled sentences and the labels of the seed sentence are projected onto them. Structural alignment is explained in detail in the following section together with our adaptations. Fürstenau and Lapata (2009, 2012) address the complete pipeline of SRL steps, from predicate identification to argument classification.
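The general scheme of selecting the k most similar unlabeled sentences for a seed and projecting its labels along an alignment can be sketched as follows. This is explicitly not the structural alignment of Fürstenau and Lapata: as a stand-in for their syntactic and semantic similarity, the sketch aligns part-of-speech tag sequences with `difflib.SequenceMatcher` from the Python standard library. The function names, the tag set, and the label names in the usage note are assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

Token = Tuple[str, str]   # (word, POS tag)
Sentence = List[Token]
Labels = Dict[int, str]   # token index -> label


def _matcher(seed: Sentence, target: Sentence) -> SequenceMatcher:
    # Compare only the POS tag sequences (a crude stand-in for structure).
    seed_tags = [tag for _, tag in seed]
    target_tags = [tag for _, tag in target]
    return SequenceMatcher(None, seed_tags, target_tags, autojunk=False)


def similarity(seed: Sentence, target: Sentence) -> float:
    """Similarity in [0, 1] based on matching POS tag subsequences."""
    return _matcher(seed, target).ratio()


def project(seed: Sentence, seed_labels: Labels, target: Sentence) -> Labels:
    """Copy each seed label to the target token aligned with it, if any."""
    projected: Labels = {}
    for block in _matcher(seed, target).get_matching_blocks():
        for offset in range(block.size):
            label = seed_labels.get(block.a + offset)
            if label is not None:
                projected[block.b + offset] = label
    return projected


def expand(seed: Sentence, seed_labels: Labels,
           unlabeled: List[Sentence], k: int = 1):
    """Label the k unlabeled sentences most similar to the seed."""
    ranked = sorted(unlabeled, key=lambda s: similarity(seed, s), reverse=True)
    return [(s, project(seed, seed_labels, s)) for s in ranked[:k]]
```

For a seed such as "X is better than Y" with hypothetical labels `{0: "entity1", 2: "predicate", 4: "entity2"}`, an unlabeled sentence with the same tag sequence receives the same labels at the aligned positions and can be added to the training set.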

A similar approach is also presented by Franco-Penya and Emms (2012), but it relies on already identified predicates and arguments and is thus not applicable to our task, where we do not have such information. For an unlabeled sentence, they project the labels for all arguments from the labeled sentence with the smallest tree edit distance and experiment with different cost measures. In contrast to the above approaches, they do not use their method to expand a training set for classification, but directly classify the unlabeled sentences (similar to 1-Nearest-Neighbor classification).

Some completely unsupervised approaches have also been proposed for tasks in the SRL pipeline. Abend et al. (2009) perform unsupervised argument identification by using pointwise mutual information to determine which constituents are the most probable arguments. Initially, all constituents are regarded as argument candidates; this set is then filtered (by using minimal clauses, pruning and pointwise mutual information) to include only the most probable candidates. Their work is limited to argument identification and they use gold predicates as a starting point. In contrast to their work, we cannot start from given predicates in the test set, as we have no annotations at all for our unlabeled sentences.
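For reference, pointwise mutual information compares how often two events co-occur with how often they would co-occur if they were independent: PMI(x, y) = log2( p(x, y) / (p(x) p(y)) ). The following sketch estimates it from a list of observed pairs by relative frequency. Which pairs Abend et al. actually count is not described here, so treating the input as pairs of a predicate and a candidate word is an assumption, as is the function name `pmi_table`.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def pmi_table(pairs: List[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Estimate PMI (base 2) for every observed (x, y) pair."""
    n = len(pairs)
    pair_counts = Counter(pairs)
    x_counts = Counter(x for x, _ in pairs)
    y_counts = Counter(y for _, y in pairs)
    table = {}
    for (x, y), count in pair_counts.items():
        p_xy = count / n
        p_x = x_counts[x] / n
        p_y = y_counts[y] / n
        table[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return table
```

A positive value means the pair occurs more often than expected under independence, so candidates with low or negative scores would be the ones to filter out.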

For comparison detection, we do not have extensive resources like PropBank or VerbNet at our disposal. We do, however, think that a small seed set of comparison sentences can be annotated in reasonable time for any new domain or language. This set may not be sufficiently large for bootstrapping, but it can be used as an initial seed set for a label projection approach like structural alignment.

In document Structurally informed methods for improved sentiment analysis (Page 138-140)