7.2 Parsing the Microtext corpus
7.2.1 Local models
In order to perform structured output prediction on argumentation structures, one would ideally like to learn a model
$$h : X_{A^n} \mapsto Y_G$$
where $X_{A^n}$ is the domain of instances representing a collection of adus for each dialogue and $Y_G$ is the set of all possible argumentation graphs. Directly predicting argumentation structures, however, is very difficult: it requires an amount of data that the community currently lacks, since in this setting every document counts as a single instance. Moreover, no appropriate logistic or hinge loss function (Smith, 2011) has been proposed for either argumentation or discourse structures. Most approaches, including our novel ILP approach, therefore aim at the more modest goal of learning a model
$$h : X_{A^2} \mapsto Y_R$$
where the domain of instances $X_{A^2}$ represents features for a pair of adus and $Y_R$ is the set of relations, thereby building a local model that yields a probability distribution over the relations between individual adus.
Note that we do not directly turn this model into a classifier. In other words, we do not try to extract relations directly from it by searching for a threshold that gives optimal local results. Indeed, concatenating the relations predicted by a local classifier would not necessarily yield a well-formed structure, even with good local results: nothing guarantees that the result is free of cycles or that it forms a single connected component, as our data requires. Instead, we use the probability distribution that this model yields as input to a decoder that tries to optimize a global measure of the argumentation structure.
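To make the well-formedness problem concrete, the following minimal sketch thresholds a small matrix of local attachment probabilities and checks whether the result is a tree. The probabilities, the 0.5 threshold, and the function names are illustrative assumptions, not values or code from our experiments, and the exhaustive search is only a stand-in for a real decoder.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int]  # (source adu, target adu)


def threshold_edges(probs: Dict[Edge, float], threshold: float = 0.5) -> List[Edge]:
    """Keep every edge whose local probability reaches the threshold."""
    return sorted(e for e, p in probs.items() if p >= threshold)


def is_tree(n: int, edges: List[Edge]) -> bool:
    """True if the edges over nodes 1..n form one tree (single root, no cycle)."""
    if len(edges) != n - 1:
        return False
    head = {}
    for source, target in edges:
        if source in head:  # a node may attach to only one target
            return False
        head[source] = target
    for node in range(1, n + 1):
        seen = set()
        while node in head:  # climb towards the root
            if node in seen:  # revisiting a node means a cycle
                return False
            seen.add(node)
            node = head[node]
    return True


def best_tree(n: int, probs: Dict[Edge, float]) -> Optional[List[Edge]]:
    """Exhaustive stand-in for a decoder: the tree with the highest product
    of local probabilities (edges missing from probs count as 0)."""
    best, best_score = None, -1.0
    nodes = list(range(1, n + 1))
    for root in nodes:
        others = [v for v in nodes if v != root]
        for heads in product(nodes, repeat=len(others)):
            edges = list(zip(others, heads))
            if not is_tree(n, edges):
                continue
            score = 1.0
            for e in edges:
                score *= probs.get(e, 0.0)
            if score > best_score:
                best, best_score = sorted(edges), score
    return best


# Hypothetical local probabilities for four adus.
local = {(2, 1): 0.9, (3, 4): 0.8, (4, 3): 0.7, (3, 2): 0.45}
greedy = threshold_edges(local)   # contains the cycle 3 -> 4 -> 3
decoded = best_tree(4, local)     # trades the 0.8 edge for a well-formed tree
```

In this toy case the thresholded edges all have good local scores, yet segments 3 and 4 attach to each other and never reach segment 1. The global search gives up the higher-scoring edge (3, 4) in favour of (3, 2) so that the output is a single tree.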
Dependency Structures We use the dependency conversion of the argumentative portion of the corpus, as presented in section 7.1.4, with the coarse-grained set of relations {support, attack}.
For illustration, figure 7.8 shows the dependency graph for the argumentation structure of the following example.
(7.1) [Health insurance companies should naturally cover alternative medical treatments.]1[ Not all practices and approaches that are lumped together
under this term may have been proven in clinical trials,]2[ yet it’s precisely
their positive effect when accompanying conventional ’western’ medical therapies that’s been demonstrated as beneficial.]3[ Besides many general
practitioners offer such counselling and treatments in parallel anyway -]4[
and who would want to question their broad expertise?]5
Figure 7.8: Dependency conversion of the argumentation structure of example 7.1 (nodes: segments 1 to 5; edge labels: attack, attack, support, support)
Subtasks Peldszus and Stede (2015) proposed the following four subtasks for predicting the argumentation structures:
• attachment (at): Given a pair of adus, are they connected by an argumentative relation? [yes, no]
• central claim (cc): Given an adu, is it the central claim of the text? [yes, no]
• role (ro): Given an adu, is it in the proponent's or the opponent's voice? [proponent, opponent]
• function (fu): Given an adu, what is its argumentative function? [support, attack, none]
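As an illustration of how three of these subtasks relate to the dependency representation, the sketch below derives at, cc and fu labels from a toy graph. It assumes that the central claim is the only adu without an outgoing edge and that its function is "none"; the graph and the function name are hypothetical. The role label is left out because it depends on the voice annotation rather than on the edges alone.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int, str]  # (source adu, target adu, relation label)


def subtask_labels(n: int, edges: List[Edge]) -> Dict[str, dict]:
    """Derive labels for at, cc and fu from a dependency graph over adus 1..n."""
    head = {source: (target, label) for source, target, label in edges}
    nodes = range(1, n + 1)
    # at: one yes/no decision per ordered pair of distinct adus
    attachment = {
        (s, t): s in head and head[s][0] == t
        for s in nodes for t in nodes if s != t
    }
    # cc: assumed to be the adu that attaches to nothing (the root)
    central_claim = {v: v not in head for v in nodes}
    # fu: label of the outgoing edge, "none" for the root (assumption)
    function = {v: head[v][1] if v in head else "none" for v in nodes}
    return {"at": attachment, "cc": central_claim, "fu": function}


# Hypothetical three-segment text: 2 supports 1, 3 attacks 2.
toy_edges = [(2, 1, "support"), (3, 2, "attack")]
labels = subtask_labels(3, toy_edges)
```

Note that the function label is read off the source segment of each edge, which mirrors the decision to classify relation labels from the source segment only.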
We reproduced this approach and trained a log-loss SGD (stochastic gradient descent) classifier for each of these tasks. Note that relation labels are classified using only the source segment. We reimplemented their feature set, which includes lemma uni- and bigrams, the first three lemmas of each segment, POS tags, lemma- and POS-tag-based dependency parse triples, discourse connectives, the main verb of the sentence and all verbs in the segment, absolute and relative segment position, length and punctuation counts, and the linear order of and distance between segment pairs.
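For readers unfamiliar with the term, a log-loss SGD classifier is a logistic regression model whose weights are updated one training instance at a time. The following pure-Python sketch shows the update rule on a toy attachment task; the two features, the data, the learning rate and the number of epochs are all assumptions made for illustration, and this is not the implementation used in our experiments.

```python
import math
import random
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w: List[float], b: float, x: List[float]) -> float:
    """Probability of the positive class for feature vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_sgd_logloss(
    X: List[List[float]], y: List[int],
    epochs: int = 200, lr: float = 0.1, seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimise the log loss with one gradient step per training instance."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            # gradient of the log loss w.r.t. the score is (p - y)
            grad = predict_proba(w, b, X[i]) - y[i]
            w = [wj - lr * grad * xj for wj, xj in zip(w, X[i])]
            b -= lr * grad
    return w, b


# Toy pair features: [same-sentence flag, normalised distance] (hypothetical).
X = [[1.0, 0.1], [1.0, 0.2], [0.0, 0.8], [0.0, 0.9]]
y = [1, 1, 0, 0]  # 1 = attached, 0 = not attached
w, b = train_sgd_logloss(X, y)
p_near = predict_proba(w, b, [1.0, 0.1])
p_far = predict_proba(w, b, [0.0, 0.9])
```

The point of using the log loss is that the model outputs a probability rather than a hard decision, which is what a decoder needs as input.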
For the syntactic analysis, we use the spaCy parser (Honnibal and Johnson, 2015) instead of the mate parser (Bohnet, 2010). Both parsers provide pretrained models for English and German. The spaCy parser is slightly less accurate and does not offer morphological tagging, but it is very fast and allows us to greatly simplify the pipeline. Moreover, it comes with Brown clusters and vector-space representations, which we want to test. Another difference is that we extended the lexicon of English discourse connectives with the connectives collected in the EDUCE project.10
New features In addition to the reimplemented feature set, we test the impact of the following new features. We add Brown cluster unigrams (BC) and bigrams (BC2) of words occurring in the segment. We complete the discourse relation features (DR): while the lexicon of German discourse connectives used in the experiments of Peldszus and Stede (2015) was annotated with the discourse relations that each connective potentially signals, their English lexicon lacked this information. We extend the English connective lexicon with the connectives collected in the EDUCE project, which are also annotated with signaled discourse relations. We also add a feature representing the main verb of the segment; the existing verb features focus either on the verb of the whole sentence, which might be too restrictive, or on all verbal forms in the segment, which might not be restrictive enough.
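The Brown cluster features can be pictured as follows. This is a minimal sketch under stated assumptions: the cluster lookup table and its bit strings are invented, the feature name format is ours, and words without a cluster are simply dropped before bigrams are formed, which may differ from the actual feature extraction.

```python
from typing import Dict, List, Set


def brown_features(tokens: List[str], clusters: Dict[str, str]) -> Set[str]:
    """Binary BC (unigram) and BC2 (bigram) features for one segment."""
    # Map each known word to its cluster id; unknown words are skipped.
    ids = [clusters[t.lower()] for t in tokens if t.lower() in clusters]
    features = {f"BC={c}" for c in ids}
    features |= {f"BC2={a}_{b}" for a, b in zip(ids, ids[1:])}
    return features


# Hypothetical cluster ids (bit strings) for three words.
toy_clusters = {"insurance": "0110", "companies": "0111", "cover": "1010"}
segment = ["Insurance", "companies", "should", "cover"]
feats = brown_features(segment, toy_clusters)
```

Because many words share a cluster, these features are less sparse than the corresponding lemma uni- and bigrams.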
To investigate the impact of word embeddings on this task, we add the 300-dimensional word-vector representations, averaged over all content words of the segment, as a feature for the segment-wise classifiers (VEC). Stab and Gurevych (2016) gained small improvements (around 1 point of F1-score on their dataset) by adding word embeddings as features to their argumentative stance classifier. Moreover, we derive scores of semantic distance between two segments from these vectors: we measure the cosine distance between the average word-vector representation of the segment and those of its left and right antecedents (VLR). Also, for the attachment classifier, we measure the cosine distance between the average word vectors of the source and target segments (VST).
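The vector-based features reduce to two operations, averaging and cosine distance, which the sketch below spells out. The three-dimensional toy vectors stand in for the 300-dimensional ones, the function names are ours, and returning a distance of 1.0 for an all-zero vector is an assumption about how empty segments are handled.

```python
import math
from typing import Dict, List


def average_vector(tokens: List[str], vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Mean of the vectors of all tokens that have one (zero vector if none do)."""
    rows = [vectors[t] for t in tokens if t in vectors]
    if not rows:
        return [0.0] * dim
    return [sum(column) / len(rows) for column in zip(*rows)]


def cosine_distance(u: List[float], v: List[float]) -> float:
    """1 - cosine similarity; 1.0 if either vector is all zeros (assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical 3-dimensional word vectors.
toy_vectors = {
    "insurance": [1.0, 0.0, 0.0],
    "cover": [0.0, 1.0, 0.0],
    "treatments": [1.0, 1.0, 0.0],
    "expertise": [0.0, 0.0, 1.0],
}
source = average_vector(["insurance", "cover"], toy_vectors, 3)
target = average_vector(["treatments"], toy_vectors, 3)
other = average_vector(["expertise"], toy_vectors, 3)
vst_close = cosine_distance(source, target)  # same direction
vst_far = cosine_distance(source, other)     # orthogonal
```

VLR applies the same distance to a segment and each of its neighbours, while VST applies it to the source and target of a candidate attachment.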
Furthermore, we added features to better capture the intra-sentential structure, i.e. relations involving subordinate clauses: one feature indicating that the source and target segments are part of the same sentence (SS) and one indicating that the target is the matrix clause of the source (MC).