Templates - Semantic chunking

Thanks to the chunking algorithm described in the previous section, we can divide any DMRS into semantic chunks, even based on structures we have never observed before. The resulting chunks represent a wider variety of grammatical constructions and take more varied forms compared with the chunks from Chapter 3. This flexibility fits well with the task characteristics, but in order to inspect the chunks and to impose constraints specific to the intended application of chunks, we need a way of generalizing across our examples. The chunking system from Chapter 3 solved the issue by starting with a set of rules, strongly limiting what chunks can be formed. For the new model we propose a different approach that better suits the adaptable nature of chunking.

5.3 Templates 111

Fig. 5.7 Schematic structure of the scopal tree in Configuration 2. Dashed line indicates that r is not necessarily the immediate child of n. In Configuration 3, n is the centre of the nucleus chunk instead.

The prescriptive rules of the prototype system are replaced in the new model by descriptive templates extracted from a training set. A template is a small underspecified DMRS based on a functional graph associated with a chunking decision. It captures relationships between interacting chunks. Individual rules from §3.2 find their equivalents among the automatically extracted templates (§5.4.3), but while the original rules were hard-coded generalisations, templates emerge naturally from the basic principles of the representation and the grammar. We demonstrate the benefits of the new paradigm in §5.4.4, where the automatically extracted patterns generalizing across the dataset help us address the issue of how to represent functional subgraphs in realization with chunking (§5.4).

Templates are directly linked to functional graphs. Each functional graph stores nodes and links which do not belong to either the nucleus or satellite semantic chunks. In order to store the links, it also includes copies of nodes which participate in the connection from within the chunks. These nodes provide constraints on what type of DMRS subgraphs can participate in the particular chunking pattern. For example, the functional graph and two chunk subgraphs from Figure 5.8 are the result of chunking Example 95 based on the clausal coordination:

(95) Sam complained because there were no potatoes, but he made supper anyway.

The functional graph would take a different form if the left coordinate were a simple clause. At the same time, chunking other sentences with a similar coordination would yield functional graphs that are almost identical, with the exception of properties of nodes, such as their lemmas or tenses.

(a) The left coordinate chunk of Example 95: Sam complained because there were no potatoes.

(b) The right coordinate chunk of Example 95: he made supper anyway.

Fig. 5.8 The chunks and the functional graph for Example 95
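The description of a functional graph given above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the `Link` class, the `functional_graph` function, the node ids and the link labels are all hypothetical, the toy graph is only loosely modelled on Example 95 and is not actual parser output, and the choice of which coordinate counts as nucleus is arbitrary here. The sketch assumes that a link belongs to a chunk only when both of its end points lie inside that chunk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    start: int  # id of the source node
    end: int    # id of the target node
    label: str  # illustrative label, e.g. "L-INDEX/NEQ"


def functional_graph(nodes, links, nucleus, satellite):
    """Split off the material that belongs to neither chunk.

    Returns (own, copied, kept):
      own    - nodes outside both chunks,
      copied - chunk-internal nodes that must be copied so links can attach,
      kept   - links that are not internal to a single chunk.
    """
    in_chunks = nucleus | satellite
    own = set(nodes) - in_chunks
    kept = [
        link for link in links
        if not ({link.start, link.end} <= nucleus
                or {link.start, link.end} <= satellite)
    ]
    copied = {n for link in kept for n in (link.start, link.end)
              if n in in_chunks}
    return own, copied, kept


# Toy graph (hypothetical ids and labels, assumed for illustration only).
nodes = {1: "_complain_v", 2: "_because_x", 3: "_potato_n",
         4: "_but_c", 5: "_make_v", 6: "_supper_n"}
links = [
    Link(4, 1, "L-INDEX/NEQ"), Link(4, 2, "L-HNDL/H"),
    Link(4, 5, "R-INDEX/NEQ"), Link(4, 5, "R-HNDL/H"),
    Link(2, 1, "ARG1/H"), Link(2, 3, "ARG2/H"), Link(5, 6, "ARG2/NEQ"),
]
nucleus, satellite = {1, 2, 3}, {5, 6}

own, copied, kept = functional_graph(nodes, links, nucleus, satellite)
print(own, copied, [link.label for link in kept])
```

In this toy case the coordination node is the only node owned by the functional graph, while three chunk-internal nodes are copied into it so that the four coordination links have end points to attach to.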

This observation allows us to generalise across chunking decisions and find reliable patterns. Each functional graph can be converted into a delexicalised template, shared between chunking points based on similar DMRS structures. To create a template, we convert each node in a functional graph into a delexicalised nonterminal. A potential template for Example 95 is shown in Figure 5.9.
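A minimal sketch of delexicalisation follows, under stated assumptions. The nonterminal used here is simply the part-of-speech field of an ERG-style predicate name, which is an assumption made for illustration; the nonterminals actually used are described in §5.4.2. The functions `coarse_tag` and `delexicalise`, the node ids, the predicates, the tense values and the link labels are hypothetical. For brevity the template is reduced to a sorted tuple of delexicalised links, which discards node identity.

```python
def coarse_tag(predicate):
    """Return the part-of-speech field of a surface predicate.

    "_but_c" splits into ["", "but", "c"], so the tag is "C".
    Predicates without a leading underscore are returned unchanged.
    """
    parts = predicate.split("_")
    if predicate.startswith("_") and len(parts) > 2:
        return parts[2].upper()
    return predicate


def delexicalise(nodes, links):
    """Replace every node by a nonterminal and drop lemma and tense."""
    def nonterminal(node_id):
        predicate, _tense = nodes[node_id]
        return coarse_tag(predicate)

    return tuple(sorted((nonterminal(s), label, nonterminal(e))
                        for s, label, e in links))


# Functional graph loosely modelled on Example 95 (toy values).
nodes_95 = {1: ("_complain_v_1", "past"), 2: ("_because_x", "untensed"),
            4: ("_but_c", "untensed"), 5: ("_make_v_1", "past")}
links_95 = [(4, "L-INDEX/NEQ", 1), (4, "L-HNDL/H", 2),
            (4, "R-INDEX/NEQ", 5), (4, "R-HNDL/H", 5)]

# Same structure, different lemmas, tenses and node ids (made-up predicates).
nodes_other = {10: ("_sulk_v_1", "pres"), 11: ("_because_x", "untensed"),
               12: ("_but_c", "untensed"), 13: ("_cook_v_1", "pres")}
links_other = [(12, "L-INDEX/NEQ", 10), (12, "L-HNDL/H", 11),
               (12, "R-INDEX/NEQ", 13), (12, "R-HNDL/H", 13)]

# Left coordinate is a simple clause: both left links reach the verb.
nodes_simple = {1: ("_complain_v_1", "past"), 2: ("_but_c", "untensed"),
                3: ("_make_v_1", "past")}
links_simple = [(2, "L-INDEX/NEQ", 1), (2, "L-HNDL/H", 1),
                (2, "R-INDEX/NEQ", 3), (2, "R-HNDL/H", 3)]

template_95 = delexicalise(nodes_95, links_95)
template_other = delexicalise(nodes_other, links_other)
template_simple = delexicalise(nodes_simple, links_simple)
print(template_95 == template_other, template_95 == template_simple)
```

The first two graphs differ only in lemmas, tenses and node ids, so they collapse into the same template, whereas the graph with a simple left clause yields a different one.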

The type of nonterminals can be adjusted to match the level of generalization allowed by the target task and input representation. Our application of choice is realization from DMRS, and in §5.4.2 we give an example of how the nonterminals can be tailored to the needs of that particular task and how they reflect the constraints imposed on the form of semantic chunks (§5.2).

In §5.4.3 we present the templates extracted from our training set. The scope-based chunker found 2056 chunking opportunities, which can be represented by 789 templates, some of which have an almost one-to-one correspondence to the rules from the prototype model. The generalising power of the template approach is demonstrated by the fact that 303 of the extracted templates account for over 70% of the chunking decisions encountered in the test set.
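One way such a coverage figure could be measured is sketched below. This is an assumed procedure, not necessarily the one used for the numbers above: the function `coverage` is hypothetical, it selects the k most frequent training templates, and the template names and counts are toy data unrelated to the reported results.

```python
from collections import Counter


def coverage(train_templates, test_templates, k):
    """Fraction of test decisions whose template is among the k most
    frequent templates observed in training."""
    top_k = {t for t, _count in Counter(train_templates).most_common(k)}
    covered = sum(1 for t in test_templates if t in top_k)
    return covered / len(test_templates)


# Toy data: one template label per chunking decision.
train = ["A"] * 5 + ["B"] * 3 + ["C"] + ["D"]
test = ["A", "A", "B", "C", "E"]

for k in (1, 2):
    print(k, coverage(train, test, k))
```

With the toy data, the two most frequent training templates cover three of the five test decisions.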

5.3.1 Related work: HSST/R

Expressing generalizations across chunking decisions in the form of templates was inspired by the synchronous context-free grammars (SCFG) used in hierarchical phrase-based translation systems (§2.2.6), in particular Hierarchical Statistical Semantic Translation (HSST) and Realization (HSSR) (Horvat, 2017; Horvat et al., 2015). HSST is a hybrid translation system, combining Hiero (Chiang, 2005, 2007) with information from semantic parses. It adapts the SCFG to comprise graph-to-string rules, using the same semantic representation and grammar as the work in this thesis: Dependency Minimal Recursion Semantics (DMRS; §2.3). HSSR (Horvat et al., 2015) is an adaptation of HSST to statistical realization.

The HSST grammar rules connect DMRS subgraphs of a sentence in the source language to surface fragments of the target sentence, rather than linking two surface sentences directly. The initial set of source rules is extracted from suitable subgraphs of the input graph. Each rule is then iteratively subtracted from other rules wherever possible to create non-terminal rules. The process is similar to the one we use for chunking, where semantic chunks lower in the hierarchy are removed from their source graphs and processed recursively. Each non-terminal in a template stands in for either a simple chunk or another complex hierarchy.

Fig. 5.10 An example of a terminal (top) and non-terminal (bottom) realization rule for HSSR (Horvat, 2017).

Non-terminals in the rule subgraphs have matching non-terminals on the surface side of the rule. Figure 5.10 shows examples of rules with and without non-terminal symbols.
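The idea of subtracting one rule from another, with matching non-terminals on both sides, can be sketched in Python. This is a simplified Hiero-style illustration and not the extraction procedure of Horvat (2017): a rule is reduced to a set of node labels plus a tuple of surface tokens, links are ignored, and the names `find_span`, `subtract` and `X0`, as well as the example rules, are hypothetical.

```python
def find_span(tokens, sub):
    """Return the start index of the first occurrence of sub, or None."""
    n = len(sub)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == sub:
            return i
    return None


def subtract(large, small, symbol="X0"):
    """Replace the part of `large` covered by `small` with a non-terminal.

    Each rule is (frozenset of node labels, tuple of surface tokens).
    Returns None when `small` is not properly contained in `large`.
    """
    large_nodes, large_tokens = large
    small_nodes, small_tokens = small
    if not small_tokens or not small_nodes < large_nodes:
        return None
    start = find_span(large_tokens, small_tokens)
    if start is None:
        return None
    # The same symbol appears on the graph side and on the surface side.
    nodes = (large_nodes - small_nodes) | {symbol}
    tokens = (large_tokens[:start] + (symbol,)
              + large_tokens[start + len(small_tokens):])
    return nodes, tokens


# Toy terminal rules (made-up node labels).
large = (frozenset({"pron", "_make_v_1", "_supper_n_1"}),
         ("he", "made", "supper"))
small = (frozenset({"_supper_n_1"}), ("supper",))

rule = subtract(large, small)
print(rule)
```

The resulting non-terminal rule keeps the remaining nodes and tokens of the larger rule and uses the same symbol on both sides to mark where a smaller rule can be substituted.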

The graph sources of the SCFG rules are a variant of semantic subgraphs. Horvat (2017) originated the term but defined it through the extraction algorithm. The resulting subgraphs are the precursors of the entities we discuss more conceptually in Section 2.3.4. The HSST/R rules are based on DMRS graphs simplified to polytrees, i.e. directed acyclic graphs which resemble trees but can have multiple root nodes. In contrast, we extend our use of the term "semantic subgraphs" to general graphs, changing the details of the definition but preserving the intuition behind it.
