4.3 Caption realisation

4.3.2 Mapping, composition and paraphrasing

Compositionality greatly simplifies the generation process by dividing it into two steps: first, the individual components are mapped to corresponding DMRS graph snippets, and subsequently these snippets are recursively combined. In addition, a paraphrasing step post-processes the resulting DMRS graph before it is passed on to ACE/ERG to be turned into natural language. All of these steps make use of the DMRS graph description language I developed, referred to as GraphLang in the following, which is consequently presented first.

GraphLang. The GraphLang formalism describes DMRS graphs in a serialised form, as a list of edge sequences plus connected nodes such that every edge of the graph appears exactly once. Nodes are represented by their predicate, e.g. _square_n_1, and the associated variable with attributes (optionally abbreviated) in square brackets, e.g. x[3s_+_] (see footnote 5), so the node for “square” is denoted by _square_n_1 x[3s_+_]. The various edge relation types are represented by short forms, for instance, =1=> for ARG1/EQ (e.g. a relation between adjective and noun), -2h-> for ARG2/H (e.g. a relation between verb and gerund object), or simply --> for RSTR/H, which is the special relation connecting a quantifier and its quantified instance node. Finally, graph-level index and top nodes are indicated by a leading * and **, respectively (often the ERG can infer one from the other). The sentence “A square is red.” as a DMRS graph is thus written as follows:

_a_q --> _square_n_1 x[3s_+_] <-1- *_red_a_1 e[ppi--]
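To make the notation concrete, the following is a minimal sketch of how the edge short forms above could be decoded and how a single edge sequence could be turned into (source, relation, target) triples. It is not the pydmrs parser: the regular expression is inferred only from the examples in this section, the reading of a plain numbered arrow such as -1-> as ARG1/NEQ is an assumption, and the names `decode_edge` and `chain_to_triples` are hypothetical. Reference labels, anchors and underspecification are deliberately not handled.

```python
import re

# Assumed short-form shape, inferred from the examples in this section:
# optional '<', shaft ('-' or '='), optional ARG digit, optional 'h',
# second shaft, optional '>'.
EDGE = re.compile(r"^(<)?([-=])(\d?)(h?)([-=])(>)?$")


def decode_edge(token):
    """Decode an edge short form, e.g. '=1=>' -> ('ARG1', 'EQ', 'right')."""
    m = EDGE.match(token)
    if m is None:
        raise ValueError(f"not an edge short form: {token!r}")
    left, shaft, digit, h, shaft2, right = m.groups()
    # Exactly one arrow head is allowed, and both shaft characters must agree.
    if (left is None) == (right is None) or shaft != shaft2:
        raise ValueError(f"malformed edge: {token!r}")
    if not digit:
        if shaft != "-" or h:
            raise ValueError(f"unsupported edge: {token!r}")
        role, post = "RSTR", "H"  # bare arrow: quantifier link
    elif shaft == "=":
        if h:
            raise ValueError(f"unsupported edge: {token!r}")
        role, post = "ARG" + digit, "EQ"
    else:
        # Assumption: a plain numbered arrow without 'h' denotes NEQ.
        role, post = "ARG" + digit, ("H" if h else "NEQ")
    return role, post, "left" if left else "right"


def chain_to_triples(chain):
    """Turn one edge sequence into (source, 'ROLE/POST', target) triples."""
    nodes, edges, current = [], [], []
    for token in chain.split():
        if EDGE.match(token):
            nodes.append(" ".join(current))  # close the node before the edge
            edges.append(decode_edge(token))
            current = []
        else:
            current.append(token)  # predicate or variable of the current node
    nodes.append(" ".join(current))
    triples = []
    for i, (role, post, direction) in enumerate(edges):
        a, b = nodes[i], nodes[i + 1]
        src, tgt = (a, b) if direction == "right" else (b, a)
        triples.append((src, f"{role}/{post}", tgt))
    return triples


triples = chain_to_triples("_a_q --> _square_n_1 x[3s_+_] <-1- *_red_a_1 e[ppi--]")
for triple in triples:
    print(triple)
```

For the example sentence, the sketch yields one RSTR/H edge from the quantifier to the noun and one ARG1 edge from the adjective to the noun, so each edge of the graph indeed appears exactly once in the serialisation.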

A node may need to appear multiple times in the linearised representation, for instance, if it is part of more than two edges. Instead of duplicating the node definition in such cases, it is referred to either by a leading colon plus predicate name (if unique), e.g. :_square_n_1, or by introducing a reference label, e.g. subj:_square_n_1, and referring to it via a leading colon plus label instead. The sentence “A big square is red.” (“square” is part of three edges) can hence be written as follows:

_a_q --> subj:_square_n_1 x[3s_+_] <-1- *_red_a_1 e[ppi--]; :subj <=1= _big_a_1 e[pui--]

Footnote 5: Description of variable attributes here and in the following (for a full specification, see Kuhnle (2016)): x[3s_+_]: 3rd person, singular, individuated; e[pui--]: proposition, untensed, indicative; e[ppi--]: proposition, present, indicative.

GraphLang supports a variety of advanced features to enable its use for subgraph matching, rewriting and querying. The full list of features can be found in the specification (Kuhnle, 2016), and additional example applications in the pydmrs paper (Copestake et al., 2016). The concepts relevant for the remainder of this section are underspecification and anchor nodes. First, the question mark symbol serves as a universal underspecification marker and can be used in a variety of places: for predicate slots, e.g. _?_n_1 (any noun predicate); for variables, e.g. x? (instance variable with any person/number/etc.); for individual variable attributes, e.g. x[3??+?] (any 3rd person and individuated instance variable); or for edges, e.g. -?-> (any relation type). The special values pred and node signal a fully underspecified predicate or node (predicate plus variable), respectively. Second, anchor nodes are defined via square-bracketed labels, e.g. [subj]:_square_n_1 x?. They can be explicitly referred to by, for instance, rewriting algorithms, and are useful in combination with underspecification, as illustrated in the following.
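The effect of the question mark can be illustrated with a small slot-wise comparison. This is only a sketch under simplifying assumptions, not the pydmrs matching code: predicates are assumed to be underscore-separated slot sequences, abbreviated variable attributes are assumed to be one character per slot, and the function names are hypothetical.

```python
def slots_match(pattern, value):
    """Slot-wise comparison where '?' in the pattern matches any slot value."""
    return len(pattern) == len(value) and all(
        p == "?" or p == v for p, v in zip(pattern, value)
    )


def pred_matches(pattern, predicate):
    """Match a possibly underspecified predicate against a concrete one."""
    if pattern == "pred":  # special value: fully underspecified predicate
        return True
    return slots_match(pattern.split("_"), predicate.split("_"))


def attrs_match(pattern, attrs):
    """Match abbreviated variable attributes, e.g. '3??+?' against '3s_+_'."""
    if pattern == "?":  # as in 'x?': any attribute values
        return True
    return slots_match(pattern, attrs)


print(pred_matches("_?_n_1", "_square_n_1"))  # any noun predicate
print(pred_matches("_?_n_1", "_red_a_1"))     # an adjective does not match
print(attrs_match("3??+?", "3s_+_"))          # 3rd person, individuated
```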

Mapping. A lookup table maps caption components based on their values to DMRS graph snippets, specified in GraphLang format. The following illustrates this mapping with one example per caption type taken from section 4.2:

• Attribute “red”:

[attr]:_red_a_1 e? =1=> [type]:node <-- [quant]:default_q

• Object-type “shape”:

[type]:_shape_n_1 x?[pers=3] <-- [quant]:default_q

• Relation “to the left of”:

_left_n_of x?[num=sg] -1-> [ref]:node <-- [quant]:_a_q; [rel]:_to_p e? -2-> :_left_n_of <-- _the_q

• Selector “the bigger”:

[sel]:_big_a_1 e? =1=> [type]:pred x?[num=s] <-- [quant]:_the_q; more_comp e? =1=> :sel

• Existential:

[quant]:_a_q --> [rstr]:pred x?[num=sg] <-1- [body]:node

• Quantifier “at most three”:

[quant]:udef_q --> [rstr]:pred x?[num=pl] <-1- [body]:node; :rstr <=1= card(3) e? <=1= _at+most_x_deg e?

• Proposition “and”:

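[Figure 4.5 panels, from top to bottom: the sentence “A pentagon is above a green ellipse, and no blue shape is an ellipse.”; its DMRS graph, from which the sentence is obtained by ACE + ERG realisation; the ShapeWorld semantics, from which the graph is obtained by component DMRS mapping.]

Figure 4.5: A sentence, its associated DMRS graph with coloured components, and a simplified version of the corresponding ShapeWorld semantics, illustrating its compositionality.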

Composition. The DMRS snippets obtained this way are iteratively composed. This merging process is guided by the anchor nodes, which act as the glue points for combining child with parent DMRS graphs. Partial underspecification of anchor nodes is resolved by adopting the more specific value, while other non-anchor nodes are simply copied. The unification of anchor predicates takes into account customised predicate hierarchies. For instance, default_q subsumes all quantifier predicates, and _shape_n_1 may act as hypernym for concrete shape predicates like _square_n_1. Figure 4.5 illustrates the correspondence between caption component semantics and DMRS graph snippets as well as the compositional structure of the caption.
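The unification of anchor predicates can be sketched as follows. This is an illustrative simplification rather than the ShapeWorld implementation: snippets are reduced to a mapping from anchor label to predicate, variables and edges are ignored, and the `HYPERNYMS` table is a hypothetical extract of a predicate hierarchy that contains only the two cases mentioned above plus one further quantifier.

```python
# Hypothetical extract of a predicate hierarchy: child predicate -> hypernym.
HYPERNYMS = {
    "_square_n_1": "_shape_n_1",
    "_a_q": "default_q",
    "_the_q": "default_q",
}


def subsumes(general, specific):
    """True if 'general' equals 'specific' or is one of its hypernyms."""
    if general in ("pred", "node"):  # fully underspecified values
        return True
    while specific is not None:
        if specific == general:
            return True
        specific = HYPERNYMS.get(specific)
    return False


def unify_pred(a, b):
    """Resolve two anchor predicates by adopting the more specific one."""
    if subsumes(a, b):
        return b
    if subsumes(b, a):
        return a
    raise ValueError(f"cannot unify {a!r} with {b!r}")


def merge_anchors(parent, child):
    """Glue a child snippet onto a parent snippet along shared anchor labels."""
    merged = dict(parent)
    for label, pred in child.items():
        merged[label] = unify_pred(merged[label], pred) if label in merged else pred
    return merged


# Attribute "red" combined with a concrete shape and an indefinite quantifier.
red = {"attr": "_red_a_1", "type": "node", "quant": "default_q"}
square = {"type": "_square_n_1", "quant": "_a_q"}
merged = merge_anchors(red, square)
print(merged)
```

In this toy setting, the underspecified [type] anchor of the attribute snippet is filled by the concrete shape predicate, and default_q is replaced by the more specific quantifier, whereas incompatible predicates such as a noun and an adjective fail to unify.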

Matching and rewriting. In addition to composing DMRS graphs, anchor nodes also serve as reference points for DMRS subgraph rewriting. This is a two-step process, consisting of the identification of a subgraph and its subsequent replacement by another, to obtain a modified DMRS graph. First, the subgraph S to be replaced has to be located in the DMRS graph G. This is achieved by subgraph matching, where nodes in S are associated with corresponding nodes in G such that all edges match as well. Matching also supports underspecification, as otherwise generic rewriting rules, like “a [colour] [shape]” to “a [shape] which is [colour]”, would require enumerating all possible concrete instantiations. Second, the identified subgraph S within G is cut out and replaced by another subgraph S′ based on the correspondence of their anchor nodes, which act as glue points, similar to their role during composition above.

Paraphrasing as subgraph rewriting constitutes the final step in the caption realisation pipeline of the ShapeWorld system, before the DMRS graph is turned into natural language. On the one hand, paraphrasing may be necessary to ‘fix’ certain technical inconsistencies between ShapeWorld and DMRS semantics, due to grammar-incompatible simplifications in the ShapeWorld semantics. For instance, a sentence like “A square is red.” is internally produced as “A square is a red shape.”, due to the compositional caption system which pairs adjectives like “red” with the semantically empty “shape”. However, in English it is more common to collapse a sentence like “A square is a red shape.” to “A square is red.”, which can be achieved by suitable paraphrasing rules. On the other hand, such rules can increase the linguistic variety of vocabulary and constructions by specifying semantically equivalent formulations as subgraph alternatives, ranging from word-level synonyms like referring to “red object” instead of “red shape”, to phrase-level synonyms such as paraphrasing “most squares” as “more than half of the squares”, to clause-level equivalences like “a shape is red” and “there is a red shape”. Note that the current version of ShapeWorld does not implement instances of this second type of paraphrasing rule since, contrary to initial expectations, increasing linguistic variety turned out not to be necessary for the data to be sufficiently complex to yield interesting experimental results.
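The two rewriting steps, matching and replacement along anchor nodes, can be sketched on the “A square is a red shape.” example. The following is a toy illustration under several assumptions, not the pydmrs or ShapeWorld code: graphs are plain dictionaries and edge sets, the DMRS graph is hand-written and simplified (index and top nodes are omitted), matching is a brute-force search that is only suitable for very small graphs, and all function and label names are hypothetical.

```python
from itertools import permutations


def match(p_nodes, p_edges, g_nodes, g_edges):
    """Return a mapping pattern id -> graph id such that predicates and all
    pattern edges agree, or None. '?' is an underspecified predicate."""
    p_ids = list(p_nodes)
    for cand in permutations(g_nodes, len(p_ids)):
        m = dict(zip(p_ids, cand))
        preds_ok = all(p_nodes[p] in ("?", g_nodes[m[p]]) for p in p_ids)
        edges_ok = all((m[s], lab, m[t]) in g_edges for s, lab, t in p_edges)
        if preds_ok and edges_ok:
            return m
    return None


def rewrite(g_nodes, g_edges, pattern, replacement, anchors):
    """Replace the matched pattern subgraph by the replacement subgraph.
    Anchor nodes (present in both) survive and act as glue points."""
    (p_nodes, p_edges), (r_nodes, r_edges) = pattern, replacement
    m = match(p_nodes, p_edges, g_nodes, g_edges)
    if m is None:
        return dict(g_nodes), set(g_edges)
    dropped = {m[p] for p in p_nodes if p not in anchors}
    matched = {(m[s], lab, m[t]) for s, lab, t in p_edges}
    nodes = {i: pred for i, pred in g_nodes.items() if i not in dropped}
    edges = {e for e in g_edges
             if e not in matched and e[0] not in dropped and e[2] not in dropped}
    fresh, r_map = max(g_nodes) + 1, {}
    for r, pred in r_nodes.items():
        if r in anchors:
            r_map[r] = m[r]  # glue onto the matched graph node
            if pred != "?":
                nodes[m[r]] = pred
        else:
            r_map[r], nodes[fresh] = fresh, pred  # new non-anchor node
            fresh += 1
    edges |= {(r_map[s], lab, r_map[t]) for s, lab, t in r_edges}
    return nodes, edges


# Simplified graph for "A square is a red shape."
g_nodes = {1: "_a_q", 2: "_square_n_1", 3: "_be_v_id",
           4: "_a_q", 5: "_red_a_1", 6: "_shape_n_1"}
g_edges = {(1, "RSTR/H", 2), (3, "ARG1/NEQ", 2), (3, "ARG2/NEQ", 6),
           (4, "RSTR/H", 6), (5, "ARG1/EQ", 6)}

# Rule: "[subj] is a [attr] shape" -> "[subj] is [attr]"
pattern = ({"subj": "?", "be": "_be_v_id", "attr": "?", "q": "?",
            "shape": "_shape_n_1"},
           {("be", "ARG1/NEQ", "subj"), ("be", "ARG2/NEQ", "shape"),
            ("attr", "ARG1/EQ", "shape"), ("q", "RSTR/H", "shape")})
replacement = ({"subj": "?", "attr": "?"}, {("attr", "ARG1/NEQ", "subj")})

new_nodes, new_edges = rewrite(g_nodes, g_edges, pattern, replacement,
                               anchors={"subj", "attr"})
print(new_nodes)
print(sorted(new_edges))
```

The result keeps only the quantifier, the noun “square” and the adjective “red”, with the adjective now taking the noun as its ARG1, which corresponds to the graph of “A square is red.” Because the subject and attribute anchors are underspecified, the same rule applies to any subject and any attribute without enumerating concrete instantiations.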

In document Evaluating visually grounded language capabilities using microworlds (Page 81-84)