Chunking models - Semantic chunking

After establishing the theoretical foundations of the task in Chapter 2, we proceeded to investigate how semantic chunking and its underlying principles can be implemented in practice. The key contribution of the core chapters is the introduction of three chunking approaches: two dedicated to chunking DMRS representations, and one which models surface semantic chunks directly based on semi-supervised training data.

The chunking system described in Chapter 3 performs semantic chunking on DMRS graphs following a set of hand-crafted rules which target a selected range of grammatical constructions. Its focus was to demonstrate the viability and benefits of semantic chunking as a pre-processing step for realization from DMRS. In the *MRS framework, realization can be thought of as the inverse of parsing, and it requires a well-formed input. With that in mind, we prioritised the quality of chunks over their quantity and restricted their form to finite clauses, which are readily processed by the generator (§3.1). The rules (§3.2) reflect grammatical constructions which combine multiple such clauses and which can be reliably identified locally in DMRS graphs: clausal coordination, subordinating conjunctions, and clausal complements of verbs.

Another reason behind the focus on the quality of semantic chunking over wide coverage was the secondary objective of the proof-of-concept chunker, which is related to the second chunking model introduced in the thesis. Chapter 4 describes how DMRS semantic chunks can be used to train a surface-based chunking model. Using the connection between DMRS graphs and the strings from which they were created, we successfully converted semantic chunks defined on DMRS into surface fragments suitable to act as surface semantic chunks (§4.1). Guided by the state-of-the-art approaches to similar tasks, we trained a semi-supervised sequence labelling model capable of chunking previously unseen examples without access to their semantic representations (§4.3). The chunker reached an F-score of 0.862, below but comparable to the scores for established sequence tagging tasks such as NER or shallow chunking. The size of the targeted fragments, the complex nature of the captured dependencies, and the automatically created dataset make the task more challenging than some of the existing ones. Taking into account the lack of in-depth hyperparameter optimization, the score nevertheless suggests that the sequence labelling approach is suitable for the task.
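To make the sequence labelling formulation concrete, the following minimal sketch encodes surface chunks as token-level tags and scores a prediction with a span-level F-score. The BIO tag set, the exact-match scoring, the toy spans, and all function names are assumptions made for this illustration; they are not taken from the model of Chapter 4.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # token offsets, end exclusive


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode non-overlapping token spans as BIO tags (assumed scheme)."""
    tags = ["O"] * n_tokens
    for start, end in spans:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Recover the set of spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    start = None
    for i, tag in enumerate(tags):
        # Any tag other than "I" closes the span that is currently open.
        if tag != "I" and start is not None:
            spans.add((start, i))
            start = None
        if tag == "B":
            start = i
    if start is not None:
        spans.add((start, len(tags)))
    return spans


def span_f1(gold: Set[Span], pred: Set[Span]) -> float:
    """Exact-match F-score: a predicted span counts only if both ends match."""
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy example: two gold chunks over five tokens, one boundary mispredicted.
gold = bio_to_spans(spans_to_bio(5, [(0, 2), (3, 5)]))
predicted = bio_to_spans(["B", "I", "I", "B", "I"])
print(span_f1(gold, predicted))  # 0.5
```

Under exact-match scoring, a single misplaced boundary costs a whole chunk, which is one way in which long fragments can make the task harder than tagging short spans.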

The hand-crafted rules of the prototype model covered only a small range of phenomena discussed theoretically in Chapter 2 as potential bases for valid semantic chunks. This resulted in rare chunking opportunities and long, sub-optimally complex chunks. Our experiments exposed other brittle aspects of the rule-based approach, such as difficulties capturing detailed interactions between constructions (§3.5). Although sufficient to demonstrate the benefits of chunking, the human-centred design stood at odds with the flexible philosophy of the task. In Chapter 5 we set out to explore to what extent the restrictions on the form of chunks can be loosened without reducing the performance gains and the quality of results (cf. §6.3; §§3.6.3, 5.5). The new chunking model steps away from prescriptive rules and instead relies on suitable constituents emerging naturally from the grammar principles reflected in the DMRS structure. The key feature behind its operation is the scopal hierarchy of DMRS, providing a tree-like organisation of subgraphs corresponding to individual situations. In §5.1 we demonstrate how a simple set of constraints on the form of chunks leads to coherent chunking decisions in a multitude of grammatically complex scenarios. For example, the expanded catalogue of semantic chunks includes fragments with shared entities, as in the case of VP coordination.

The result is an automated chunking model capable of finding chunking opportunities based on previously unseen DMRS structures. The human input is limited to defining constraints on what constitutes a valid semantic chunk for the target downstream application. Some of them are defined a priori, but the results can be further refined by applying filtering criteria to the chunking opportunities discovered by the model. The patterns in chunking decisions can be generalized in the form of templates (§5.3), i.e. underspecified DMRS graphs which summarise chunking decisions by abstracting them from irrelevant lexical details. Template-based generalisations can also open up new ways of processing chunked sentences, as we demonstrate with surface templates for realization (§5.4.4; cf. §6.3).

6.2.1 Further research on chunking models

One potential improvement to the models presented in this thesis has already been suggested in §5.7. The scopal framework of the chunker from Chapter 5.3 can be readily expanded to include non-scopal chunking opportunities, such as relative clauses.

Throughout the thesis, we pointed out granularity as one of the parameters determining the suitable form of chunks for a given task. We observed that the long and complex semantic chunks produced by the rule-based chunker could be behind the lower performance of sequence labelling on semantic chunking when compared with other similarly framed tasks. On the other hand, the scope-based system of Chapter 5 was designed to explore the boundaries of what constitutes a valid chunk under minimal assumptions and does not impose direct limits on the chunk size. It would be interesting to investigate in depth and quantitatively how the size of semantic chunks affects the performance and quality of the results of downstream tasks, and whether it is possible to establish a principled optimal threshold. Such a threshold would have to be determined for each target task and take into account the task's particular implementation, as the optimal chunk size is likely different for a chart generator, e.g. ACE, than for a neural model, e.g. Hajdik et al. (2019).

Just as the DMRS chunking model from Chapter 3 was the proof-of-concept for the task in general, the surface chunking model from Chapter 4 aimed to show that it is possible to model surface semantic chunks directly from sentence strings. Although our approach was guided by state-of-the-art models successful on other sequence labelling tasks, we did not focus on in-depth optimization. We find it likely that the task score could be raised further by careful hyperparameter optimization and tweaks in the details of the model architecture. The recent developments of contextual word embeddings, such as ELMo (Peters et al., 2018) and BERT (Devlin et al., 2019), further pushed the limits of benchmark tasks such as NER, and it would be interesting to explore how they interact with the semantically oriented objective of semantic chunking.

As much as surface semantic chunking resembles other sequence labelling problems, it has some unique features which were not addressed in our experiments. In particular, semantic chunks are intrinsically nested structures and because of that, chunking decisions can be thought of as segmentations. This characteristic is particularly exposed by the scope-based model, where each new chunk is the result of deciding whether a satellite clause meets the criteria for a valid chunk. If we were to impose a minimum threshold on the size of chunks, some large fragments comprising multiple clauses would remain whole as large semantic chunks, even though a lower threshold could lead to their division into smaller chunks. The problem of nested sequence labelling has drawn less attention in the general sequence labelling research because of the common flat formulations of sequence labelling benchmark tasks. Nested NER (Finkel and Manning, 2009) is, however, a busy research area with a variety of actively investigated approaches to the problem. Some of the recent ones include linearization (Straková et al., 2019) and stacking of multiple flat models (Ju et al., 2018). The nested nature of semantic chunks would make the task another target of these architectures.
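The effect of a size threshold on nested chunks can be illustrated with a small sketch. It assumes a simplified tree of clauses in which each satellite clause is split off as a separate chunk only if the whole satellite, including anything nested inside it, contains at least `min_size` tokens. The `Clause` class, the `chunk` and `collect` functions, and the token counts are hypothetical and do not reproduce the constraints of the scope-based model from Chapter 5.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Clause:
    """A clause with its own tokens and nested satellite clauses (toy model)."""
    name: str
    own_tokens: int
    satellites: List["Clause"] = field(default_factory=list)

    def size(self) -> int:
        # Total size includes everything nested below this clause.
        return self.own_tokens + sum(s.size() for s in self.satellites)


def collect(clause: Clause) -> List[str]:
    """Names of a clause and all clauses nested inside it."""
    names = [clause.name]
    for sat in clause.satellites:
        names.extend(collect(sat))
    return names


def chunk(clause: Clause, min_size: int) -> List[List[str]]:
    """Split off each satellite that is large enough; keep the rest merged."""
    current = [clause.name]
    split_off: List[List[str]] = []
    for sat in clause.satellites:
        if sat.size() >= min_size:
            split_off.extend(chunk(sat, min_size))  # decide again one level down
        else:
            current.extend(collect(sat))  # too small: stays inside this chunk
    return [current] + split_off


# Hypothetical sentence: a main clause with satellite A, which itself contains B.
sentence = Clause("main", 6, [Clause("A", 4, [Clause("B", 5)])])

print(chunk(sentence, min_size=10))  # [['main', 'A', 'B']]
print(chunk(sentence, min_size=8))   # [['main'], ['A', 'B']]
print(chunk(sentence, min_size=4))   # [['main'], ['A'], ['B']]
```

The same nested structure yields one, two, or three chunks depending on the threshold, which is why a flat labelling of a single segmentation cannot represent all valid chunking decisions at once. The sketch does not check the size of what remains after a satellite is split off; a fuller treatment would need to constrain that as well.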

In a natural next step we would like to investigate how well the more varied DMRS chunks of the scopal system translate to surface chunks and what constraints are suitable for the target task of parsing (cf. §6.3.1). The limited set of hand-crafted rules behind the original model allowed a straightforward conversion into a labelling scheme. The wider variety of modelled constructions associated with the new chunker would require a more sophisticated set of tags. Ideally, we would be able to recover the information about which template corresponds to each chunking decision, so that the target task could benefit from the template structure when combining partial chunk results into the full analysis. We are confident that the general nature of templates lends itself to conversion into an informative labelling scheme based partially on nonterminals describing each chunk. The more complex distinctions could be modelled by adding a complementary classifier layer on top of the
