
CHAPTER 9: Summary and Future Work

9.2 Discussion and Future Directions

This thesis has taken a noticeably distinct approach to a few important problems in the field and has shown progress on multiple fronts. For example, the formalisms of Chapters 3 and 4 are novel and provide general ways to formalize and implement reasoning algorithms. The datasets of Chapters 6 and 7 are distinct from the many QA datasets in the field. The theoretical analysis of Chapter 8 offers a distinctive formal treatment of reasoning in the context of natural language.

All that said, there are many issues that we did not address as extensively as we could have (or should have), and some aspects turned out slightly differently from what we initially expected.

Looking back at the reasoning formalism of Chapter 4, we underestimated the difficulty of extracting the underlying semantic representations. Even though the field has made significant progress in low-level NLP tasks (like semantic role labeling (SRL) or coreference resolution), such tasks still suffer from errors, and the imperfections of such annotations result in exponentially bigger errors when reasoning with them (as also supported by the theoretical observations of Chapter 8); in practice, the formalism worked well only for short-ranged chains (1, 2, and sometimes 3 hops). With more recent progress in unsupervised representations and improvements in semantic extraction systems, my hope is to revisit these ideas in the coming years and address the remaining challenges.
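Why errors compound with chain length can be illustrated with a deliberately simplified model, which is not the analysis of Chapter 8: assume that each hop of a reasoning chain relies on one extracted annotation (e.g., an SRL frame or a coreference link) that is correct independently with probability p. A k-hop chain is then fully correct with probability p^k, which decays exponentially in k. The function names below are hypothetical and serve only as a sketch under this independence assumption.

```python
def chain_accuracy(p: float, hops: int) -> float:
    """Probability that every hop of a reasoning chain is correct.

    Illustrative assumption (not from the thesis): each hop depends on one
    extracted annotation that is correct independently with probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if hops < 0:
        raise ValueError("hops must be non-negative")
    return p ** hops


def max_reliable_hops(p: float, min_accuracy: float) -> int:
    """Largest k such that p**k >= min_accuracy, under the same assumption."""
    if not 0.0 < p < 1.0 or not 0.0 < min_accuracy <= 1.0:
        raise ValueError("need 0 < p < 1 and 0 < min_accuracy <= 1")
    hops = 0
    # p**k shrinks toward 0 as k grows (p < 1), so this loop always terminates.
    while p ** (hops + 1) >= min_accuracy:
        hops += 1
    return hops
```

Under this toy model, even a per-hop accuracy of p = 0.9 leaves a 3-hop chain fully correct only about 73% of the time (0.9^3 = 0.729), and `max_reliable_hops(0.9, 0.7)` returns 3. This illustrates why longer chains quickly become unreliable when the extracted annotations are imperfect.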

A vision that I would like to pursue (influenced by discussions with my advisor) is reasoning with minimal data. We (humans) are able to perform the same reasoning on many high-level concepts and to transfer it across all sorts of domains: for instance, an average human uses the same inductive reasoning to conclude that the sky is blue and to infer that there is another number after every number. Effective (unsupervised) representations could potentially need a huge amount of data (and many parameters), but successful reasoning systems will likely need very little data (and very simple but general definitions).

Over the past few years, the field has witnessed a wave of activity on unsupervised language models (Peters et al., 2018; Devlin et al., 2018). The success of such models on several datasets raises many questions: for instance, what kinds of reasoning are they capable of? What are they missing? And how can we address these gaps, possibly by creating hybrid systems? What is clear is that these systems will offer increasingly rich representations of meaning; we need better ways to understand what these systems are capable of and which scenarios they can represent. In conjunction with understanding their capabilities and limitations, we have to build reasoning algorithms on top of them. It is unlikely that these tools alone will ever be enough to solve all of our challenges; one has to equip these representations with the ability to reason, especially when they face unusual or unseen scenarios.

In Chapter 5 (essential terms), an initial motivation was to model knowing what we don’t know (Rajpurkar et al., 2018); that is, systems should be able to infer whether they have enough confidence in the answer to a given query before acting. In hindsight, I

up generalizing to tricky instances. Additionally, it would have been better if the decision of essentiality had been more tightly integrated into the reasoning systems (rather than made by an independently supervised classifier, which limited its domain transfer).
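One generic way to operationalize “knowing what we don’t know” is threshold-based abstention (selective prediction): a system answers only when its confidence clears a threshold and otherwise declines. The sketch below illustrates that general idea; it is a swapped-in technique, not the method of Chapter 5. It assumes that the scores are probabilities over the candidate answers (e.g., softmax outputs) and that confidence is simply the top probability. The function name and threshold are hypothetical.

```python
from typing import Optional, Sequence


def answer_or_abstain(
    candidates: Sequence[str], scores: Sequence[float], threshold: float
) -> Optional[str]:
    """Return the top-scoring candidate only if its score reaches `threshold`.

    Illustrative assumption: `scores` are probabilities aligned with
    `candidates`. Returning None signals abstention ("I don't know").
    """
    if not candidates or len(candidates) != len(scores):
        raise ValueError("need at least one candidate and one score per candidate")
    # Index of the most probable candidate.
    best = max(range(len(scores)), key=lambda i: scores[i])
    return candidates[best] if scores[best] >= threshold else None
```

For example, `answer_or_abstain(["A", "B", "C"], [0.1, 0.7, 0.2], 0.5)` returns `"B"`, while a flatter distribution such as `[0.55, 0.45]` with a threshold of 0.6 returns `None`. The threshold trades coverage for precision, and a fixed one tuned in one domain may not transfer to another.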

The datasets of Chapters 6 and 7 are critical parts of this thesis, which, I suspect, are likely to be remembered longer than the rest of the chapters. In general, the construction of datasets (including the ones we described) is a laborious task, and it is unfortunate that many small empirical details are usually left out. It is not clear to me whether static datasets are the best way forward. In the future, I hope that the field discovers more effective ways of measuring progress toward NLU.

A key issue contributing to the complexity of NLU (and Question Answering) is the body of implied information (common sense). We touch upon one class of such understanding in Chapter 7, where we introduce a dataset for such problems. A natural next step is to address such questions and to explore the many ways we can incorporate such understanding into models.

The analysis of Chapter 8 is distinctive within the field. That said, several issues leave me unsatisfied with our current attempt. In particular, it rests on many assumptions that may or may not stand the test of time (e.g., the generative construction of the symbol graph from the meaning graph, or connectivity reasoning as a proxy for actual reasoning in language). Moreover, some important reasoning phenomena are missing from this formalism: conditional reasoning, transitivity and directionality, and inductive reasoning, to name a few. In general, our (the field’s) understanding of “reasoning” (and its formalisms) is very limited. The existing formalisms are also not easily applicable, since those who formalized reasoning were philosophers and mathematicians who were not intimately aware of the complexity of NLU. In practice, it is really hard to make existing theories of reasoning work in the presence of many of the properties of language. In the coming years, I would like to see more efforts on reconciling the issues at the interface of

APPENDIX

A.1. Supplementary Details for Chapter 3

In document Reasoning-Driven Question-Answering For Natural Language Understanding (Page 158-161)