CHAPTER 9 : Summary and Future Work
9.2 Discussion and Future Directions
This thesis has taken a distinct approach to a few important problems in
the field and has shown progress on several fronts. For example, the formalisms of Chapters 3
and 4 are novel and provide general ways to formalize and implement reasoning algorithms.
The datasets of Chapters 6 and 7 are distinct from the many QA datasets in the field. The
theoretical analysis of Chapter 8 offers a distinct formal treatment of reasoning in
the context of natural language.
That said, there are many issues that are not addressed as extensively as they could have been
(or should have been), and there are aspects that turned out slightly differently from what we initially expected.
Looking back at the reasoning formalism of Chapter 4, we underestimated the hardness
of extracting the underlying semantic representations. Even though the field has made
significant progress in low-level NLP tasks (like SRL or Coreference), such tasks still suffer
of such annotations, result in exponentially bigger errors when reasoning with them (as
also justified by the theoretical observations of Chapter 8); in practice, it worked well
only for short chains (1, 2, and sometimes 3 hops). With more recent progress in
unsupervised representations and improvements in semantic extraction systems, my hope is
to revisit these ideas in the coming years and address the remaining challenges.
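As a rough illustration of why extraction errors compound over longer chains, the sketch below assumes that each hop depends on one extracted annotation and that annotation errors are independent across hops. Both the independence assumption and the function name chain_accuracy are illustrative simplifications, not part of the formalism of Chapter 8.

```python
def chain_accuracy(per_hop_accuracy: float, hops: int) -> float:
    """Probability that every hop in a reasoning chain is correct.

    Assumes (for illustration only) that errors are independent across hops,
    so the chain is correct only if all of its hops are correct.
    """
    if not 0.0 <= per_hop_accuracy <= 1.0:
        raise ValueError("per_hop_accuracy must be in [0, 1]")
    if hops < 0:
        raise ValueError("hops must be non-negative")
    return per_hop_accuracy ** hops


if __name__ == "__main__":
    # Hypothetical per-hop accuracy of 0.9, chosen only for illustration.
    for hops in (1, 2, 3, 5, 10):
        print(f"{hops:2d} hops: {chain_accuracy(0.9, hops):.3f}")
```

Under these assumptions, a per-hop accuracy of 0.9 leaves a three-hop chain fully correct only about 73% of the time, and the figure keeps shrinking geometrically as the chain grows.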
A vision that I would like to pursue (influenced by discussions with my advisor) is reasoning
with minimal data. We (humans) are able to perform the same reasoning on many high-level
concepts and to transfer it across all sorts of domains: for instance, an average
human uses the same inductive reasoning to conclude that the sky is blue and to infer that
there is another number after every number. Effective (unsupervised) representations could
potentially need a huge amount of data (and many parameters), but successful reasoning
systems will likely need very little data (and very simple, but general, definitions).
Over the past years, the field has witnessed a wave of activity on unsupervised language models (Peters et al., 2018; Devlin et al., 2018). There are many open questions about
the success of such models on several datasets: for instance, what kinds of reasoning
are they capable of? What are they missing? And how can we address these gaps,
possibly by creating hybrid systems? What is clear is that these systems will offer increasingly
richer representations of meaning; we need better ways to understand what these
systems are capable of and what scenarios they are used to represent. And in
conjunction with understanding their capabilities and limitations, we have to build reasoning
algorithms on top of them. It’s unlikely that these tools will ever be enough to solve all of
our challenges; one has to equip these representations with the ability to reason, especially when they face an unusual or unseen scenario.
In Chapter 5 (essential terms), an initial motivation was to model knowing what we don’t
know (Rajpurkar et al., 2018); basically, systems should be able to infer whether they
have enough confidence about the answer to a given query before acting. In hindsight, I
up generalizing to tricky instances. Additionally, it would have been better if the decision
of essentiality were more tightly integrated into reasoning systems (rather than made by an independently
supervised classifier, which limited its domain transfer).
The datasets of Chapters 6 and 7 are critical parts of this thesis which, I suspect, are likely
to be remembered longer than the rest of the chapters. In general, the construction of datasets (including the ones we described) is a menial task. It’s unfortunate that many
small empirical details are usually left out. It is not clear to me whether using static
datasets is the best way forward. In the future, I hope that the field discovers
more effective ways of measuring progress towards NLU.
A key issue contributing to the complexity of NLU (and Question Answering) is the set
of implied information (common sense). We touch upon one class of such understanding in
Chapter 7, where we introduce a dataset for such problems. A natural next step is addressing
such questions and exploring the many ways to incorporate such understanding into models.
The analysis of Chapter 8 is distinct within the field. That said, there are many
issues that leave me unsatisfied with our current attempt. In particular, there are many
assumptions that may or may not stand the test of time (e.g., the generative construction of
the symbol graph from the meaning graph, or connectivity reasoning as a proxy for the actual
reasoning in language). And there are some important reasoning phenomena missing from
this formalism: conditional reasoning, transitivity and directionality, and inductive reasoning,
just to name a few. In general, our (the field’s) understanding of “reasoning” (and its formalisms) is very limited. And the existing formalisms are not easily applicable, since
those who formalized reasoning were not intimately aware of the complexity of NLU; they
were philosophers and mathematicians. In practice, it’s really hard to make the existing
theories of reasoning work in the presence of many of the properties of language. In the
coming years, I would like to see more efforts on reconciling the issues in the interface of
APPENDIX
A.1. Supplementary Details for Chapter 3