Part I Selecting a Meaning Representation
5.5 Current Data Splits Only Partially Probe Generalizability
6.6.2 Schema Maps
Table 6.4 compares a model that attends to the input and schema embeddings with one that attends to the input, the schema embeddings, and a schema map. The model without the schema map is more accurate. The differences are small, but their consistent direction suggests that using a schema map hinders performance.
One would expect the schema map to have either a positive effect or none. After all, if it does not provide useful information, the model should learn to ignore its signal. The problem may be that the schema map makes the model larger, and the amount of training data we have cannot support a larger model.
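The comparison can be illustrated with a minimal sketch. It assumes plain dot-product attention over each memory (input encodings, schema embeddings, and optionally the schema map) and concatenation of the resulting context vectors. The section does not specify these details, so the function names, toy vectors, and vocabulary size are all hypothetical. The sketch also makes the size argument concrete: each additional memory widens the context vector, and with it any output layer that consumes that vector.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, memory: List[Vector]) -> Vector:
    """Dot-product attention: a softmax-weighted average of the memory rows."""
    weights = softmax([dot(query, row) for row in memory])
    dim = len(memory[0])
    return [sum(w * row[i] for w, row in zip(weights, memory)) for i in range(dim)]


def decoder_context(query: Vector, memories: List[List[Vector]]) -> Vector:
    """Attend to each memory separately and concatenate the context vectors."""
    context: Vector = []
    for memory in memories:
        context.extend(attend(query, memory))
    return context


# Hypothetical toy values; every vector has dimension 2.
decoder_state = [1.0, 0.0]
input_encodings = [[1.0, 0.0], [0.0, 1.0]]
schema_embeddings = [[0.5, 0.5]]
schema_map = [[0.0, 1.0], [1.0, 1.0]]

without_map = decoder_context(decoder_state, [input_encodings, schema_embeddings])
with_map = decoder_context(decoder_state, [input_encodings, schema_embeddings, schema_map])

# Weights in a linear output projection from context to a hypothetical vocabulary (no bias).
vocab_size = 100
print(len(without_map) * vocab_size, len(with_map) * vocab_size)  # 400 600
```

In this toy setting, adding the schema map grows the output projection from 400 to 600 weights. In a full-sized model the increase scales with the hidden size, which is consistent with the hypothesis that the available training data cannot support the larger model.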
6.7 Conclusions and Future Work
The proposed schema representations, and the architectures that incorporate them, have not proven helpful for the text-to-SQL task. However, the observation that SQL queries depend on the database schema remains true. Thus, additional work on incorporating schema information into neural networks for text-to-SQL could still be fruitful. One option is to change the architecture. Perhaps the attention-to-schema model could be improved with some form of gating, or adjusting its structure to more closely resemble a pointer network might help.
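One possible form of gating is sketched below, assuming the decoder already has separate context vectors from attending to the input and to the schema. A scalar gate computed from the decoder state decides how much of the schema context to pass on. The gate parameters and function names are hypothetical; this illustrates the idea rather than a tested design.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_context(decoder_state: Vector, input_ctx: Vector, schema_ctx: Vector,
                  gate_weights: Vector, gate_bias: float) -> Vector:
    """Blend input and schema contexts with a scalar gate g in (0, 1).

    g near 1 passes the schema context through; g near 0 lets the model
    fall back on the input context and effectively ignore the schema.
    """
    g = sigmoid(sum(w * s for w, s in zip(gate_weights, decoder_state)) + gate_bias)
    return [g * sc + (1.0 - g) * ic for ic, sc in zip(input_ctx, schema_ctx)]


# Gate half open: an even blend of the two contexts.
print(gated_context([0.0, 0.0], [1.0, 2.0], [3.0, 4.0], [0.0, 0.0], 0.0))  # [2.0, 3.0]
```

A strongly negative bias drives g toward 0, so the output approaches the input context. A learned gate of this kind would give the model an explicit way to down-weight the schema signal when it is not useful.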
Alternatively, the problem might lie with the schema representation. We use word embeddings to build these representations, but there is no reason to believe that these embeddings occupy the same space as the embeddings the model learns for the encoder and decoder vocabularies. We might address this by initializing all three with word embeddings from the same space, and by changing the schema embeddings from their current static state to dynamically learned embeddings like those of the encoder.
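A minimal sketch of this proposal follows. It assumes that a schema element is embedded by averaging the word vectors of the parts of its name (the text says only that word embeddings are used to build the schema representations), and it uses made-up two-dimensional vectors in place of real pretrained embeddings. All three tables start from the same space, and the schema embedding is recomputed from a trainable table on every call, so training updates change it.

```python
import random
from typing import Dict, List

Vector = List[float]
Table = Dict[str, Vector]

# Hypothetical 2-d vectors standing in for real pretrained word embeddings.
PRETRAINED: Table = {
    "who": [0.5, 0.0], "teaches": [0.0, 0.4], "course": [0.1, 0.3],
    "name": [0.2, -0.1], "instructor": [0.4, 0.2],
}


def init_table(vocab: List[str], dim: int = 2, seed: int = 0) -> Table:
    """Copy pretrained vectors where available; initialize unknown words randomly."""
    rng = random.Random(seed)
    return {
        w: list(PRETRAINED[w]) if w in PRETRAINED
        else [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        for w in vocab
    }


def schema_embedding(column: str, table: Table) -> Vector:
    """Embed a column such as 'course_name' as the mean of its parts' vectors.

    Because it reads from a trainable table on every call, the schema
    embedding is not static: it changes whenever training updates the table.
    """
    vecs = [table[p] for p in column.split("_")]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0]))]


# Hypothetical encoder, decoder, and schema vocabularies, all drawn from one space.
encoder = init_table(["who", "teaches", "course", "name"])
decoder = init_table(["SELECT", "FROM", "course", "name", "instructor"])
schema_words = init_table(["course", "name", "instructor"])
print(encoder["course"] == decoder["course"] == schema_words["course"])  # True

before = schema_embedding("course_name", schema_words)
schema_words["course"][0] += 0.1  # stand-in for a gradient update
after = schema_embedding("course_name", schema_words)
print(before != after)  # True
```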
None of these changes address the larger problem, however: text-to-SQL models are not learning to generalize. Within domains, models are failing to generalize to previously unseen queries. Across domains, models are not able to use information gleaned about SQL from one dataset to improve their performance on another dataset.
The most important future work will need to address this. Encoding more knowledge of SQL into the network could help. One way to do this is to pre-train the decoder’s parameters as a SQL language model, perhaps on a dataset like that of Iyer et al. (2016). Such a model would ideally learn general rules of SQL, which could then be honed for particular domains. Alternatively (or additionally), we could explicitly provide the network with information about the structure of SQL. Zhong et al. (2017)’s seq2SQL architecture and Dong and Lapata (2016)’s seq2tree architecture are examples of this type of idea; however, to date no one has built an architecture that is both specific to SQL and general enough to cover a wide variety of queries. Other changes to both architecture and input data should focus on how to train the network to recognize and use compositionality. These are the areas most likely to produce true breakthroughs, rather than incremental improvements over the state of the art.
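The pre-training idea rests on a next-token objective over SQL. As a deliberately simplified stand-in for a neural decoder, the count-based bigram model below is trained on a tiny hypothetical corpus (not the Iyer et al. (2016) data). Even so, it picks up regularities such as every query beginning with SELECT and FROM being followed by a table name. A neural decoder pre-trained with the same objective would ideally capture richer rules of this kind before being fine-tuned on a particular domain.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, List


def train_bigram_lm(queries: List[str]) -> DefaultDict[str, Counter]:
    """Count how often each SQL token follows each other token."""
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for query in queries:
        tokens = ["<s>"] + query.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def next_token_probs(counts: DefaultDict[str, Counter], prev: str) -> Dict[str, float]:
    """Maximum-likelihood estimate of P(next token | previous token)."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return {tok: n / total for tok, n in followers.items()}


# Hypothetical, whitespace-tokenized training queries.
corpus = [
    "SELECT name FROM course WHERE credits = 4",
    "SELECT instructor FROM course",
    "SELECT COUNT ( * ) FROM student",
]
lm = train_bigram_lm(corpus)
print(next_token_probs(lm, "<s>"))   # {'SELECT': 1.0}
print(next_token_probs(lm, "FROM"))  # course: 2/3, student: 1/3
```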
CHAPTER 7
Text-to-SQL in Dialog Context
7.1 Introduction
Text-to-SQL work to date has focused on transforming a single question, such as “Who teaches Discrete Mathematics?”, to a query (Popescu et al., 2003; Popescu et al., 2004; Giordani and Moschitti, 2012; Poon, 2013; Li and Jagadish, 2014; Saha et al., 2016; Zhong et al., 2017; Iyer et al., 2017; Cai et al., 2017). One potential use for parsing English to SQL is as a component of a dialog system. In a dialog system, an agent typically holds a multi-turn conversation with a user. Thus, the meaning of a given question may be affected by the conversational context.
An obvious example is coreference. Suppose a student and an advisor are having a conversation about what courses the student should take the following semester. Their conversation might include the following exchange:
ADVISOR: You should consider taking Discrete Mathematics.
STUDENT: Who teaches that class?
Notice that, in the context of the conversation, the student’s question is semantically equivalent to “Who teaches Discrete Mathematics?”; it can be answered using the same SQL query. However, the utterance “Who teaches that class?” taken on its own does not contain enough information to generate that SQL.
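The point can be made concrete with a toy pipeline built on stated assumptions: a hypothetical course table with name and instructor columns, a fixed list of known course names, a naive "most recent mention" rule for resolving “that class”, and a one-pattern template parser. None of this is the system described in this chapter; it only shows that once the anaphor is resolved from context, both questions yield the same SQL.

```python
import re
from typing import List

# Hypothetical course names known to the system.
KNOWN_COURSES = ["Discrete Mathematics", "EECS 484"]


def last_mentioned_course(history: List[str]) -> str:
    """Return the most recently mentioned known course in the dialog history."""
    for utterance in reversed(history):
        mentions = [c for c in KNOWN_COURSES if c in utterance]
        if mentions:
            # Prefer the mention that appears latest within the utterance.
            return max(mentions, key=utterance.rfind)
    raise ValueError("no course mentioned in the dialog history")


def resolve(question: str, history: List[str]) -> str:
    """Replace the anaphor 'that class' with its antecedent from the history."""
    if "that class" in question:
        return question.replace("that class", last_mentioned_course(history))
    return question


def to_sql(question: str) -> str:
    """Toy template parser for questions of the form 'Who teaches X?'."""
    match = re.fullmatch(r"Who teaches (.+)\?", question)
    if match is None:
        raise ValueError(f"unsupported question: {question!r}")
    return f"SELECT instructor FROM course WHERE name = '{match.group(1)}';"


history = ["You should consider taking Discrete Mathematics."]
print(to_sql(resolve("Who teaches that class?", history)))
# SELECT instructor FROM course WHERE name = 'Discrete Mathematics';
```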
Context can also matter in less obvious ways. Consider the following interaction:
STUDENT: I’m looking for an easy class.
ADVISOR: Have you considered EECS 484?
STUDENT: When does that meet?
ADVISOR: Monday mornings.
STUDENT: I can’t do morning classes. Can you suggest something else?
“Can you suggest something else?” does not make its constraints explicit, but a human would understand that the student wants an easy class, other than EECS 484, that does not meet in the morning. A dialog system would need to maintain some representation of the conversation’s state in order to remember, for instance, that it must still generate an easiness constraint.
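A minimal sketch of such a state follows, assuming a hypothetical course table with number, easy, and meets_morning columns and toy keyword rules in place of a learned state tracker. The slot names, the course-number pattern, and the rules are all illustrative assumptions. The state accumulates constraints turn by turn, and the final query is generated from the state rather than from the last utterance alone.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical course-number pattern, e.g. "EECS 484".
COURSE_NUMBER = re.compile(r"\b([A-Z]+ \d{3})\b")


@dataclass
class DialogState:
    """Constraints accumulated across turns (hypothetical slot names)."""
    easy: bool = False
    no_mornings: bool = False
    last_suggestion: Optional[str] = None
    rejected: List[str] = field(default_factory=list)


def update(state: DialogState, speaker: str, utterance: str) -> None:
    """Toy keyword rules standing in for a learned state tracker."""
    text = utterance.lower()
    if speaker == "ADVISOR":
        match = COURSE_NUMBER.search(utterance)
        if match:
            state.last_suggestion = match.group(1)
        return
    if "easy class" in text:
        state.easy = True
    if "morning classes" in text:
        state.no_mornings = True
    if "something else" in text and state.last_suggestion:
        state.rejected.append(state.last_suggestion)


def to_sql(state: DialogState) -> str:
    """Turn the accumulated constraints into one query over a hypothetical course table."""
    conditions = []
    if state.easy:
        conditions.append("easy = 1")
    if state.no_mornings:
        conditions.append("meets_morning = 0")
    if state.rejected:
        listed = ", ".join(f"'{c}'" for c in state.rejected)
        conditions.append(f"number NOT IN ({listed})")
    where = " AND ".join(conditions) if conditions else "1 = 1"
    return f"SELECT number FROM course WHERE {where};"


dialog = [
    ("STUDENT", "I'm looking for an easy class."),
    ("ADVISOR", "Have you considered EECS 484?"),
    ("STUDENT", "When does that meet?"),
    ("ADVISOR", "Monday mornings."),
    ("STUDENT", "I can't do morning classes. Can you suggest something else?"),
]
state = DialogState()
for speaker, utterance in dialog:
    update(state, speaker, utterance)
print(to_sql(state))
# SELECT number FROM course WHERE easy = 1 AND meets_morning = 0 AND number NOT IN ('EECS 484');
```

The final question alone maps to no useful query; only the accumulated state supplies the easiness constraint, the morning restriction, and the exclusion of the course the student has already turned down.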
In this chapter, we therefore take the first steps towards transforming the one-to-one text-to-SQL systems of Chapters 5 and 6 into text-to-SQL components for dialog systems. We describe version 0.1 of the Flex-to-SQL dataset, report the performance of baseline systems on this dataset, and describe promising future work for this task.