Classification of Natural Language Interfaces to Databases based on the Architectures


Available Online at http://www.journalofcomputerscience.com/


Abstract

A Natural Language Interface to Database (NLITDB) system is an interface through which a user submits a request for information from a database in a natural language such as English. An NLITDB system accepts questions in natural language and generates results. Generally, users have to learn a query language such as Structured Query Language (SQL) to formulate a query and retrieve information from a database, and learning such a language is difficult for many non-technical database users. One solution to this problem is to use an NLITDB system to retrieve information from the database. NLITDB systems have gained importance because of the increasing interaction of non-technical users with databases. Many NLITDB systems have been developed since the 1960s, and each uses an architecture to process the natural language question submitted by the user. In this paper, we classify and review the existing NLITDB systems based on the architectures they adopt. The outcomes of this classification are intended to help researchers working on NLITDB systems identify which architectures have been used most often in the development of such systems.

Keywords- Databases, Natural Language Interface to Database (NLITDB), Architecture, Structured Query Language (SQL).

S.AQUTER BABU Asst. Professor

Dept. of Computer Science Dravidian University Kuppam, India

[email protected]

D. MABUNI Asst. Professor

Dept. of Computer Science Dravidian University Kuppam, India [email protected]

Prof. C. LOKANATHA REDDY Professor

Dept. of Computer Science Dravidian University Kuppam, India


1. Introduction

One of the main characteristics of a Database Management System (DBMS) is that it allows users to create and maintain a database. A database is an organized collection of logically related data. Nowadays, many non-technical people also interact with databases. DBMSs provide a query language such as Structured Query Language (SQL) for users to formulate queries and retrieve information from a database. Non-technical people find it difficult to formulate a query in a language such as SQL because they lack knowledge of the database structure, SQL syntax, and so on.

Natural Language Interface to Database (NLITDB) systems have been developed since the 1960s to solve the problem of formulating queries in a query language such as SQL. NLITDB systems allow users to submit their requests for information from the database in a natural language such as English. An NLITDB system accepts questions in natural language and translates them into a query language such as SQL; the resulting queries are processed by the DBMS to retrieve the answers.

In this paper, we classify and review the existing NLITDB systems based on the architectures they adopt. The outcomes of this classification are intended to help researchers working on NLITDB systems identify which architectures have been used most often in the development of such systems.

The rest of the paper is organized as follows. Section 2 presents an overview of the different architectures adopted by NLITDB systems. Section 3 discusses the classification of NLITDB systems based on the architectures adopted by them. Section 4 presents the results, and finally, Section 5 concludes the paper.


2. Types of Architectures

The following four types of architectures have been used in the development of many NLITDB systems [1]. Each architecture reflects different choices about what information is applied and in what manner.

• Pattern-Matching systems

• Syntax-based systems

• Semantic Grammar systems

• Intermediate Representation Languages (IRL)

2.1) Pattern-Matching systems

Some of the early NLITDB systems relied on pattern-matching techniques to answer the user's questions. To illustrate a simplistic pattern-matching approach, consider a database table holding information about countries:

Countries table

| Country | Capital | Language |
| --- | --- | --- |
| France | Paris | French |
| Italy | Rome | Italian |
| . . . | . . . | . . . |

A primitive pattern-matching system could use rules like:

pattern: . . . ``capital'' . . . <country>

action : Report Capital of row where Country = <country>

The above rule says that if a user's request contains the word ``capital'' followed by a country name (i.e. a name appearing in the Country column), then the system should locate the row which contains the country name, and print the corresponding capital.


If, for example, the user typed ``What is the capital of Italy?'', the system would use the above pattern rule and report ``Rome''. The same rule would allow the system to handle ``Print the capital of Italy.'', ``Could you please tell me what is the capital of Italy?'', etc. In all cases the same response would be generated.
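The rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory `COUNTRIES` list stands in for the Countries table (only the two rows shown above), and the function name `answer_capital` is hypothetical, not taken from any actual NLITDB system.

```python
import re

# Toy stand-in for the Countries table (only the two rows shown in the text).
COUNTRIES = [
    {"Country": "France", "Capital": "Paris", "Language": "French"},
    {"Country": "Italy", "Capital": "Rome", "Language": "Italian"},
]


def answer_capital(question):
    """Rule: ... "capital" ... <country>  ->  report Capital of the matching row."""
    text = question.lower()
    keyword = re.search(r"\bcapital\b", text)
    if keyword is None:
        return None  # the pattern does not apply to this question
    # Only the words that follow "capital" are searched for a country name.
    rest = text[keyword.end():]
    for row in COUNTRIES:
        name = re.escape(row["Country"].lower())
        if re.search(r"\b" + name + r"\b", rest):
            return row["Capital"]
    return None


for q in ("What is the capital of Italy?",
          "Print the capital of Italy.",
          "Could you please tell me what is the capital of Italy?"):
    print(q, "->", answer_capital(q))
```

All three phrasings trigger the same rule because the sketch looks only for the keyword and a following country name, and ignores every other word in the question.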

The main advantage of the pattern-matching approach is its simplicity: no elaborate parsing and interpretation modules are needed, and the systems are easy to implement.

The pattern-matching architecture was used in the NLITDB system SAVVY.

2.2) Syntax-based systems

Syntax-based systems are based on the idea of extending syntactic parsers with semantic labels. A sentence is parsed using grammar rules, resulting in a syntactic tree. Some of the nodes in the tree are then mapped to their semantic meanings, and these meanings are combined to produce the corresponding query in a database query language such as SQL.

The main advantage of syntax-based approaches is that they provide detailed information about the structure of a sentence. A parse tree contains a lot of information about the sentence structure: each word and its part of speech, how words are grouped together to form phrases, and how phrases are grouped together to form more complex phrases, until a complete sentence is built. With this information, we can map semantic meanings to particular production rules (or nodes in the parse tree).
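As a rough illustration of this idea, the sketch below parses one question form with a tiny hand-written grammar and then maps selected tree nodes to parts of an SQL query. The grammar, the lexicon, the `NOUN_TO_COLUMN` mapping, and all function names are assumptions made for this example only; real syntax-based systems such as LUNAR use far richer grammars.

```python
# Toy lexicon: word -> syntactic category (assumed for this example).
LEXICON = {
    "what": "WH", "is": "V", "the": "DET", "of": "P",
    "capital": "N", "language": "N",
    "italy": "PN", "france": "PN",
}
# Semantic labels attached to noun nodes: noun -> column of the Countries table.
NOUN_TO_COLUMN = {"capital": "Capital", "language": "Language"}


def category(tokens, i):
    return LEXICON.get(tokens[i]) if i < len(tokens) else None


def parse_np(tokens, i):
    """NP -> PN | DET N PP, with PP -> P NP. Returns (tree, next index)."""
    if category(tokens, i) == "PN":
        return ("NP", ("PN", tokens[i])), i + 1
    cats = [category(tokens, i + k) for k in range(3)]
    if cats == ["DET", "N", "P"]:
        inner, j = parse_np(tokens, i + 3)
        pp = ("PP", ("P", tokens[i + 2]), inner)
        return ("NP", ("DET", tokens[i]), ("N", tokens[i + 1]), pp), j
    raise ValueError("cannot parse noun phrase")


def parse_sentence(question):
    """S -> WH V NP."""
    tokens = question.lower().rstrip("?.").split()
    if category(tokens, 0) != "WH" or category(tokens, 1) != "V":
        raise ValueError("sentence does not match S -> WH V NP")
    np_tree, end = parse_np(tokens, 2)
    if end != len(tokens):
        raise ValueError("unexpected trailing words")
    return ("S", ("WH", tokens[0]), ("V", tokens[1]), np_tree)


def leaves(tree):
    """Yield (category, word) pairs for the leaves of a parse tree."""
    if isinstance(tree[1], str):
        yield tree
    else:
        for child in tree[1:]:
            yield from leaves(child)


def tree_to_sql(tree):
    """Map N nodes to a column and PN nodes to a Country value, then combine."""
    column = value = None
    for cat, word in leaves(tree):
        if cat == "N":
            column = NOUN_TO_COLUMN[word]
        elif cat == "PN":
            value = word.capitalize()
    if column is None or value is None:
        raise ValueError("no semantic mapping for this tree")
    return f"SELECT {column} FROM Countries WHERE Country = '{value}'"


print(tree_to_sql(parse_sentence("What is the capital of Italy?")))
```

The parse step knows nothing about the database; only the final mapping from tree nodes to columns and values is domain-specific.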

The syntax-based architecture was used in NLITDB systems such as LUNAR and NALIX.

2.3) Semantic Grammar systems

A semantic grammar system is very similar to a syntax-based system, in that the query result is obtained by mapping the parse tree of a sentence to a query in a database query language such as SQL. The basic idea of a semantic grammar system is to simplify the parse tree as much as possible, by removing unnecessary nodes or combining some nodes together.

Based on this idea, a semantic grammar system can better reflect the semantic representation without complex parse tree structures. Therefore, a production rule in a semantic grammar system does not necessarily correspond to a general syntactic concept. In addition to producing smaller structures, the semantic grammar approach provides a special way of assigning a name to a certain node in the tree, which results in less ambiguity than the syntax-based approach.
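A minimal sketch of a semantic grammar, under the assumption of two hand-written rules for the Countries example, might look as follows. The category names (`CapitalQuestion`, `LanguageQuestion`, `Country`), the rule format, and the SQL templates are hypothetical and chosen only to show that tree nodes carry domain names rather than syntactic ones.

```python
COUNTRY_NAMES = {"france": "France", "italy": "Italy"}

# Each rule: (semantic category, right-hand side, SQL template).
# "<Country>" is a semantic non-terminal; every other symbol is a literal word.
RULES = [
    ("CapitalQuestion",
     ["what", "is", "the", "capital", "of", "<Country>"],
     "SELECT Capital FROM Countries WHERE Country = '{Country}'"),
    ("LanguageQuestion",
     ["which", "language", "is", "spoken", "in", "<Country>"],
     "SELECT Language FROM Countries WHERE Country = '{Country}'"),
]


def parse(question):
    """Return a flat tree such as ("CapitalQuestion", ("Country", "Italy"))."""
    tokens = question.lower().rstrip("?.").split()
    for name, pattern, _ in RULES:
        if len(tokens) != len(pattern):
            continue
        children = []
        for symbol, token in zip(pattern, tokens):
            if symbol == "<Country>":
                if token not in COUNTRY_NAMES:
                    break
                children.append(("Country", COUNTRY_NAMES[token]))
            elif symbol != token:
                break
        else:  # no break: every symbol of the rule matched
            return (name, *children)
    return None


def to_sql(tree):
    """Fill the SQL template attached to the tree's semantic category."""
    name, *children = tree
    template = next(t for n, _, t in RULES if n == name)
    return template.format(**dict(children))


tree = parse("What is the capital of Italy?")
print(tree)
print(to_sql(tree))
```

Compared with a syntactic tree for the same question, the resulting tree has a single named child, which reflects the simplification described above.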

The semantic grammar architecture was used in NLITDB systems such as PLANES, LADDER, and REL.

2.4) Intermediate Representation Languages (IRL)

Because of the difficulty of directly translating a sentence into a general database query language using a syntax-based approach, intermediate representation systems were proposed. The idea is to map a sentence into a logical query language first, and then translate this logical query into a general database query language such as SQL. More than one intermediate meaning representation language may be used in the process.
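The two-stage translation can be sketched as below. The `LogicalQuery` form, the `SCHEMA` mapping, and both function names are assumptions introduced for illustration; they do not reproduce the logical languages of CHAT-80, TEAM, or any other actual system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalQuery:
    """Hypothetical logical form: 'find X such that attribute(entity, X)'."""
    attribute: str
    entity: str


def to_logical_form(question):
    """Stage 1: sentence -> logical query, with no reference to tables or columns."""
    tokens = question.lower().rstrip("?.").split()
    # Assumed question shape for this sketch: "... <attribute> of <entity>".
    if "of" not in tokens:
        raise ValueError("unsupported question form")
    i = tokens.index("of")
    if i == 0 or i + 1 >= len(tokens):
        raise ValueError("unsupported question form")
    return LogicalQuery(attribute=tokens[i - 1], entity=tokens[i + 1])


# Stage 2 knowledge: logical attribute -> (table, answer column, key column).
SCHEMA = {
    "capital": ("Countries", "Capital", "Country"),
    "language": ("Countries", "Language", "Country"),
}


def to_sql(query):
    """Stage 2: logical query -> SQL for one particular database schema."""
    table, column, key = SCHEMA[query.attribute]
    return f"SELECT {column} FROM {table} WHERE {key} = '{query.entity.capitalize()}'"


logical = to_logical_form("What is the capital of Italy?")
print(logical)
print(to_sql(logical))
```

In this sketch only `SCHEMA` and `to_sql` depend on the database, so the first stage could in principle be reused with a different schema or target query language.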

The following Figure shows a possible architecture of an intermediate representation language system.


The Intermediate Representation Languages (IRL) architecture was used in NLITDB systems such as CHAT-80, PHILIQA, and TEAM.

3. Classification of NLITDB Systems

Each NLITDB system uses an architecture to process the natural language question submitted by the user. We collected information about twenty-one existing NLITDB systems from published research papers available on the Internet.

After studying and analyzing these NLITDB systems, we classified them into categories based on the architectures they adopted.

The following NLITDB system adopted the Pattern-Matching systems architecture.

• SAVVY

The following NLITDB systems adopted the Syntax-based systems architecture.

• LUNAR

• NALIX

The following NLITDB systems adopted the Semantic-Grammar systems architecture.

• LADDER

• RENDEZVOUS

• PLANES

• REL

• EUFID

• ELF

• EASYASK

• ENGLISH QUERY

The following NLITDB systems adopted the Intermediate-Representation Languages (IRL) architecture.

• PHILIQA

• CHAT-80

• TEAM

• IRUS

• Ginsparg’s system

• JANUS

• LOQUI

• MASQUE/SQL

• EDITE

• CLE


The following graph shows the above classification.

[Figure: bar chart titled "Classification of NLITDB Systems". Categories (Architectures): Pattern-Matching systems, Syntax-based systems, Semantic-Grammar systems, Intermediate-Representation Languages. Value axis: Number of NLITDB Systems (0 to 12).]

4. Results

Based on our study and analysis of NLITDB systems, we identified twenty-one existing NLITDB systems and their architectures. We found that most of these systems adopted either the Semantic-Grammar systems architecture or the Intermediate-Representation Languages architecture.
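The counts behind this result can be tallied directly from the lists in Section 3. The short script below is only a bookkeeping aid; it assumes that "Ginsparg’s" and "JANUS" in the IRL list name two separate systems, which is the reading that gives the stated total of twenty-one.

```python
# Systems per architecture, transcribed from the lists in Section 3.
# Assumption: "Ginsparg's" and "JANUS" are two separate systems.
CLASSIFICATION = {
    "Pattern-Matching systems": ["SAVVY"],
    "Syntax-based systems": ["LUNAR", "NALIX"],
    "Semantic-Grammar systems": [
        "LADDER", "RENDEZVOUS", "PLANES", "REL",
        "EUFID", "ELF", "EASYASK", "ENGLISH QUERY",
    ],
    "Intermediate-Representation Languages": [
        "PHILIQA", "CHAT-80", "TEAM", "IRUS", "Ginsparg's system",
        "JANUS", "LOQUI", "MASQUE/SQL", "EDITE", "CLE",
    ],
}


def tally(classification):
    """Return (systems per architecture, total number of systems)."""
    counts = {name: len(systems) for name, systems in classification.items()}
    return counts, sum(counts.values())


counts, total = tally(CLASSIFICATION)
for architecture, count in counts.items():
    print(f"{architecture}: {count} of {total}")
```

Under that reading, the Semantic-Grammar and IRL categories together account for 18 of the 21 systems.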

5. Conclusion

A Natural Language Interface to Database (NLITDB) system allows database users to formulate questions in a natural language such as English to retrieve information from a database. The user's questions are translated into a database query language such as SQL, and the resulting query is processed by a DBMS to return the answer. Many NLITDB systems with different architectures have been developed since the 1960s. In this paper, we have classified twenty-one existing NLITDB systems based on the four main architectures adopted by them. Based on our study and analysis of these systems, we conclude that most of them adopted either the Semantic-Grammar systems architecture or the Intermediate-Representation Languages architecture.

References

[1] I. Androutsopoulos, G.D. Ritchie, and P. Thanisch, Natural Language Interfaces to Databases: An Introduction, Journal of Natural Language Engineering 1 Part 1 (1995), 29--81.

[2] Eric Brill, Transformation Based Error Driven Learning and Natural Language Processing: A Case Study in Part of Speech Tagging, ACL (1995).


[3] Eugene Charniak, A maximum-entropy-inspired parser, North American Association for Computational Linguistics (2000), 132--139.

[4] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein, Introduction to Algorithms, Second Edition, MIT Press and McGraw-Hill, 2001.

[5] D.R. Dowty, R.E. Wall, and S. Peters, Introduction to Montague Semantics, D. Reidel Publishing Company, Dordrecht, Holland, 1981.

[6] G. Hendrix, E. Sacerdoti, D. Sagalowicz, and J. Slocum, Developing a Natural Language Interface to Complex Data, ACM Transactions on Database Systems (1978), 105--147.

[7] Daniel Jurafsky and James H. Martin, Speech and Natural Language Processing, Prentice-Hall Inc., Upper Saddle River, New Jersey, 2000.

[8] Rohit J. Kate and Raymond J. Mooney, Using String-Kernels for Learning Semantic Parsers, COLING ACL (2006).

[9] Yunyao Li, Huahai Yang, and H.V. Jagadish, NaLIX: An Interactive Natural Language Interface for Querying XML, SIGMOD (2005).

[10] Yunyao Li, Huahai Yang, and H.V. Jagadish, Constructing a Generic Natural Language Interface for an XML Database, EDBT (2006).

[11] Raymond J. Mooney, Learning Language from Perceptual Context: A Challenge Problem for AI, American Association for Artificial Intelligence (2006).

[12] Ana-Maria Popescu, Alex Armanasu, Oren Etzioni, David Ko, and Alexander Yates, Modern Natural Language Interfaces to Databases: Composing Statistical Parsing with Semantic Tractability, COLING (2004).

[13] W. Woods, An Experimental Parsing System for Transition Network Grammars, in Natural Language Processing, R. Rustin, Ed., Algorithmic Press, New York, 1973.

[14] B.J. Grosz, “TEAM: A Transportable Natural Language Interface System”, In Proceedings of the 1st Conference on Applied Natural Language Processing, Santa Monica, California, (1983), pp 39-45.

[15] P. Resnik, “Access to Multiple Underlying Systems in JANUS”, BBN report 7142, Bolt Beranek and Newman Inc., Cambridge, Massachusetts, (September, 1989).
