• No results found

NATURAL LANGUAGE QUERY PROCESSING USING SEMANTIC GRAMMAR

N/A
N/A
Protected

Academic year: 2021

Share "NATURAL LANGUAGE QUERY PROCESSING USING SEMANTIC GRAMMAR"

Copied!
5
0
0

Loading.... (view fulltext now)

Full text

(1)

Gauri Rao et al./(IJCSE) International Journal on Computer Science andEngineering Vol. 02,No. 02,2010,219-223

...

-NATURAL LANGUAGE QUERY

PROCESSING USING SEMANTIC

GRAMMAR

'Gauri

Rao,

2

Chanchal

Agarwal,

3

Snehal Chaudhry,

4

Nikita Kulkarni, ,

5

Dr. S.H. Patil

ILecturer department 0f Computer Engineering BVUCOE, Pune 2Lecturer department 0 fComputer Engineering BVUCOE, Pune 3Lecturer department 0 fInformation Technology BVUCOE Pune

4Lecturer department 0 fComputer Engineering BVUCOE Pune

5Professor and Head department 0 fComputer Engineering Bharati Vidyapeeth University College of Engineering, Pune, India

AbstrKt: TIle fidel of •••• raI ~ prottUillf:

(NU) lias .at a draaatic sIIift _ bodt I"a'Cafd

dirutioII a.t ~ • tile past KYeraiyears. III

tile past, __ ~ _,COla•••••••••• ~ tmdaI

to pIU'dyS)WboIic ..edtods.. Ikcntly. _ore

a.t..ore on is ~ toward Itybrid _dIIods tItat

m.biIIe _ e.piriaI eorw-... ~

~ tile me of probabilistic a.t ••~ ••••

tIIeordic tuIuIiqIIes. wid! ~ S)WboIic ..edtods..

The main purpose of atural Language Query

Processing isfor an English sentence to be interpreted

by the computer and appropriate action taken. Asking

questions to databases in natural language is a very

convenient and easy method of data access, especially

for casual users who do not understand complicated database query languages such as SQL.

This paper proposes the architecture for translating English Query into SQLusing Semantic Grammar

1.0 INTROD

cno

aturaI language processing is becoming one of the most active areas in HIIIII3D-COIIIpU Jnteractioo.. It is a branch of AI which includeslnfunnatioo Retrieval, Machine Translation and Language Analysis. The goal of NLP is to enable communication between people and computers without resorting to JDeIIIOrization of complex commands and proa:dures. In other words. NLP is a technique which can make the computer undeI'sIand the languages naturally used by humans. While natura1 language may be the easiest symbol system for people to learn and use, it has proved to be the banIest for a computer to master.

Despite the cbaIleoges, natural language processing is widely reganled as a promising and critically important endeavor in the field of computer research.

The general goal for most computational linguists is to instiU the computer with the abili1;yto undersaand and generate natura1 language so that eveotually people can address their computers througb text as though they were addressing another person. The applications that wiD be possIDIe when NLP capabilities are fully realized are impressive computers would be able to process natural language, translating languages accurately and in real time, or extracting and summarizing information from a variety of data sources, depending on the users' requests.

2.0 RELATED WORK

The very first attempts at NLP database interfaces are just as old as any other NLP research. In fact database NLP may be one of the most important successes inNLP since it began. Asking questions to databases in natural language is a very convenient and easy method of data access. especially fur casuaJ users who do not undeI'SIand complicated database qlJel}' languages such as SQL. The success in this area is partly because of the real-world benefits that can come fiom database P systems, and partly because NLP works very well in a singIe-datahase domain. Databases usually provide smaU enough domains that ambiguity problems in natura1 language can be resolved successfully. Here are some examples of database NLP systems:

LUNAR (Woods. 1973) involved a system that answered questions about rock samples brought back from the DlOOIL Two databases were used, the chemical analyses and the literature references. The

(2)

Gauri Rao et al./(UCSE)International JournalonComputerScienceandEngineering Vol.02,No.02,2010,219-223 program used an Augmented Transition Network

(ATN)parser and Woods'Procedural Semantics.The system was informally demonstrated at the Second

AnnualLunar Science Conference in 1971. [1]

LIFERILADDER wasone ofthe first good database

NLP systems. It was designed asa natural language interfacetoa database of information aboutUS Navy ships. This system, as described in a paper by Hendrix (1978), used a semantic grammar to parse questions and query a distributed database. The LIFERILADDER system could only support simple one-table queries or multiple table queries with easy join conditions. [4]

3.0SYSTEM DESCRIPTIO

A brief description of the system is as follows: Suppose we consider"a database say ORACLE

-Within this oracle database 1 have placed certain

tables, which are properly normalized. 0 if the

userwishes toaccess the data from the table be/sbe

hasto be technically proficient inthe SQL language

to make a query for the ORACLE database. Our

system eliminates this partand enables the end user to accessthe tables in his/her language.

Letustake anexarnple..

Suppose if we want to view information of a

particular employee from EMP table then we are

supposedtousethe foUowingquety:

SELECf

*

FRO EMP WHEREe_name

=

'

ABC';

But a person, who doesn't IrnowORACLE, wiD not

be able to access the database unless belsbe Irnows the syntax and semantics of firing a query to the

database. But using P,this task:of accessing the

database will be much simpler. So the abo equery willberewritten usingNLP as:

Give the infunnation of employee whose name is

ABC.

80th the SQL statement andNLPstatement toaccess

the EMP table would result in the same output the

only diffi:reoce being, a normal person who doesn't

Irnow anything about SQL can easily access the

ORACLE database.

4.0 SCOPE OF THE SYSTEM

The scopeoftheproposed system isas follows:

»

To work with anyRDBIMS one sbouId Irnow

the syntax of the commands of that particular

database software (Microsoft SQL. Oracle,

etc.),

»

Here the aturaJlanguage processing is done

onEnglish i.e, the input statements have to be

inEoglish.

Input from the user is taken in the form of

questions (wb- form like what, who, where,

etc).

ISSN : 0975-3397

>

A limited Data Dictionary is used where all possible words related to a particular system

will be included. The Data Dictionary of the

system must be regularly updated with words

that arespecifictotheparticularsystem

>

Ambiguity amongthe words will be takencare of whileprocessingthenatural language.

>

All the names in the input natural language

statement have tobein double quotes.

>

Data Dictionary used will be: -EMP,DEPT

and PROJECT

Thetablesusedareshownbelow>

Employee departmel project

id id id name name name salary location empid edptid

cap

a

c

i

t

y

Fig ITables used in the System 4.1 System Requirements 4.1.1 Hardware Requirements PI (Pentium 1)01"bigherprocessor RAM- minimum 64MB RAM HDD - minimum 500MB of free H3nJ..disk space

VGADisplay Adapter- withbigherresolution

4.1.2 Software Reqllireme.ts

Operating System- WlNDOWS 9S01"abo e Datahase-SQL SERVER2000orACCESS Java,MS isualC++compiler5.016.0

5.0

SYSTEM ARCBJTECnJRE Generally P has following steps:

->

Morphological Analysis: Individual words are

analyzed into their components and non word

tokens such aspunctuation are separated from

thewords..

~ Syntactic Analysis: Linear sequences of words

are transformed into structures that sbow bow

the wordsrelatetoeach other.

~ Semantic Analysis: The structures created by

the syntactic analyzer-areassigned meanings.

>

Discourse integration: The meaning of an individual sentence may depend on the seoteoces that precede it andmayinfluence the meanings ofthesentences that fullow it..

Pragmatic Analysis:The structure representing

what was said is reinterpreted to determine what was acIualIymeant

(3)

Gauri Rao et al./(IJCSE) International Journal onComputer Science and Engineering

Vol. 02,No. 02,2010,219-223

Fig.2 Stages inNatural Language lnterpretation Process

Thesystem includes the following modules:

G : Designing the front end 01" the user

interf3ce where the user .J)enter the query in atura.ILanguage.

Parsing: Derives the Semantics of the atura.I Query given by the user-and parses it in its k:dmical form,

QueryGeneration: After the successfuJ parsing

of the statement given by the user, the system

generates a query against the user statement in

SQL and further gives it to the back.end

database.

Data Collection: This module collects the

output of the SQL statement and places it in the set"Interface Screen as a result form,

6.0 SE

tc

GRAMMAR

Thereare two main partsof a semantic grammar. The

first is alexicon that stores all the possible words that

the grammar is aware of. A simple entry in the

lexicon might look likethis:

-(customer ->customer patron member) (customers ->customers patrons members)

On the left-hand side of the '->' the word 'customer'

defines a symbol that can be used in the grammar.

When 'customer' isused in the grammar, it refers to

the English words on the right-hand side of the '->', that is 'customer', 'patron' or 'member'. Similarly,

the plurals of these words are shown in the next line.

OnlysingleEnglish words.or terminal symbols,can appear in the lexicon.

The other part of the semantic grammar in 01 es

I"IIIes to combine the tenninaJ symbols in the lexicon to form phrases01"sentences in a specific way.

FOI"example,the rule

(AIT_TAPE_CUSTOMER -> RENTED by

customer IT_ER)

Demonstrates bo the TAPE. STOMER attribute

can be referred to in English.On theleft-hand side is the non-terminal representation of the phrase.

This can be used in otber rules to refer to this phrase.

On the right hand side is a combination of non

-tenninal symbols&om other rules (in PPER CASE)

and from the lexicon (in lower case). RENfED represents another rule foe synonyms of the verb

•rented' ,such as taken out,bonowed etc. ote that TED is not in the lexicon because taken out' is a two-word phrase,and so must be defined as a rule. The words by and customer appear in thelexicon,

and the AIT_ ER symbol refers to a number

attnDute. 0 the abo e rulecan be used to parse the

phrase"borrowed by member number 12",01""taken

out by customer 14"etc.

7.00VERVIE OF PROCESS

Firstly, the English input (in the fcrm of a list) is

parsed by the semantic grammar. then a post

-processor matches table and attribute names and joins up tables if the query in 01 es more than one table.

After that the post-processor can construct the

resulting SQL queryand output it.

english query

semantic

representation

SQLqueoy

Fig.3Database Semantics

The question "What is employee name with salary greater than 2000" is processed by the system as given below.

The English text is converted into a list, so that the

system can process it.The sentence becomes:

(WHAT IS EMPLOYEE AME WIlli SALARY

GREATER mAN 2000)

Now this list isparsed bythe semantic grammar. The

(4)

Gauri Rao et al. /(IJCSE) International Journal onComputer Science and Engineering

Vol. 02,No. 02,2010,219-223

sentence, using the semantic grammar. This sentence

only has one possible parse, so only one semantic

representation isreturned:

«SELECT NAME! FROM (EMP (WHERE

(SALARY >= 2000»»)

The semantics already look similar to an SQL query.

The brackets in the semantics indicate how the

phrases from the sentence relate to each other. For

ex~ple, the (EMP (WHERE (SALARY>= 2000»)

section shows that the 'born in or after 1920'

condition relates to the director. The 'NAME!'

symbol, after SELECT, isa special representation for

both the first and lastnames.

This semantic representation of the sentence is now

taken by a post-processor to transform it into an SQL

query. The'NAME!'symbol is expanded to FNAME

and L AMEand these are identified as attributes in

the EMP table.The DEPT attribute isalso matched to

theEMP table because the WHERE clause directly

refers to the EMP table. After- processing the

semantics, there is only one table needed for this

query, so only the EMP table is included in the

FROM clause. and there is no need to add join

conditions..

The aim of a natural language interfuce is to f3cilitate

the user to computer in a natural way. For this

purpose. we have designed a domain specific

dictionaryto keep the sytlOII}'Imof the columns and

tables ames. The addition of synonyms makes it

possible fur the user to write a sentence in different

natural ways. We call this dictionary as semantic

dictionary, because there is nosyntactic information

fortokeos..

To build an appropriate interface for inputting

database semantics information we need to develop a

framework for this information.This will allow us to

identifYthe different areas needed and easily obIain

the infonnation from the 1JSeI'The first thing we need.

to know about the database iswhat the tables and

attributes represent in the real world The second

thing is ho they relate to each other. Finally we

need to know the relationships from attrIbutes to the

tablesinwhich theyare contained Attnbutes that are

foreign keys should not be considered in this phase,

because they link tables to other tables, and the

appropriate phrases should have been covered by the

table-table relationships.

Translating from one language to another-always

requires some kind of paring. Parsing is the process

of identifying sIructwe in data..It determines whether

the input data bassome pre-detennined sIructwe and

respond accordingly.

Parsing requires a set of grammars to be defined

These grammars are set of rules, which define how

the language is sIructwed Rules are specific pattern

ISSN :0975-3397

of data, which appears in the input. These rules can

further reference to other rules, which are called sub

rules. Rules can also be recursive if they somehow

refer back to themselves. The rules consist of

products, which are the different ways in which the

rule can be satisfied. The actual data is referred to as

terminals. Parser may also contain actions. To

translate natural language queries, system needs to

have a parser, which contains all these grammar

rules. The parser also has to know the table names

column names, any relationships between tables and

related words for the table and column names. Only

the most common grarnmar isincluded in the parser.

8.0 ALGORITHM

»

Tokeoization(scanning)

• Split the Queryin tokens

• Give order-number to each tokenidentified

Split Queryand extractpatterns

• Look for sentence connectorsIcriteria

words

• Break Query on the basis of

connectorIaiteria tokens.

• Use criteria tokens to speciJY

condition in query.

• Find attributes and values after criteria

token

ap value for identified aItn"bute and

corresponding table

Replace synonyms with proper attribute

names

»

Get intermediate funn ofQuery

Transfunn itinto SQL

9.0 RESULTS

The graphical User Interlace isasshown below. The

IJSeI' bas to first login and then connect to the

database A database setting is required to access the

database after getting the information about host

name,database name,IJSeI'name and the password

Once the database is setup, English Query can

translate very complex English queries to SQL with

the capability of searching multiple tables and

multiple fidds.

Sfqls fullowed to get the res

uIt-»

Type the natural language queryin the

dialogue box given.

»

Click on"Proceed" button.

~ The systemwill ask the IJSeIfor the'

expected meaning..In caseof ambiguities

the IJSeI'bas to selectthedesired query.

Click on"Generate SQL"button.

~ The systemwill generateSQLquery.

Click on"Run Query".

(5)

Gauri Rao et al./(UCSE) International Journal onComputer Science andEngineering

Vol. 02,No. 02,2010,219-223

}>- The result of thequay will bedisplayed in

the Output box.

Fig.4.Login screen

Following example demonstrate SQL statement

translatedby EnglishQuer

y:-Output :- 12000

lo.OF1JTUllE ENHANCEMENT

ore new grammar can be added to the parser-to

increaseeffeeti eDeSS.

Adding a tbesaurus is another suggestion, which

could belp automating therelatedwords fortableand column names.

With the belp of a tbesaurus, the user- input can be

pre-processed tosubsIitute related words with table

01"column namesand alsoremove unwantedwords.

So fur. this system considers selection and a few simpleaggregations. Thenext step of theresearch is, to accommodatemore complexqueries.

11.0CO CL SIO

aturaI Language Processing can bring powerful eobancements to virtualJy any computer"program interface.. This system is cum:otIy capable of

handling simple queries with standard join

conditions.BecausenotallformsofSQL queriesare supported, further-development would be required before the system can be used within NIDBl

AJternatives fOl" integrating a database P

component into the OBI were considered and

assessed.

12.0 References

Huangj.Guiaog Zangi, I'billip Coy Shcu ~A NatmaI

Language daIabase lntaface based onpubabilislic context freegt3IIIID3I". IEEE lnIanaIionaI ~ on SeIrIamic Compding andsystans 2008

Akama, S.(Ed) Logic.1onguage and compukIlion. Kulm:w

-Academic:publisbcrs. pp.7-11.1997.

ELF Software CO. DlIIroI-Longuage DoIDbase inll!ifaca

froM ELF SofIlrIlR Co.cik:d oycmber 1999. available

Iiom Inkmc:t:

Hendrix. G.G, Saa:nIoti. E.D.•~wic:z. D.•Slocum. J.

"'Developing anabnI Janguag,c ioIJ:rlaa: to complex data=.

inAeM Transoctiotu on dalDbase sy.tII!Im. 3(2). pp. 105

-147.1978.

[5) .Jo5qIh. S.W, AIeIiwas,. R.-A knowkdge-bascd subsystJ:m

focaoatwaI ~ inferface toadatabase that predicts andexplains qUCl)' faiJIRS". in IEEE Ctt; pp.80-87. 1991.

[6] Mitrovic; A.AIarow~ Il!Ot:IriRg $)lSlIeJIf forSQL.

Univemty of CanteJbuJy. 1993..Moore. J.D. ~

~ forim1r~ appI~ maine ~

tuIOrSmore like humans"'. inp,oceedJirgs AJ-ED. pp.36-42, 1995.

[7] Sub"K..S,Ptrtins,. w.e,> '"The effi:cts of a system echo in a n:stricH:d oaIurlII language database inlerl3ce fiI novice users'".in IEEE

S

.

,-

sc:i#!nt%s, 4.pp.594-599> 1994.

[8] Wbt:obua, W, Dilts, D.N..'"lnIegTaIing diverse CIM daIa bases: tbe role of nabmII language ink:rlitce~. in IEEE

TnmsacIions 011 .1)1SII!JIU,IIItDf. and q-bemetia, 22(6). pp.

1331-1347.1992.

[9] Dan Klein,Christopher D.Manning: Corpus-Based Induction

of Syntactic Structure: Models of Dependency and Constituency. ACL 2004: 478-485. ____ w ~._ [I]

hltll

I

L

na

J

I

;

a

~

J

~

On

e

1

~~

t

l

:

!f

.

l

t

o(

[2]

r

~

[3]

-

[4J .,.

'

''-.

.

.

~....":.. ~~~:"!:.~:••...

-Fig.5 Snapshot ofquery: What isthe salary of "Gauri"

English Query: -What isthe salary of"GauriRao"

Meaning of Query:- salary of employee with name gauri rao

SQL Query:- Select salary From emp where empnarne ="gaurirao"

References

Related documents

Conclusions: This review found that innovative service delivery approaches, such as those potentially offered by CHWs, for adolescents in sub-Saharan Africa are lacking, CHW

The DADR model posits that specific social, cognitive, and behavioral factors related to the homework task (e.g., per- ceived difficulty and relevance of the task), provider

In view of the high prevalence of sleep apnea among commercial vehicle drivers as reported by MIROS’s studies, Obstructive Sleep Apnea was included in the standard [14]..

Our study is the first large, prospective study to identify the incidence of PRIS in critically ill adults administered propofol for longer than 24 hours. Although other reports

Repeat hepatectomy with systemic chemotherapy might improve survival of recurrent liver metastasis from colorectal cancer?a retrospective observational study RESEARCH Open Access

Companies which had more safeguards also applied their ethical codes and policies more effectively than companies with lesser ethical safeguards; moreover the non public

ECTOPIC (HETEROTOPIC) PANCREAS IN THE MESENTERY OF THE JEJUNUM: CASE REPORT AND LITERATURE

Additive x dominance type of gene effect (j) was positive significant for days to 75% heading in cross III; for days to maturity in cross II and III; for plant height