• No results found

3. The architecture of the NLPQC system

3.12 An example of run

In the following sections, we will show a complete example of how the system will create the corresponding SQL query for the sentence What is the address of author Leiserson? First, the system administrator runs the NLPQC preprocessor on the Cindi schema. Here is the actual output after using WordNet to build the semantic set for the table resource:

// Semantic set STRING table0[] = { "resource","resources","resourcefulness","imagination","book","boo ks","#", "title","statute","championship","deed","of","conveyance","claim", "entitle","#", "resource_id","#", "alt_title","#", "language_id","#", "keyword","#",

<skip> "" };

WordNet returns a full set of synonyms, hypernyms and hyponyms for each table and attribute name. The set is also called the semantic set of the name. The cardinality of a set can be as high as 50 elements. Because not all of the semantic set elements are relevant to the name, the system administrator has to edit the result. Some of the elements are removed from the set, and others are added as necessary. For example, WordNet does not return book as a synonym for resource, as we would like to have for the Cindi database. In this case, she adds book to the semantic set of the resource table. The administrator also adds the table rules and the SQL templates to the pre-processor output. The following example shows the edited semantic set for the table resource and its attributes, and the table and verb rules that are added by the administrator:

// Semantic set STRING table0[] = { "resource","resources","book","books","Resource","Resources","Book ","Books","#", "title","Title","#", "resource_id","#", "alt_title","#", "language_id","#", "keyword","#", <skip> "" }; // Schema rules STRING cross_reference[] = { "author,resource","resource_author","AND resource_author.resource_id=resource.resource_id AND resource_author.author_id=author.author_id", <skip> };

// Action verb rules

STRING action[] = // action, object1, object2 (related objects) {

"written", "resource", "author", "wrote", "author","resource" , "write", "author","resource" ,

"writes", "author","resource" , "writing", "author","resource" , };

The pre-processor output requires manual editing. For example, it took 3 hours for me to edit the information related to the 14 tables of the Cindi database used in this project.

The system administrator then builds the NLPQC run time module with the rules shown above. The end user can type in the query What is the address of the author Leiserson?

The NLPQC system uses the Link Parser to syntactically process the query. Here is the actual output edited for clarity. The bold text is NLPQC output and the italic text shows the links found by the Link Parser:

Enter query: What is the address of the author Leiserson? ++++Time 0.12 seconds (0.12 total)

Found 4 linkages (2 had no P.P. violations) NLP: print the link information:

LEFT-WALL - Xp - ? : Connects to the period at end of the sentence LEFT-WALL - Ws - what : Subject question

what - Ss*w - is : Question-words like 'who' that can act as noun-

phrases

is - Ost - address : Connects transitive verb to direct or indirect

object (s)

the - Ds - address : Connects determiner to noun

address - Mp - of : Prepositional phrase modifying noun of - Js - Leiserson : Connects preposition to its object

author - GN - Leiserson : Proper noun is introduced by a common noun the - DD - author : Connects definite determiners (the, his, Jane's) to

number, expression or adjective acting as nouns

? - RW - RIGHT-WALL : Connects the right-hand wall to the left-hand

wall

object=author, table=author, att=address, def_att=name, value=Leiserson SELECT author.address FROM author WHERE author.name='Leiserson'

The following graphical representation shows the same parse tree as above, in a more readable format:

+---Xp---+ | +----Ost---+ +---Js---+ | +---Ws--+Ss*w+ +--Ds--+--Mp--+ +--DD--+---GN---+ | | | | | | | | | | | LEFT-WALL what is.v the address.n of the author.n Leiserson ?

The LEFT and RIGHT walls are added by the Link Parser as helping delimiters. One link in the parse tree relates two tokens from the input sentence. In the example is - Ost – address, Ost connects the transitive verb is to its direct object address. NLPQC uses this information to identify the direct or indirect objects in the input, and to match them to table names in the database, with help from the semantic sets. The Link Parser returns zero, one or many parse trees. If there are more than one parse trees, NLPQC picks the first one, which has the lowest link cost. The link cost is a measure of the precision of the parse tree, and the Link Parser computes it. The link cost is the sum of UNUSED, DIS, AND and LEN parameters, as they are returned by Link Parser. If the minimum cost parse tree returned by the Link Parser is wrong, the NLPQC system will fail to produce a correct SQL query.

Next, the system tries to identify the tables involved. To do this, it checks on each noun in the input sentence if it belongs to one of the semantic sets. It finds that author is in the semantic set of the table author. Then it finds out the address is in the semantic set of the attribute address of the table author.

The next step is to apply the templates. The first template is <attribute>of<author>. The input matches the template. The default attribute of table author is considered for the rest of the sentence, i.e. Leiserson. The following SQL query is generated:

SELECT address FROM author

WHERE author.name=”Leiserson”

Related documents