Motivation
• The ability to communicate with a database in natural language is regarded as the ultimate goal for database query interfaces
• Challenges
  • Automatically understanding natural language
  • Translating the parsed natural language query into a database query
NaLIX
• Addresses the challenge of translating a natural language query (NLQ) into XQuery
• Problems addressed
  • Attribute name confusion
  • Query structure confusion
    • e.g., differentiating between “Return the book with the lowest price” and “Return the lowest price of the book”
Schema-Free XQuery
• A function called “meaningful query focus” (MQF) is used to retrieve the relation between two keywords in the search
• Example:
  • “Return the director of Gone with the Wind” relates the director to the movie “Gone with the Wind”
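
To make the idea of relating two keywords concrete, here is a minimal Python sketch. It is not the Schema-Free XQuery MQF function itself. It assumes a much simpler rule (a director is related to a title when the two share the deepest common ancestor in the XML tree), and the toy document, element names, and function names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy document; element names and contents are assumptions.
DOC = """
<movies>
  <movie><title>Gone with the Wind</title><director>Victor Fleming</director></movie>
  <movie><title>Apollo 13</title><director>Ron Howard</director></movie>
</movies>
"""

def path_to_root(node, parent):
    """Return [node, parent, grandparent, ..., root]."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def lca_depth(a, b, parent):
    """Depth (root = 0) of the lowest common ancestor of a and b."""
    ancestors_a = path_to_root(a, parent)
    for node in path_to_root(b, parent):
        if node in ancestors_a:  # Elements compare by identity
            return len(path_to_root(node, parent)) - 1
    raise ValueError("nodes are not in the same tree")

def related_directors(root, title_text):
    """Directors whose common ancestor with the matching title is deepest."""
    parent = {child: p for p in root.iter() for child in p}
    titles = [t for t in root.iter("title") if t.text == title_text]
    directors = list(root.iter("director"))
    scored = [(lca_depth(t, d, parent), d.text) for t in titles for d in directors]
    if not scored:
        return []
    best = max(depth for depth, _ in scored)
    return [name for depth, name in scored if depth == best]

print(related_directors(ET.fromstring(DOC), "Gone with the Wind"))
```

The point of the sketch is that the user never names the `movie` element: the structural relation between `title` and `director` is inferred from the tree rather than from a known schema.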
Query Translation
• Relations between the words must be translated into XQuery
• The NLQ is converted to a parse tree
• Three main steps
  • Classification of terms in the parse tree of the NLQ
  • Validation of the parse tree
  • Translation of the parse tree into XQuery
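
The first two steps above can be sketched roughly as follows. This is only an illustration under strong assumptions: it works on a flat word list rather than a real parse tree, and the lexicon, the category names, and the validity rule are hypothetical and not taken from NaLIX.

```python
# Hypothetical lexicon: categories and word lists are illustrative assumptions.
LEXICON = {
    "return": "command",
    "lowest": "function",
    "number": "function",
    "is": "operator",
    "the": "marker",
    "of": "marker",
    "with": "marker",
}
# Assumed element names in the database; anything else is treated as a value.
NAME_TOKENS = {"book", "price", "movie", "director", "title"}

def classify(words):
    """Label each word as command, function, operator, marker, name, or value."""
    labels = []
    for word in words:
        key = word.lower()
        if key in LEXICON:
            labels.append((word, LEXICON[key]))
        elif key in NAME_TOKENS:
            labels.append((word, "name"))
        else:
            labels.append((word, "value"))
    return labels

def validate(labels):
    """Toy validity check: exactly one command token and at least one name token."""
    kinds = [kind for _, kind in labels]
    return kinds.count("command") == 1 and "name" in kinds

labels = classify("Return the lowest price".split())
print(labels, validate(labels))
```

Only a query that passes validation would move on to the translation step; an invalid one would be sent back to the user for rephrasing.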
Query Translation
• Given a valid parse tree, identify the relations between the name tokens (NTs) and translate them into XQuery syntax
• This is not straightforward
Definitions
• Equivalent NTs: NTs with the same noun phrase and the same modifiers
  • Movie (nodes 4 and 8) in the example
• Sub-parse tree: a subtree that is rooted at an operator token and has at least two children
• Core token: an NT in a sub-parse tree with no descendant NTs, or an NT equivalent to another core token
  • Movie (nodes 4, 8) and book (11)
Definitions
• Directly related NTs: NTs in a parent-child relation
  • Title and movie
• Related by core tokens: NTs related to the same or an equivalent core token
• Related NTs: NTs that satisfy either of the above, or that are related to the same NT
  • Sets {2,4,6,8} and {9,11} in the example
• Each set of related NTs is grouped together in the same MQF
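
Because "related" is transitive in the definitions above, grouping NTs into MQF sets amounts to finding connected components. The sketch below uses a small union-find structure for this. The example parse tree is not reproduced in these slides, so the node labels and the list of pairwise relations are assumptions chosen to be consistent with the node numbers quoted above.

```python
# Assumed labels for the numbered nodes; only movie (4, 8) and book (11)
# are confirmed by the definitions, the rest is a guess for illustration.
NODES = {2: "director", 4: "movie", 6: "title", 8: "movie", 9: "title", 11: "book"}

# Assumed pairwise relations.
PAIRS = [
    (2, 4),   # directly related (parent-child)
    (6, 8),   # directly related (parent-child)
    (4, 8),   # equivalent core tokens
    (9, 11),  # directly related (parent-child)
]

def group_related(nodes, related_pairs):
    """Group nodes into sets of transitively related NTs using union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in related_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=min)

print(group_related(NODES, PAIRS))
```

Under these assumed relations the two resulting groups match the sets {2,4,6,8} and {9,11}, each of which would be wrapped in its own MQF.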
Variables
• Each set of equivalent name tokens is assigned a variable
  • <var> → NT
• A variable can also be made up of a group of variables; these are called “composed variables”
Template Matching
• A variable or a group of variables is matched to a given template
• The template gives the translation for that particular set of variables/phrases in the sentence
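
A rough way to picture variable binding and template matching is a lookup table from token patterns to XQuery fragments. The sketch below is only an illustration: the patterns, the fragment strings, and the function names are hypothetical and are not the mapping rules used by NaLIX. Equivalence of NTs is also simplified to string equality.

```python
# Hypothetical templates: keys are patterns of token kinds,
# values are XQuery fragments with positional placeholders.
TEMPLATES = {
    ("command", "variable"): "return {1}",
    ("variable", "operator", "value"): 'where {0} = "{2}"',
}

def bind_variables(name_tokens):
    """Assign one variable per distinct name token (equal NTs share a variable)."""
    variables = {}
    for nt in name_tokens:
        variables.setdefault(nt, f"$v{len(variables) + 1}")
    return variables

def translate(kinds, phrases):
    """Return the fragment for a matching template, or None if nothing matches."""
    template = TEMPLATES.get(tuple(kinds))
    if template is None:
        return None
    return template.format(*phrases)

variables = bind_variables(["director", "movie", "movie"])
print(translate(["command", "variable"], ["Return", variables["director"]]))
print(translate(["variable", "operator", "value"], [variables["director"], "is", "Ron Howard"]))
```

Both occurrences of `movie` receive the same variable, which mirrors the rule that a set of equivalent name tokens is assigned a single variable.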
Aggregator Nesting
• If the NT attached to an aggregate function is a core token, consider the entire sentence as part of the aggregation
  • “Return the number of movies, where the director of the movie is Ron Howard”
  • “Return the lowest price for each book”
• If the NT attached to an aggregate function is not a core token, the scope of the aggregation is limited to all the directly related NTs of the attached NT
  • “Return each book with lowest price”
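
The two scoping rules above can be written down as a small function. This is a sketch of the rule as stated on the slide, not NaLIX code. The function name and data layout are hypothetical, and it assumes the attached NT itself is always inside its own aggregation scope.

```python
def aggregation_scope(attached_nt, core_tokens, directly_related, all_nts):
    """Return the set of NTs that fall inside the aggregate function's scope.

    attached_nt      -- the NT the aggregate function is attached to
    core_tokens      -- set of core tokens in the query
    directly_related -- dict mapping an NT to the NTs directly related to it
    all_nts          -- every NT in the sentence
    """
    if attached_nt in core_tokens:
        # Core token: the whole sentence takes part in the aggregation.
        return set(all_nts)
    # Otherwise: only the attached NT and its directly related NTs.
    return {attached_nt} | set(directly_related.get(attached_nt, ()))

# "Return the number of movies, where the director of the movie is Ron Howard"
print(aggregation_scope("movie", {"movie"}, {"movie": {"director"}}, {"movie", "director"}))

# Hypothetical query with an extra NT ("author") that stays outside the scope.
print(aggregation_scope("price", {"book"}, {"price": {"book"}}, {"price", "book", "author"}))
```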
Interactive Query Formulation
• Users are asked to rephrase the question if there is no valid parse tree
• Suggestions are given for rephrasing the query
• Given the attribute value tokens, the phrases that epitomise the relation between the attributes can be rephrased
• Ambiguity in the attribute values is resolved using WordNet
Experimental Evaluation
• Participants were asked to search for a given question using keyword search or NaLIX
• Comparison over
  • Ease of use
  • Search quality
• Participants were asked to reformulate the query iteratively until an acceptable threshold of precision and recall was reached
Experimental Evaluation
• Ease of use: time taken to come up with an acceptable NLQ
• Search quality: precision and recall of the resulting XQuery
• Books data from the DBLP database was used for the evaluation
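
For reference, precision and recall of a query result can be computed as below. This is the standard set-based definition, not code from the evaluation. The identifiers and sample data are hypothetical, and returning 0.0 for an empty set is an assumed convention.

```python
def precision_recall(returned, relevant):
    """Precision = hits / returned items; recall = hits / relevant items."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical result: 4 items returned, 5 items actually relevant, 3 in common.
print(precision_recall(["b1", "b2", "b3", "b4"], ["b1", "b2", "b3", "b5", "b6"]))
```

In the iterative setup described above, a participant would keep reformulating the query until both values passed the chosen threshold.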
Results – Ease of Use
• Average time of 90 seconds to form a query
• Fewer than 2 iterations per query on average
• At least one participant got the correct NLQ in the first iteration for each question
Results – Search quality
• Average precision of 83% and recall of 90.1%
• Quality affected by
  • Quality of the NLQ given by the user
  • Parser accuracy
• Average precision of 95.1% and recall of 97.6% for queries that are formulated and parsed correctly
Discussion
• Positive points
• Drawbacks
• Is it useful for your project?
• Are you convinced of its usability over different datasets?
• Any suggestions/ideas on how to make this better?