Academic year: 2021


(1)

Rohit Paravastu

Constructing a Generic Natural Language Interface for an XML Database

(2)

Motivation

• The ability to communicate with a database in natural language is regarded as the ultimate goal for DB query interfaces
• Challenges
  • Automatically understanding natural language
  • Translating the parsed natural language query into a database query

(3)

NaLIX

• Deals with the challenge of translating a natural language query (NLQ) into XQuery
• Issues dealt with
  • Attribute name confusion
  • Query structure confusion
    • Differentiate between “Return the book with the lowest price” and “Return the lowest price of the book”
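The two readings can be made concrete with a small sketch. It is not NaLIX output: the catalogue, its element names (`book`, `title`, `price`) and both function names are assumptions made for illustration, and the queries are evaluated directly in Python rather than translated to XQuery.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue; element names and values are illustrative only.
CATALOG = ET.fromstring("""
<books>
  <book><title>A</title><price>30</price></book>
  <book><title>B</title><price>12</price></book>
  <book><title>C</title><price>25</price></book>
</books>
""")


def lowest_price(root):
    """'Return the lowest price of the book': the answer is a price value."""
    return min(float(p.text) for p in root.iter("price"))


def books_with_lowest_price(root):
    """'Return the book with the lowest price': the answer is the book(s)."""
    cheapest = lowest_price(root)
    return [
        b.findtext("title")
        for b in root.iter("book")
        if float(b.findtext("price")) == cheapest
    ]
```

Both functions use the same aggregate, but one returns the aggregated value and the other returns the elements that attain it, which is the structural difference the translation has to preserve.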

(4)

Background

• Keyword searches
  • Keywords that are expressed together in a query must match objects that are “close together” in the database
• Problem?
  • Too blunt
  • Abstract notion of “close together”

(5)

Schema-Free XQuery

• A function called “meaningful query focus” (MQF) is used to retrieve the relation between two keywords in the search
• Example: “Return the director of Gone with the wind” (“Gone with the wind” is a movie)
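The slide does not say how the relation between two keywords is computed. As a rough illustration only, the sketch below uses a lowest-common-ancestor heuristic: among all `director` elements, it picks the one whose lowest common ancestor with the matched title sits deepest in the tree. The heuristic, the sample document and the name `closest_related` are assumptions, not the meaningful query focus function itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical movie database; structure and values are illustrative only.
DOC = ET.fromstring("""
<movies>
  <movie><title>Gone with the Wind</title><director>Victor Fleming</director></movie>
  <movie><title>Apollo 13</title><director>Ron Howard</director></movie>
</movies>
""")


def ancestors(node, parent):
    """Return the path from node up to the root, node first."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def closest_related(root, anchor_tag, anchor_text, target_tag):
    """Pick the target element sharing the deepest common ancestor with the anchor."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}
    anchor = next(e for e in root.iter(anchor_tag) if e.text == anchor_text)
    anchor_path = ancestors(anchor, parent)

    def lca_depth(target):
        target_ids = {id(t) for t in ancestors(target, parent)}
        lca = next(a for a in anchor_path if id(a) in target_ids)
        return len(ancestors(lca, parent))  # larger means deeper in the tree

    return max(root.iter(target_tag), key=lca_depth).text
```

For the slide's example, `closest_related(DOC, "title", "Gone with the Wind", "director")` selects the director inside the same `movie` element rather than any director in the database.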

(6)

Query Translation

• Relations between the words are to be translated into XQuery
• The NLQ is converted to a parse tree
• Three main steps
  • Classification of terms in the parse tree of the NLQ
  • Validation of the parse tree
  • Translation of the parse tree into XQuery

(7)

Token Classification

• Tokens
  • Words/phrases that match an XQuery construct or an attribute value
• Markers
  • Words that do not occur in the database and are not an XQuery construct
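A minimal sketch of this two-way split, assuming the phrases have already been extracted from the parse tree. The vocabularies passed in (`xquery_words`, `database_terms`) and the function name `classify` are hypothetical; a real system would derive them from the query language and the database contents.

```python
def classify(phrase, xquery_words, database_terms):
    """Label a phrase as a 'token' or a 'marker' following the slide's definitions."""
    p = phrase.lower()
    # Token: matches an XQuery construct or something found in the database.
    if p in xquery_words or p in database_terms:
        return "token"
    # Marker: neither in the database nor an XQuery construct.
    return "marker"


# Hypothetical vocabularies for the query "Return the director of Gone with the Wind".
XQUERY_WORDS = {"return"}
DATABASE_TERMS = {"director", "gone with the wind"}

phrases = ["Return", "the", "director", "of", "Gone with the Wind"]
labels = [classify(p, XQUERY_WORDS, DATABASE_TERMS) for p in phrases]
```

Here "the" and "of" come out as markers, while "Return", "director" and the movie title come out as tokens.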

(8)

Tokens and Markers

(9)

Query Translation

• Given a valid parse tree, identify the relations between the name tokens and translate them into XQuery syntax
• Not so straightforward

(10)

Example

(11)

Definitions

• Equivalent NTs: name tokens (NTs) with the same noun phrase and the same modifiers
  • Movie (nodes 4 and 8) in the example
• Sub-parse tree: a subtree rooted at an operator token that has at least two children
• Core token: an NT in a sub-parse tree with no descendant NTs, or an NT equivalent to another core token
  • Movie (nodes 4, 8) and book (11)

(12)

Definitions

• Directly related NTs: parent-child relation
  • Title and movie
• Related by core tokens: related to the same or an equivalent core token
• Related NTs: either of the above, or related to the same NT
  • Sets {2,4,6,8} and {9,11} in the example
• Each set of related NTs is grouped together in the same MQF
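Because "related" is closed under "related to the same NT", the grouping amounts to finding connected components. The sketch below does this with a small union-find. The pairwise relations in `PAIRS` are hypothetical: the example parse tree is not reproduced in this text, so they were chosen only so that the result matches the sets quoted on the slide.

```python
def group_related(nodes, related_pairs):
    """Group name-token node ids into sets of mutually related NTs."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the representative, halving the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in related_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=min)


# Hypothetical relations: (2,4), (6,8), (9,11) stand for directly related NTs
# and (4,8) for the two equivalent "movie" nodes.
NODES = [2, 4, 6, 8, 9, 11]
PAIRS = [(2, 4), (6, 8), (4, 8), (9, 11)]
```

With these inputs, `group_related(NODES, PAIRS)` yields the two groups `{2, 4, 6, 8}` and `{9, 11}`, each of which would go into its own MQF.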

(13)

Variables

• Each set of equivalent name tokens is assigned a variable
  • <var> NT
• A variable can also be made up of a group of variables; these are called “composed variables”

(14)

Template Matching

• Matching a variable or a group of variables to a given template
• The template gives the translation for that particular set of variables/phrases in the sentence

(15)

Templates

(16)

Aggregator Nesting

• If the NT attached to an aggregate function is a core token, consider the entire sentence as part of the aggregation
  • “Return the number of movies, where the director of the movie is Ron Howard”
  • “Return the lowest price for each book”
• If the NT attached to an aggregate function is not a core token, the scope of the aggregation is limited to all the directly related NTs of the attached NT
  • “Return each book with lowest price”

(17)

Translation Process

Parse → Variable binding → Nesting Scope → Final XQuery

(18)

Example Output

• Query: Return each director, where the number of movies directed by the director is the same as the number of movies directed by Ron Howard
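The generated XQuery for this query is not reproduced in this text. To show what the query is expected to compute, the sketch below evaluates it directly in Python over a made-up document; the data, the element names and the function `directors_matching_count` are all assumptions.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical data: Ron Howard and "Director A" each direct two movies.
MOVIES = ET.fromstring("""
<movies>
  <movie><title>M1</title><director>Ron Howard</director></movie>
  <movie><title>M2</title><director>Ron Howard</director></movie>
  <movie><title>M3</title><director>Director A</director></movie>
  <movie><title>M4</title><director>Director A</director></movie>
  <movie><title>M5</title><director>Director B</director></movie>
</movies>
""")


def directors_matching_count(root, reference="Ron Howard"):
    """Directors whose movie count equals the reference director's movie count."""
    # One aggregate per director (the count), compared against a second
    # aggregate computed for the reference director only.
    counts = Counter(m.findtext("director") for m in root.iter("movie"))
    target = counts[reference]
    return sorted(d for d, c in counts.items() if c == target)
```

Note that the query involves two separate counts with different scopes, and that, read literally, it also returns Ron Howard himself.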

(19)

Interactive Query Formulation

• Users are asked to rephrase the question if there is no valid parse tree
• Suggestions are given for rephrasing the query
• Given the attribute value tokens, the phrases that epitomise the relation between the attributes can be rephrased
• Ambiguity in the attribute values is resolved using WordNet

(20)

Experimental Evaluation

• Participants were asked to search for a given question using keyword search or NaLIX
• Comparison over
  • Ease of use
  • Search quality
• Participants were asked to reformulate the query iteratively until an acceptable threshold of precision and recall was reached

(21)

Experimental Evaluation

• Ease of use: time taken to come up with an acceptable NLQ
• Search quality: precision and recall of the resultant XQuery
• Used books data from the DBLP database for evaluation
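For reference, precision and recall can be computed from the set of results a query returns and the set of results that are actually relevant. This is the standard definition, sketched with hypothetical identifiers; the slides do not describe how the evaluation computed them in practice.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for one query's result set."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    # Precision: fraction of returned results that are relevant.
    precision = hits / len(returned) if returned else 0.0
    # Recall: fraction of relevant results that were returned.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, a query that returns four items, three of which are among five relevant items, has precision 3/4 and recall 3/5.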

(22)

Results – Ease of Use

• Average time of 90 seconds to form a query
• Fewer than 2 iterations per query on average
• At least one participant got the correct NLQ in the first iteration for each question

(23)

Results – Search quality

• Average precision of 83% and recall of 90.1%
• Quality affected by
  • Quality of the NLQ given by the user
  • Parser accuracy
• Average precision of 95.1% and recall of 97.6% for queries that are formulated and parsed correctly

(24)

Results

[Figures: precision of search results; recall of search results]

(25)

Discussion

• Positive points
• Drawbacks
• Is it useful for your project?
• Are you convinced of its usability over different datasets?
• Any suggestions/ideas on how to make this better?
