
1.5 Application Scenarios

1.5.1 Application Scenario 1: General Vs Specific user information need

Sam has a homework assignment on “dinosaurs’ communications”. He does not know much about dinosaurs, and nothing about how they communicate, so he decides to do some research on the web. He can either search for “dinosaurs” first and then for “dinosaurs communications”, or search directly for “dinosaurs communications”.

– In the first scenario, he starts the search session by typing “dinosaurs” into a search engine such as Google, hoping to find a summary that gives him a general idea about dinosaurs. However, because this query is too generic, the search engine returns a large number of documents, causing information overload in the SERP (see Figure 1.2). As a consequence, Sam must read all the snippets in the SERP, and even open many pages, in order to build his summary.

Note that this type of summary is typically the abstract (a short text that describes the entity) of an entity (such as dinosaur¹²) in the web of data.

The next step is to reformulate the query by adding the keyword “communications” (see Figure 1.3), and then to decide which results are the best from his point of view (POV). This decision depends only on the small text associated with each result (the textual snippet), and Sam can only hope that the chosen results contain what he was searching for, i.e. an explanation of dinosaurs’ communications.

Figure 1.2: Google results for “dinosaurs” query

– In the second scenario, Sam starts the search session directly with the query “dinosaurs communications” (see Figure 1.3). The problem in this case is what we call information limitation: the query is specific enough to restrict the retrieved information to the query’s topic. He will therefore get results about how dinosaurs communicate, but no general information about dinosaurs. For each result, he must first find some general information about dinosaurs (which may not be included, as in the first result¹³) and only then read about how they communicate.

1.5.1.1 A solution

As a solution, our proposed system ENsEN (Enhanced Search Engine) finds the most important concepts in a result (with respect to the query) and associates each concept with a textual and factual summary built by combining the result’s content with information from other trusted, external sources.

Figure 1.3: Google results for “dinosaurs communications” query

Returning to our scenario, when Sam types “dinosaurs communications” into this search engine, he gets results about dinosaurs’ communications (see Figure 1.4). In these results, “dinosaurs” and “communications” are considered important concepts. ENsEN therefore includes them in the generated snippets, with the summaries that Sam needs to understand the subject and satisfy his information need. In addition, ENsEN provides a list of the most important concepts related to these two concepts, with a textual context that explains how the concepts are related and where (on the result page). In this SERP, Sam gets the definition of dinosaurs (a) and of communication (b). He also gets a list of the top concepts related to his query (c), such as “Bellows”, “Guttural”, and “Corythosaurus”, to enrich his knowledge of the subject. Moreover, the system shows that the excerpt (d) “The chambered headcrests on some dinosaurs such as Corythosaurus and Parasaurolophus might have been used to amplify grunts or bellows.” is one of the best to contextualize the query, even though the term “communication” does not appear in it.

Finally, knowing the most important concepts (with respect to the query) in each result (e) makes it easier for Sam to identify the most relevant result, saving him considerable time and effort.
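The elements labelled (a) to (e) in this scenario can be pictured as one record per result. The following Python sketch is only an illustration of that structure: the class and field names (`Concept`, `SemanticSnippet`, `excerpt_mentions`) are hypothetical and are not taken from ENsEN's implementation, and the concept summaries are left empty because their text is not reproduced in this section.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Concept:
    """A concept shown in the snippet; `summary` is its textual description."""
    label: str
    summary: Optional[str] = None  # not reproduced in this section, so left empty


@dataclass
class SemanticSnippet:
    """Hypothetical container for what one enriched result displays."""
    query: str
    main_concepts: List[Concept] = field(default_factory=list)  # (a), (b)
    related_concepts: List[str] = field(default_factory=list)   # (c)
    excerpt: str = ""                                           # (d)

    def excerpt_mentions(self, term: str) -> bool:
        """Case-insensitive substring test of the excerpt for `term`."""
        return term.lower() in self.excerpt.lower()


snippet = SemanticSnippet(
    query="dinosaurs communications",
    main_concepts=[Concept("dinosaurs"), Concept("communications")],
    related_concepts=["Bellows", "Guttural", "Corythosaurus"],
    excerpt=(
        "The chambered headcrests on some dinosaurs such as Corythosaurus "
        "and Parasaurolophus might have been used to amplify grunts or bellows."
    ),
)

print(snippet.excerpt_mentions("communication"))  # False
print(snippet.excerpt_mentions("dinosaurs"))      # True
```

The first check confirms the observation made about excerpt (d): a selection based purely on literal query terms would not favour this sentence, since “communication” never occurs in it.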

Figure 1.4: ENsEN results for “dinosaurs communications” query

1.5.2 Application Scenario 2: Answering directly (in SERP) hard questions

Sam’s professor asked him to search the web for an answer to the question “what school produced the most justices for Supreme Court justices?”.

Sam did not even know what “Supreme Court justices” are. He therefore started his search session by typing “school produced the most justices for Supreme Court justices” into Google. Google returned the relevant results in the SERP with some snippets (see Figure

Figure 1.5: Google results for “school produced the most justices for Supreme Court justices” query

Sam read these snippets hoping to find the answer but, unfortunately, it was not there, so he chose the result that seemed most relevant (from his POV) and clicked on it. He then had two costly options: either read the whole page to find the answer, or use the browser’s search-in-the-page functionality with keywords such as “Supreme Court justices”, “most”, and “school”. Sam might still not find the answer on this page. In that case, he must go back to the SERP and visit another page, or reformulate the query to get different results.

Figure 1.6: ENsEN results for “school produced the most justices for Supreme Court justices” query

1.5.2.1 A solution

Our system (ENsEN) returns the SERP presented in Figure 1.6, in which Sam can find the most important concepts in each result with respect to the query; each concept is associated with a textual context that is also relevant to the user’s query. In this SERP, ENsEN shows that “Supreme Court of the United States” is the most important concept, giving Sam the definition of this term and its most related concepts (Figure

Figure 1.7: Most related concepts of “Supreme Court of the United States”

ENsEN also shows that the next two most important concepts are “United States” and “Harvard Law School”, so Harvard may be the answer to Sam’s question.

Indeed, in the context of the concept “Harvard Law School” (Figure 1.8), Sam finds the answer he needs:

“The ones that produced the most justices are Harvard(15), Yale(6), and Columbia(2).” ENsEN shows not only the answer but also the result in which it appears, i.e. “The Most Popular Law Schools of Supreme Court Justices — TIME”, with the possibility of going directly to the corresponding paragraph in the result.
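To see why searching within a page using Sam's keywords can miss this sentence, the sketch below checks the three keywords from the scenario against the quoted text and then reads the counts off it. It is a minimal illustration in Python; the variable names and the regular expression are assumptions made for this example and are not part of ENsEN.

```python
import re

# Sentence quoted from the ENsEN result context.
answer = ("The ones that produced the most justices are "
          "Harvard(15), Yale(6), and Columbia(2).")

# Keywords Sam would type into the browser's in-page search.
keywords = ["Supreme Court justices", "most", "school"]

# Which keywords actually occur in the answer sentence (case-insensitive)?
hits = {kw: kw.lower() in answer.lower() for kw in keywords}
print(hits)

# Read the "Name(count)" pairs off the sentence and pick the largest count.
counts = {name: int(n) for name, n in re.findall(r"(\w+)\((\d+)\)", answer)}
top_school = max(counts, key=counts.get)
print(top_school, counts[top_school])
```

Only “most” matches, so an in-page search for “school” or “Supreme Court justices” would skip the very sentence that contains the answer, while the counts in that sentence point to Harvard.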

Figure 1.8: Entity description of “Harvard Law School”

In addition, ENsEN’s SERP gives Sam enough elements to explore the subject further, such as Harvard’s most related concepts (Figure 1.9), how they are related, and the context in which the query concepts were mentioned.

Figure 1.9: Most related concepts of “Harvard Law School”

To summarize, ENsEN gives Sam the answer to his question directly in the SERP, so he does not need to visit the results or search within them. It also gives him the means to explore his information need further.
