CHAPTER 1 A Survey on Question Answering System
9. Query Refinement
In our QA system, after the users receive and read through a list of ranked candidate answers, they are asked to judge the quality of those answers with the question “Did those documents answer your question?”. If the answer is yes, the users are satisfied with the search results and have found useful answers in an acceptable time. If the answer is no, either our QA system has not retrieved the expected answer, or the users did not find it among the candidate answers they were able to go through. The users are then offered an opportunity to refine the query by giving our system more explicit and useful search clues, which change the members and the ranks of the next search result. The next set of retrieved candidate answers is expected to contain more correct answers at higher ranks.
There are three steps in the query refinement for the users to follow. The first step asks the users to re-edit the text of their questions: they can add necessary terms and eliminate useless ones. Unlike the advanced search pages of popular web search engines, our QA system shows the previous search results on the same page as the refinement form, so that the users can consult them while changing their queries. We believe this helps our users understand the influence of each term they typed last time and write better questions this time. For example, suppose a user who wants the population of London, Ontario first asks “what is the population of London”. Most of the candidate answers returned for this first question give the population of London in the United Kingdom. Using this search result as a reference, the user learns to restrict the city to the province of Ontario and will probably get the expected information next time.
However, this step is not mandatory; the users can skip it and make changes in the next two steps instead. If the query has not been revised at all, our QA system will not send it to the Google search engine again, because the new Google search result would be the same and so would the newly prepared document collection. Instead, the previous document collection and its index are kept in our system, which avoids downloading the same web pages again and shortens the answer retrieval time. If the users did change their queries, the new queries are sent to the Google search engine to acquire new related web pages as the new knowledge source, and both the previous document collection and the index are replaced by the new ones. Finally, the new candidate answers are retrieved from the new document collection based on the new question. Query re-editing therefore affects the entire search result, including both the members of the candidate answers and their ranks.
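The reuse rule for unchanged queries can be sketched as follows. This is only an illustration under stated assumptions, not the system's actual code: `CollectionCache`, `documents_for`, and the `fetch` callback (standing in for the Google search and page download) are hypothetical names, and treating questions that differ only in letter case or spacing as unrevised is an assumption. The index, which would be cached alongside the collection, is omitted for brevity.

```python
from typing import Callable, List, Optional


def normalize(question: str) -> str:
    """Collapse case and whitespace (assumption: such edits are not a revision)."""
    return " ".join(question.lower().split())


class CollectionCache:
    """Keeps the last document collection and reuses it for an unchanged query."""

    def __init__(self, fetch: Callable[[str], List[str]]) -> None:
        # `fetch` is a hypothetical stand-in for web search plus page download.
        self._fetch = fetch
        self._last_query: Optional[str] = None
        self._documents: List[str] = []
        self.fetch_count = 0

    def documents_for(self, question: str) -> List[str]:
        key = normalize(question)
        if key != self._last_query:
            # The question was revised: replace the old collection with a new one.
            self._documents = self._fetch(question)
            self._last_query = key
            self.fetch_count += 1
        # Otherwise the previous collection is returned without any download.
        return self._documents
```

With such a cache, repeating the question “what is the population of London” would trigger a single fetch, while adding “Ontario” to it would trigger a second one and discard the first collection.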
The second step in the query refinement lets the users evaluate each term in their questions and assign it a numeric value from 1 to 10. The users can give different values to different terms according to how important they consider each term to be in their queries. These values are then used as the term weights in the next candidate answer ranking process. As mentioned before, our QA system initializes the weight of each question term to 1 so that all terms are treated equally in the first candidate answer ranking with the MCCSM. In this step, the users can emphasize important terms by assigning them larger weights, and our system will use the new weights to rank the documents that are more related to the emphasized terms higher. For example, the question “who is the first President of America to visit Europe” also retrieves some documents about presidents in Europe. If the user who asked this question assigns a larger weight to the term “America” than to “Europe”, the documents about the President of America will be ranked higher than the documents about presidents in Europe, and the user can find useful documents in less time. Thus, question term weighting does not change the members of the candidate answers, only their ranks.
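The effect of user-assigned weights on the ranking can be illustrated with a small sketch. The scoring function below is a deliberately simple stand-in (the sum of the weights of the question terms that occur in a document) and is not the MCCSM used by the system; the function names `tokenize`, `score`, and `rank` are hypothetical.

```python
import re
from typing import Dict, List, Optional


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(document: str, weights: Dict[str, int]) -> int:
    """Stand-in score (not MCCSM): sum of weights of question terms in the document."""
    present = set(tokenize(document))
    return sum(w for term, w in weights.items() if term in present)


def rank(documents: List[str], question: str,
         user_weights: Optional[Dict[str, int]] = None) -> List[str]:
    """Order documents by score; every question term starts with weight 1."""
    weights = {term: 1 for term in tokenize(question)}
    for term, value in (user_weights or {}).items():
        if not 1 <= value <= 10:
            raise ValueError(f"weight for {term!r} must be between 1 and 10")
        key = term.lower()
        if key in weights:  # weights for terms outside the question are ignored
            weights[key] = value
    # sorted() is stable, so documents with equal scores keep their input order.
    return sorted(documents, key=lambda d: score(d, weights), reverse=True)
```

Because `rank` only reorders its input, the set of candidate answers is the same before and after reweighting; only the positions change, which matches the behaviour described above.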
The last step of the query refinement binds a Boolean query to the term-based query. A disadvantage of the traditional term-based document search strategy used by our QA system is that it offers nothing equivalent to the Boolean operator “NOT”. It only searches for the documents that the users want to read; it does not filter out the documents that the users specifically do not want to see. We therefore use the Boolean operator “NOT” to let the users indicate one or more terms that should not appear in the retrieval result. In other words, if a document contains any of the terms listed in the Boolean query, it is not retrieved even if it also shares terms with the question. In this step, the users build their own Boolean queries from the unwanted terms. For example, suppose a user asks the question “how to cook some sweet corns” in our QA system. Our system sends this question to the Google search engine to search for relevant web pages. During the Google search, some web pages containing recipes for salty corn are considered related to the query and appear in the Google search result. Our QA system then downloads those web pages and saves their text content into our document collection. Because some documents from the salty corn web pages share the terms “cook” and “corns” with the question, they will be retrieved by the system and presented to the user. However, they are clearly not the recipes the user is looking for, and their presence among the candidate answers may hinder the user’s search. The user can therefore use the Boolean “NOT” query to keep those documents out of the candidate answers, simply by adding the term “salty” to it. In the next answer retrieval process, any document that contains the term “salty” will be neither retrieved by our QA system nor shown to the user. Thus, like question re-editing but unlike question term weighting, the Boolean query changes the members of the candidate answers.
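A minimal sketch of the “NOT” filter is given below. It assumes whole-word, case-insensitive matching, so a variant such as “salted” would not be caught by the term “salty”; how the real system matches terms is not specified here, and the names `tokenize` and `exclude_terms` are hypothetical.

```python
import re
from typing import Iterable, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def exclude_terms(documents: List[str], unwanted: Iterable[str]) -> List[str]:
    """Drop every document that contains at least one unwanted term."""
    banned = {term.lower() for term in unwanted}
    # A document survives only if none of its terms is in the banned set.
    return [doc for doc in documents if banned.isdisjoint(tokenize(doc))]
```

Applied to the example, passing the single term “salty” would remove the salty corn documents from the collection before ranking, while an empty term list would leave the collection unchanged.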
As with the first step, the second and third steps are not mandatory. The three steps are independent of one another and affect the next search result separately. The query refinement is repeated until the users find their expected answers.