
User-guided Disambiguation for Semantic-enriched Search


Academic year: 2021


http://researchcommons.waikato.ac.nz/

Research Commons at the University of Waikato

Copyright Statement:

The digital copy of this thesis is protected by the Copyright Act 1994 (New Zealand). The thesis may be consulted by you, provided you comply with the provisions of the Act and the following conditions of use:

• Any use you make of these documents or images must be for research or private study purposes only, and you may not make them available to any other person.

• Authors control the copyright of their thesis. You will recognise the author’s right to be identified as the author of the thesis, and due acknowledgement will be made to the author where appropriate.

• You will obtain the author’s permission before publishing any material from the thesis.


This thesis is submitted in partial fulfillment of the requirements for the Degree of Master of Science (Research) at the University of Waikato.

July 2015

© 2015 Xinning Ren

User-guided Disambiguation for Semantic-enriched Search


Abstract

In this thesis, we develop, implement and evaluate a strategy for user-guided disambiguation for semantic-enriched search. We use the prototype system Capisco that combines some features of traditional text-based search engines and semantic search engines. Unlike many semantic search approaches that require formulating queries in quite complex semantic languages, the Capisco system is a keyword-based search engine.

In this work, we focus on the user-guided disambiguation which helps to capture the users’ information needs. The user-guided disambiguation developed in this thesis explores the semantic meanings for a user-provided keyword and displays these to the users for selection. We explored three different orderings in which the possible keyword meanings may be shown. Using our newly-developed disambiguation interface, the user selects one or several of these meanings from the list offered. The selected semantic meanings are then transferred to the search engine to identify matching documents.

This thesis provides an analysis of related work on search interfaces in semantic search, and in methods for semantic disambiguation. We developed the concept of user-guided disambiguation and implemented this concept in a prototype that explored three interface variations. We executed a user study on the three interface alternatives. Finally, we discuss the insights gained during this project, compare our approach to existing literature, and explore possible future work.


Contents

Abstract
1. Introduction
1.1 Focus and Aim of this thesis: user-guided disambiguation
1.2 Methodology and outline of the thesis
2. Background
2.1 Traditional keyword-based search
2.2 Semantic web / Semantic Search
2.2.1 Semantic Web / Document structure
2.2.2 Semantic Search
2.3 Summary
3. Related work on semantic search interfaces
3.1 Broccoli: A User Interface for Semantic Full Text Search
3.2 A structured semantic query interface
3.3 Microsearch: An Interface for Semantic Search
3.4 User interaction with novel web search interfaces
3.5 SemSearch: A Search Engine for the Semantic Web
3.6 A Goal-Oriented Web Browser
3.7 Summary
4. Semantic-Enhanced Search & user-driven Disambiguation
4.1 Concept of semantic-enhanced search
4.2 Capisco Prototype System
4.2.1 Concept-in-Context network (CiC)
4.2.2 Structure of Capisco system
4.2.3 System access and functionality
4.2.4 Search Processing
4.3 Requirements on user-guided disambiguation
4.4.1 Concept based query expansion using WordNet
4.4.2 SemSearch
4.4.3 Librarian Agent Query Refinement
4.4.4 Summary of related work on disambiguation
4.5 Concept of user-driven Disambiguation
4.6 Summary
5 User-guided disambiguation: interfaces
5.1 Interface for user-guided disambiguation
5.2 Three disambiguation interfaces
5.3 A disambiguation example
5.4 Summary
6 Evaluation
6.1 Methodology
6.2 Study Setup
6.3 Results
6.3.1 Group 1 (very easy – about 1 to 7 options)
6.3.2 Group 2 (easy – about 8 to 21 options)
6.3.3 Group 3 (medium – about 22 to 42 options)
6.3.4 Group 4 (hard – more than 42 options)
6.3.5 Overall of four groups feedbacks
6.4 Conclusion
7 Discussion
7.1 Comparison between user interface and related works
7.1.1 Query/Keyword built
7.1.2 User refinement of the query/keyword
7.1.3 Search results
7.2 User-guided disambiguation in comparison to related approaches
7.2.1 Automatic disambiguation
7.2.2 User disambiguation
8 Summary and Future work
8.1 Summary
8.2 Future work
References
Appendix A: Material for User Study
A.1 Ethics Approval Letter
A.2 Participant Information Sheet
A.3 Research Consent Form


1. Introduction

Web search engines play an important role in providing users with useful information selected from the wealth of information available on the internet. Traditional search engines use keyword-based search that finds the documents that contain some or all of the user’s input keywords (Bast, Bäurle & Buchhold, 2014). The keywords are matched to the documents literally, without any understanding of their semantic meanings. For example, when a user enters the keyword “Kiwi”, the documents that contain the term “Kiwi” are displayed to the user. But the search engine does not know whether “Kiwi” refers to New Zealand people, kiwifruit or kiwi birds.

Semantic search engines aim to improve the accuracy of searching by analyzing the contextual meaning of the query and understanding the users’ potential intent. However, many semantic search engines require their users to formulate queries in a semantic query language such as SPARQL, which is very complicated to understand and manipulate, especially for users who are not technically proficient, such as humanities scholars.

This thesis works with the semantic-enriched search engine Capisco, which is currently under development at the University of Waikato (Hinze, Taube-Schock, Bainbridge, Matamua & Downie, 2015). Capisco aims to provide a semantic-enhanced search that uses keyword-based queries. Using Capisco, users do not need to have expertise in semantic query languages or syntax. Instead, users input keywords, which are then assigned their respective semantic concepts using the user-guided disambiguation developed in this thesis.

1.1 Focus and Aim of this thesis: user-guided disambiguation

Given the keyword-based queries used in Capisco, the intent and semantics of the user query are not as strongly defined as in semantic search engines. In semantic search, the input query terms need to follow strongly defined semantics based on RDF-S and OWL ontologies. In order to identify the semantics of search terms used for Capisco, this thesis explores user-guided disambiguation. This requires the user to identify their intended meaning for a given keyword. In this thesis, three interface options are designed for Capisco.

This thesis introduces the concept of user-guided disambiguation, explains how this concept applies for the Capisco prototype and evaluates three disambiguation interfaces designed for the Capisco system. The three disambiguation interfaces are explored in a user study to identify which option best supports user-guided disambiguation.


1.2 Methodology and outline of the thesis

The purpose of this thesis is to examine users’ information seeking behavior in the semantic-enriched search engine Capisco when using interfaces for user-guided disambiguation.

We first introduce both keyword-based and semantic search concepts. We study related work from these areas to identify advantages and disadvantages of these search interfaces. The thesis then introduces our semantic-enhanced search engine Capisco and the concepts of user-driven disambiguation. We introduce three ways of presenting the user with information for the manual disambiguation. We investigate how the three different disambiguation interfaces impact on users’ search strategies. In a user study, 24 participants were invited to use the interfaces for disambiguation. Finally, we evaluate and compare the feedback for all three interfaces from the participants. The thesis concludes with a summary of the contributions and insights and a discussion of possible future work.

The outline of this thesis is as follows:

• Chapter 2 describes two search approaches, traditional keyword-based search and semantic search, and briefly introduces the semantic web.

• Chapter 3 gives an overview of related work on interfaces for both keyword-based and semantic search. We provide a brief analysis and comparison of them.

• Chapter 4 introduces the Capisco system and explains its general architecture and concept.

• Chapter 5 introduces the concept of user-guided disambiguation, proposes three disambiguation interfaces and demonstrates how to use these interfaces. We also discuss the differences between our user interface and the related works.

• Chapter 6 describes the user study, evaluates the data obtained during the study and the participant feedback from the usage experiments, and compares the three disambiguation interfaces.

• Chapter 7 discusses the wider impact of the outcome of the user study in the context of selected previous related work.


2. Background

Search engines such as Google,2 Yahoo,3 and Bing4 are very popular and help users find useful documents in the mass of information available online (Bast, Bäurle & Buchhold, 2012). However, traditional keyword-based search has shortcomings in fulfilling users’ search requirements. Semantic web search and semantic-enriched search address some of these shortcomings. In this section, we introduce traditional keyword search (Section 2.1) and semantic search (Section 2.2).

A scenario is used to help compare the different approaches. In our scenario, a user wants to search for a person who is both a painter and a mathematician.

2.1 Traditional keyword-based search

In traditional search, such as full-text index search, search engines mostly rely on the occurrence of keywords in the text (Witten, Gori & Numerico, 2007). Before a full-text index search can be carried out, the full-text index needs to be built.

The first step lists all words that occur in the text, along with a pointer to each appearance. For example, for the text “No pain, no gain” (Figure 1(a)), the resulting index is shown in Figure 1(b). The pointer to each word is its position number in the text: the numbers 1 and 3 beside “No” indicate that the word “No” appears as the first and third word. Stop words, such as “the”, are typically excluded from an index. Each word has as many pointers as the number of times it appears in the text; in other words, the total number of pointers equals the number of indexed words in the text. In our example, there are 4 words in the text and therefore the index has 4 pointers. Figure 2 shows a simplified step-by-step process of how a keyword-based index is built.
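The index-building step for the “No pain, no gain” example can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation of any particular search engine; the function name `build_index` and the lower-casing of words are assumptions made for the sketch.

```python
import re
from collections import defaultdict

def build_index(text, stop_words=frozenset()):
    """Map each word to the list of 1-based positions at which it occurs."""
    index = defaultdict(list)
    # Lower-case the text so that "No" and "no" share one index entry.
    words = re.findall(r"\w+", text.lower())
    for position, word in enumerate(words, start=1):
        if word not in stop_words:
            index[word].append(position)
    return dict(index)

index = build_index("No pain, no gain")
print(index)  # {'no': [1, 3], 'pain': [2], 'gain': [4]}
```

The four pointers (1, 3, 2 and 4) correspond to the four words of the text.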

Figure 1(a): the text to be indexed

2 www.google.com 3 www.yahoo.com 4 www.bing.com

Figure 1(b): the index (simplified)

Figure 1: A simple example of a keyword index.

Figure 2: The process by which a full-text index is built.

Given the index, users are able to search. For a one-word query, the keyword is located in the ordered list of terms and the keyword’s list of document and word numbers is extracted. For a query that contains several terms, all the terms are located first and their lists are extracted as in the one-keyword search. Then all the lists are scanned to find the document numbers they have in common (the documents that contain all or some of the terms). Finally, the lists are merged and the resulting documents are displayed.
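A minimal sketch of this lookup-and-merge step is given below. The three example documents and the names `build_document_index` and `search` are hypothetical; the `require_all` flag switches between documents containing all terms and documents containing some of them.

```python
from collections import defaultdict

def build_document_index(documents):
    """Map each term to {document number: [word positions]}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, word in enumerate(text.lower().split(), start=1):
            index[word].setdefault(doc_id, []).append(position)
    return index

def search(index, query, require_all=True):
    """Return the sorted document numbers that match the query terms."""
    # One set of document numbers per query term.
    postings = [set(index.get(term, {})) for term in query.lower().split()]
    if not postings:
        return []
    merged = set.intersection(*postings) if require_all else set.union(*postings)
    return sorted(merged)

documents = {
    1: "the painter studied geometry",
    2: "a mathematician and painter",
    3: "the mathematician proved a theorem",
}
index = build_document_index(documents)
print(search(index, "painter mathematician"))                     # [2]
print(search(index, "painter mathematician", require_all=False))  # [1, 2, 3]
```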

The diagram in Figure 3 gives a brief overview of the index search process.

Figure 3: The process by which keyword-based search works.

Documents that contain more of the query terms, in which the query terms occur more frequently, or which contain fewer non-query terms are considered more relevant. It is therefore easy to search for documents containing specific words by using a given index.
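The three relevance criteria can be illustrated with a small ranking sketch. The order in which the criteria are applied (distinct query terms first, then frequency, then number of non-query words) is an assumption of this sketch, as are the example documents; real search engines use more elaborate weighting schemes.

```python
def rank(documents, query):
    """Order matching document ids by the three criteria described above."""
    terms = set(query.lower().split())
    scored = []
    for doc_id, text in documents.items():
        words = text.lower().split()
        distinct = len(terms & set(words))          # how many query terms appear
        frequency = sum(w in terms for w in words)  # total occurrences of query terms
        others = len(words) - frequency             # words that are not query terms
        if distinct:
            # Negated values sort "more is better" criteria first.
            scored.append((-distinct, -frequency, others, doc_id))
    return [doc_id for *_, doc_id in sorted(scored)]

documents = {
    1: "painter painter gallery",
    2: "painter and mathematician",
    3: "mathematician",
    4: "history of music",
}
print(rank(documents, "painter mathematician"))  # [2, 1, 3]
```

Document 4 contains no query term and is therefore not returned at all.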

However, the limitations of this kind of search are apparent: it returns results that literally contain some or all of the keywords (i.e., matching is syntactic, not semantic). As a consequence, some user questions are difficult to answer. For example, if the user from our scenario wants to find people who are both a painter and a mathematician, she may enter the keywords “Painter” and “Mathematician” for a search (Figure 4). The results of traditional search would then be documents that contain the words “Painter” or “Mathematician” (Figure 5), rather than documents about specific persons who are both a painter and a mathematician. Figure 6 shows one document from the search results, which is about the relation between mathematics and art.

Figure 5: The search results of query “painter mathematician” using Google.

Figure 6: The document to which the first search result from query “painter” and “mathematician” refers (shown in Figure 4).

A further problem is the disambiguation between the different meanings of search terms. For example, it may not be clear whether the term “painter” refers to an artist or to a builder or house painter. Clarification of the semantics of search terms will be discussed later in Chapter 4. Examples such as the one given above and the described limitations motivated the development of semantic search.

2.2 Semantic web / Semantic Search

Before explaining semantic search, we give a brief introduction into the semantic web and related technologies.


2.2.1 Semantic Web / Document structure

The Semantic Web is an extension of the World Wide Web (WWW) that allows users to share and combine data across the boundaries of websites and applications. Using semantic web technologies, users can create data and share it on the Web, build vocabularies, and write rules for handling data (W3C, 2013). The Semantic Web makes it possible to specify the type of data that is stored, for example by naming places and people. These data descriptions are implemented using technologies such as RDF, SPARQL and OWL. Instead of mere text-based documents, the semantic web provides semi-structured information, as explained below.

RDF (Resource Description Framework)

RDF is a standard model for the semantic web that is used as a scheme for describing concepts or modeling information. It extends the linking structure of the web by using URIs to name the relationship between things as well as the two ends of the link; such a statement is usually referred to as a “triple” (W3C semantic web, 2014). The W3C standard RDF serialization formats include N-Triples, RDF/XML, Turtle, etc. N-Triples and RDF/XML are explained in more detail in the following. Figure 7 shows an RDF structure representing information about Da Vinci, who is a painter and mathematician.

Figure 7: Displaying the data stored in RDF for example “Da Vinci”.

N-Triples is a simple, line-based format for serializing RDF. A line in N-Triples is a sequence of RDF terms which represents the subject, predicate and object of an RDF triple (see Figure 8(a)). Figure 8(b) is the N-Triples format example for “Da Vinci” as a graph structure.
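To make the line-based structure concrete, the following sketch writes a few triples about Da Vinci as N-Triples lines. It is an illustration only: the namespace `http://example.org/` and the property names are hypothetical and are not taken from the figures, and the escaping covers only backslashes and double quotes in plain literals.

```python
def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one triple per line."""
    def term(value):
        # URIs are written in angle brackets, plain literals in double quotes.
        if value.startswith("http://") or value.startswith("https://"):
            return f"<{value}>"
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)

EX = "http://example.org/"  # hypothetical namespace for this sketch
triples = [
    (EX + "DaVinci", EX + "name", "Leonardo da Vinci"),
    (EX + "DaVinci", EX + "occupation", EX + "Painter"),
    (EX + "DaVinci", EX + "occupation", EX + "Mathematician"),
]
print(to_ntriples(triples))
# <http://example.org/DaVinci> <http://example.org/name> "Leonardo da Vinci" .
# <http://example.org/DaVinci> <http://example.org/occupation> <http://example.org/Painter> .
# <http://example.org/DaVinci> <http://example.org/occupation> <http://example.org/Mathematician> .
```

Each output line holds exactly one subject, predicate and object, terminated by a full stop.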

Figure 8(a): Structure of N-Triples.

Figure 8(b): Displaying the data stored in RDF (N-Triples) for example “Da Vinci”.

RDF/XML is an XML-based syntax for RDF. An RDF graph contains nodes and arcs, which are represented as a set of RDF triples (subject node, predicate, and object node). The nodes and predicate arcs are represented in XML terms: element names, attribute names, element contents and attribute values (W3C Recommendation, 2006). RDF/XML is a very verbose syntax that is difficult to read. Figure 9 shows the same structure in RDF/XML format for the example “Da Vinci”.

OWL (Web Ontology Language)

An ontology formally represents knowledge as a hierarchy of concepts within a domain. Ontologies use shared vocabularies to denote the types, properties and interrelationships of those concepts (Thomas, 1993). An ontology is a structural framework for organizing information and is used in the semantic web.

OWL is a semantic web language and a family of knowledge representation languages. It is a computational logic-based language, so that knowledge expressed in OWL can be exploited by computer programs (W3C Semantic Web, 2014). Like RDF, it can be serialized in RDF/XML syntax and expressed in triples. Compared with RDF, it allows users to specify more about properties and classes. For example, it can indicate that "If A isMarriedTo B" then this implies "B isMarriedTo A".
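The isMarriedTo example corresponds to declaring a property as symmetric. The sketch below mimics the resulting inference over plain Python tuples; it is not an OWL reasoner, and the second fact ("livesIn", "Hamilton") is a made-up non-symmetric statement added only for contrast.

```python
def apply_symmetry(triples, symmetric_properties):
    """Add (o, p, s) for every (s, p, o) whose property is declared symmetric."""
    inferred = set(triples)
    for s, p, o in triples:
        if p in symmetric_properties:
            inferred.add((o, p, s))
    return inferred

facts = {("A", "isMarriedTo", "B"), ("A", "livesIn", "Hamilton")}
closed = apply_symmetry(facts, {"isMarriedTo"})
print(("B", "isMarriedTo", "A") in closed)     # True
print(("Hamilton", "livesIn", "A") in closed)  # False
```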

The OWL language family supports several syntaxes. The following examples use the OWL 2 Functional Syntax and the Manchester Syntax to represent a class musician, which is equivalent to a person who composes music or performs music. Figure 10 shows the OWL 2 Functional Syntax for the example.

Figure 10: An example of the OWL Functional Syntax used to represent a class description.

Manchester Syntax was created to provide non-logicians with a syntax that makes it easier to write ontologies (Horridge et al., 2006). It is a compact syntax whose style is close to that of frame languages. It can be used as an entailment-based querying language and helps users to construct and submit queries to ontology documents (Koutsomitropoulos et al., 2011). An example using Manchester Syntax is shown in Figure 11.

Figure 11: Manchester OWL Syntax used to represent a class description.

Class: People


2.2.2 Semantic Search

Semantic search is an application of the semantic web to searching (Guha, McCool & Miller, 2003). It aims at improving on the results of traditional search by analyzing the contextual meaning of keywords and understanding the user’s information need. Semantic search uses a semantic query language (e.g., SPARQL) to find target documents that have been semantically encoded using specific technologies (e.g., RDF). Because input query terms follow a strongly defined semantic query language, the matching results are identified according to the semantic web data.

As the search takes the contextual meaning of keywords into consideration, the efficiency and accuracy of search results can be substantially improved. Semantic search may also take location, intent, variation, synonyms, generalized and specialized queries, concept matching and natural language queries into consideration when searching for results (John, 2012). Currently, search engines such as Google and Bing have incorporated some of these elements into their search algorithms (Cambridge Semantics, 2014).

SPARQL

SPARQL is a query language for RDF documents. It works on RDF documents that have been encoded in a database. It can retrieve and manipulate data stored in RDF format, so that the target documents can be found. For example, the SPARQL query shown in Figure 12(b) returns the name of a person who has painting and mathematics as skills (shown in Figure 12(c)) from the given information (shown in Figure 12(a)).
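The idea behind such a query, matching a set of triple patterns with shared variables against the stored triples, can be sketched without any RDF library. The data, the property names and the pattern syntax below are assumptions made for this illustration and do not reproduce the content of Figure 12; a real system would use a SPARQL engine.

```python
def match(patterns, triples, binding=None):
    """Yield variable bindings that satisfy every triple pattern."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for pattern_term, value in zip(first, triple):
            if pattern_term.startswith("?"):
                # A variable must keep the same value wherever it appears.
                if candidate.setdefault(pattern_term, value) != value:
                    break
            elif pattern_term != value:
                break
        else:
            yield from match(rest, triples, candidate)

triples = [
    ("person1", "name", "Da Vinci"),
    ("person1", "skill", "painting"),
    ("person1", "skill", "mathematics"),
    ("person2", "name", "Example Person"),
    ("person2", "skill", "painting"),
]
# Roughly: SELECT ?name WHERE { ?p name ?name . ?p skill "painting" . ?p skill "mathematics" }
patterns = [
    ("?p", "name", "?name"),
    ("?p", "skill", "painting"),
    ("?p", "skill", "mathematics"),
]
names = [b["?name"] for b in match(patterns, triples)]
print(names)  # ['Da Vinci']
```

Only the person who has both skills is returned, which is the result that plain keyword matching in Section 2.1 could not guarantee.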

Figure 12(a): The given information for the example.

Figure 12(c): The result for the example.

We have discussed the difference between the traditional search and semantic search in Sections 2.1 and 2.2. However, the syntax and query language for semantic search are very complicated, especially for those users who do not have background knowledge of semantic web.

2.3 Summary

In this section, we discussed the differences between traditional keyword-based search and semantic search. Traditional search has severe limitations, which may be addressed through semantic search. However, semantic search is too complicated to understand and manipulate for users who do not have an ontology or knowledge base background.


3. Related work on semantic search interfaces

Much research has been done on user interfaces for web search, and a number of related projects have explored semantic web search interfaces. However, most of these semantic search systems are more or less complicated, especially in terms of creating queries. Some of them even require professional semantic knowledge or an ontology background.

This chapter illustrates a number of approaches for semantic search interfaces. The following aspects are explored to compare these systems and interfaces:

• Query interface

• Query interpretation

• Result interface

3.1 Broccoli: A User Interface for Semantic Full Text Search

This project presents a user interface for semantic full-text search that addresses the problem of user support for complex semantic queries (Bast, Bäurle & Buchhold, 2012). The interface allows users to find information by using semantic queries incrementally even if they do not have any knowledge about the underlying ontology. The four main elements of this user interface (Figure 13) are the (1) input field, (2) proposal boxes, (3) query panel and (4) hits area.

Figure 13: The interface of full-text semantic search for an example query (Bast et al., 2012).

Query Interface

The user types what they want to search for in the single input field (element 1 in Figure 13, and larger in Figure 14, top). The possible selection options for words, classes, instances (entities) and relations are shown in the corresponding proposal box (element 2 in Figure 13, and larger in Figure 14, bottom).

Figure 14: As soon as “anim” is entered, the interface updates by itself and displays appropriate proposals (Bast et al., 2012).

Query interpretation

For example, as the user types “anim”, the proposal boxes automatically update their content according to the entered prefix (Figure 14). By clicking the proposal “Animal” in the class box, the user adds it as the root node of the query (only class and instance nodes can be defined as root nodes). Similarly, “occurs-with” from the relation box and the word “plant” are added to the query.

In this way, a tree-shaped search query is built incrementally in the query panel (Figure 15). Each node of the tree has the same color as the related proposal box, and nodes can easily be added, deleted or replaced.

Figure 15: The query panel shows a possible formulation of the example query “Animals with


Result Interface

The hits for the search query are displayed automatically in the hits area (element 4 in Figure 13, and larger in Figure 16). Hits refer to the root node of the query tree and are grouped by entity. For example, Figure 16 shows the hits that are grouped under the entity “Cat”. Every hit has an excerpt as evidence, in which all words that matched the query nodes are highlighted, see Figure 16 (Bast et al., 2012).

Figure 16: One hit group for example query “Animals with tails and are breed of mammal” (Bast et al., 2012).

Analysis

Even though Broccoli supports users in incrementally creating a query, in the end the query that is sent to the system assumes the existence of an ontology. Furthermore, the interface is still quite complex. Non-expert users are likely to find it complicated to formulate a query with this interface.

3.2 A structured semantic query interface

This project shows a structured semantic query interface to help users to build and submit entailment-based queries to web ontology documents (Koutsomitropoulos, Domenech & Solomou, 2011).

Query Interface

accurate queries. It does this by separating their intended query into several atoms (Figure 17), which are combined and transformed into expressions based on Manchester Syntax.

Figure 17: The three main atoms for building a Manchester Syntax based query expression (Koutsomitropoulos et al., 2011).

When the semantic search interface is loaded, there are three separate tabs: Search (default), Advanced topics (inactive) and Options (Figure 18). The elements that are required to build a query in Manchester Syntax are all in the Search tab. The Advanced topics tab is not currently available. Users can change the ontology and reasoners in the Options tab (Figure 19).

Figure 19: Options tab in semantic search interface (Koutsomitropoulos et al., 2011).

1. Search for relates to the outermost (leftmost) atom of a Manchester Syntax expression (Figure 17). It provides two groups of suggested values: Types (for classes) and Relations (for properties) (element 1 in Figure 18, larger in Figure 20). Additionally, the fields Restriction and Expression are inactive unless a property is selected.

Figure 20: “Search for” can be either a property name or class name (Koutsomitropoulos et al., 2011).

2. Restrictions correspond to the middle atom of the expression (element 2 in Figure 18, larger in Figure 21). It contains respective Manchester Syntax keywords.

Figure 21: Options for restriction (Koutsomitropoulos et al., 2011).


a free-text field (element 3 in Figure 18, larger in Figure 22 top).

Figure 22: An auto complete facility is provided for class names (Koutsomitropoulos et al., 2011).

4. Condition lets users express the type of logical connection: “and” or “or” (Element 4 in Figure 18).

5. Generated Query is the field that displays the query expression incrementally. Users can also type directly in this input box (Element 5 in Figure 18, larger in Figure 22 bottom).

6. Add Term is the button that adds the currently completed query expression to the Generated Query field (Element 6 in Figure 18).

7. Search is the button that starts the evaluation of the query expression. The result will be shown in the Generated Query field (Element 7 in Figure 18).

8. Clear query helps users to clear the search form and get ready for the next search (Element 9 in Figure 18).

Query Interpretation

For example, the user wants to search for all entities that have a type of LearningResourceType. As shown in Figure 20, “type” is selected in the Search for field. In the Restriction field, the user chooses “some”. When the user types “L” in the Expression field, matching class names are suggested automatically. Finally, the query expression dcterms:type some lom:LearningResourceType is created.

Similarly, by clicking “and” in the Condition field, the more specific requirement dcterms:type value lom:Slide is added to the query expression (Figure 23).
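The way the three atoms and the Condition field combine into one expression can be sketched as simple string assembly. The helper names `atom` and `combine` are hypothetical and only illustrate the composition described above; they are not part of the interface by Koutsomitropoulos et al.

```python
def atom(search_for, restriction, expression):
    """Join the three atoms into one Manchester Syntax style restriction."""
    return f"{search_for} {restriction} {expression}"

def combine(terms, condition="and"):
    """Connect completed terms with the chosen Condition ("and" / "or")."""
    if condition not in ("and", "or"):
        raise ValueError("condition must be 'and' or 'or'")
    return f" {condition} ".join(terms)

first = atom("dcterms:type", "some", "lom:LearningResourceType")
second = atom("dcterms:type", "value", "lom:Slide")
print(combine([first, second]))
# dcterms:type some lom:LearningResourceType and dcterms:type value lom:Slide
```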

Figure 23: Combining query expressions by using Condition “and” (Koutsomitropoulos et al., 2011).

Result Interface

No information is given about the way that the results are presented in this interface.

Analysis

Like Broccoli, this interface is very complex and requires users to have a background in ontologies and semantic languages. It would be appropriate for technical experts but not for scholars who are not experienced with semantic search technology.

3.3 Microsearch: An Interface for Semantic Search

The project proposes a semantic search prototype called Microsearch (Mika, 2008). To address the sparsity and relatively low quality of embedded metadata in semantic search, the author suggests bringing embedded metadata to the surface of the web and incorporating it in the display of results.

Query Interface

The query interface of this semantic search is very simple, just a single input box (Figure 24). For example, “Ivan Herman” is entered in search box.


Query Interpretation

There is no query interpretation explained in the article.

Figure 24: the query interface of Microsearch (Mika, 2008).

Result Interface

In the result page, a summary of metadata is displayed as part of the abstract (Figure 25). Users can therefore take further actions based on the semantics of the information, such as sending an email. For example, users are able to email Ivan Herman using the provided address “mailto:ivan@w3.org”. Pages can also be related through metadata and visually grouped together (Figure 26). Moreover, Microsearch demonstrates the promise of semantic search when it comes to aggregating information across result pages, such as geographic relevance and timeline (Figure 27).

Figure 25: Metabox shows aggregated metadata (Mika, 2008).

Figure 27: Interface for result display: “Ivan Herman” (Mika, 2008).

Analysis

Unlike the previous two related works, which help users to build a semantic query, the query interface of this prototype is quite easy to use: the user just types a keyword and searches for it. However, the result page contains too much information and looks a little messy. Also, it does not give users a chance to refine their query.

3.4 User interaction with novel web search interfaces

The article compares the performance of three available web search engine interfaces: Carrot2 (C)5, Middlespot (M)6 and Nexplore (N)7 (Ali et al., 2009). In the study, 35 participants are given three navigational search topics and asked to find specific resources by using all three interfaces. The analysis shows that search completion time varies substantially between these interfaces. The mixture of textual and visual information leads to the shortest completion time and the smallest number of wrong answers.

5 http://www.carrot2.org 6 http://www.middlespot.com 7 http://www.nexplore.com

Query Interface

The article does not specifically describe the query interfaces of the three systems. Judging from the screenshots in the article, the query interface is similar to that of a regular search engine such as Google.

Query Interpretation

The article does not describe how these three search engines interpret the input queries.

Results Interface

The Carrot2 result interface (see Figure 28) displays the search results in the traditional way, with title, snippet and URL (part 2 in Figure 28). Additionally, it clusters the search results (part 1 in Figure 28); the clusters are displayed in two ways: as a hierarchical tree structure (see Figure 29) and as a visualization (see Figure 30).

Figure 28: The interface of Carrot2 (Ali et al., 2009).

Figure 30: The results of example “carrot” are clustered in visualization.

The Middlespot result interface (see Figure 31) also contains two areas: a text area (title, snippet and URL) (part 1 in Figure 31) and a visual area (screenshots of web pages) (part 2 in Figure 31). The visual area takes up about 70% of the interface. When the mouse hovers over a specific screenshot, the corresponding image is enlarged (see part 3 in Figure 31).

Figure 31: Interface for Middlespot (Ali et al., 2009).

The Nexplore result interface (see Figure 32) is divided into 4 areas: suggested queries, thumbnails of retrieved documents, traditional search results for retrieved documents (title, snippet and URL) and sponsored links.

Figure 32: Interface for Nexplore (Ali et al., 2009).

Analysis

The result interfaces of the three search engines have some features in common. Each includes the traditional display (title, snippet and URL). However, each also provides some special features. In Carrot2, the hierarchical tree structure makes it easy for users to select a specific category before browsing all the result documents. Middlespot provides screenshots of the web pages; the user-driven enlargement of screenshots is a distinctive feature. Nexplore provides users with suggested queries, which the other two search engines lack. It also contains more result areas than Carrot2 and Middlespot.

3.5 SemSearch: A Search Engine for the Semantic Web

The search engine SemSearch (Lei, Uren & Motta, 2006) aims to hide the complexity of semantic search from end users and make it easy to use. It offers users a Google-like query interface that supports multiple keywords.

Query Interface

The SemSearch query interface contains two parts: subject and keywords. The subject indicates the expected type of the search results. The operator “:” is used to mark the subject, and the operators “and” and “or” are used to combine keywords. Unlike some semantic keyword search systems that accept only one keyword, this query interface supports multiple keywords. A query in SemSearch looks like “subject: keyword1 or/and keyword2…”. For example, a user who wants to search for news about PhD students enters the query “news: PhD students”.
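As a rough illustration of this query format, the following sketch splits a query of the form “subject: keyword1 and/or keyword2” into its parts. The function name parse_semsearch_query and the returned dictionary are assumptions made for illustration and are not taken from SemSearch. For simplicity, the sketch treats the standalone words “and” and “or” as operators and any text without an operator (e.g. “PhD students”) as a single keyword.

```python
import re


def parse_semsearch_query(query):
    """Split 'subject: kw1 and/or kw2 ...' into subject, keywords and operators.

    Hypothetical helper for illustration only; not part of SemSearch.
    """
    subject, sep, rest = query.partition(":")
    if not sep:
        raise ValueError("expected a query of the form 'subject: keywords'")
    # Split on the standalone words 'and' / 'or' and keep them as operators.
    parts = re.split(r"\s+(and|or)\s+", rest.strip(), flags=re.IGNORECASE)
    keywords = [part.strip() for part in parts[0::2]]
    operators = [part.lower() for part in parts[1::2]]
    return {"subject": subject.strip(), "keywords": keywords, "operators": operators}


parsed = parse_semsearch_query("news: PhD students")
```

Here the subject is “news” and the whole phrase “PhD students” is kept as one keyword, because the query contains no operator.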

Query Interpretation

The search engine tries to find semantic entities that match each keyword. For example, if a user wants to search for news about PhD students, the results should be news entities that involve PhD students.

Result Interface

The search results are self-explanatory. For example, the related PhD students and the relations between the students and the news entities are retrieved (Figure 33, bottom). Also, in each search result, the matched subject and the matched keyword are displayed after the title (Figure 33, top).

Figure 33: A screenshot of the search results for the example query “news:phd students” (Lei et al., 2006).

Analysis

Compared with the first two interfaces introduced in this section, this user interface is much easier for ordinary users to use, even if they have no background knowledge. However, constructing a query and understanding the contents of the result page are still not easy for users who are not familiar with semantics and ontologies.

3.6 A Goal-Oriented Web Browser

The project introduces a web browser that is based on a knowledge base of semantic information and is able to provide users with a goal-oriented web browsing experience (Faaborg & Lieberman, 2006). The web browser contains two parts that work together: Creo and Miro. Creo is a programming-by-example system for the web that allows users to train their web browsers to interact with their favorite sites. Miro is a data detector that turns plain text into a form of hypertext.

We now discuss the Miro interface. The interface is a toolbar for Internet Explorer that matches the semantic content of a page to high-level goals. For example, suppose a user is viewing the recipe for Blueberry Pudding Cake. The browser notices a pattern of foods on this page and provides the user with two options: 1) Order Food (this action is created with Creo) and 2) Nutritional info. If the user clicks on “Order Food”, each food in the recipe is converted into a hyperlink to that food at the user’s favorite online grocery store. Users can also browse the nutritional information about each of the foods on their favorite sites.

Query Interface

No query interface is introduced in this paper. Miro reads normal web pages and creates a toolbar for them.

Query Interpretation

Creo allows users to train their web browser to interact with a page by demonstrating how to complete a task. For example, the user clicks the Start Recording button, and Creo starts to capture the user’s actions while navigating to FreshDirect.com (Figure 34). Then the user searches for “Diet Coke”, clicks on the Ask->Food brand link (Figure 35) and clicks on the Scan tab. The user then sees the page shown in Figure 36. By checking and un-checking items, users can directly control Creo’s generalizations. Here, the user generalizes the concept to “Food Brand”. Finally, the user finishes recording and names this action “Order Food”. In this way, the user’s favorite online store is recorded for the “Blueberry” example.

Using the large ConceptNet and TAP knowledge bases, the data detector Miro is able to recognize useful words in text and convert the text into a form of hypertext. The hypertext for the nutritional information in the “Blueberry” example is created from the knowledge bases.

(Faaborg et al., 2006).

Figure 35: Creo automatically generalizes the user's input (Faaborg et al., 2006).

Figure 36: The user can control which generalizations are active with check boxes (Faaborg et al., 2006).

Results Interface

The result interface, shown in Figure 37, is similar to a normal web page. The only difference from a normal web page is the automatically created toolbar, which indicates the user’s potential search goals.

Figure 37: Automatic association of a user's high-level goals with the content of a Web page (Faaborg et al., 2006).

Analysis

Unlike the interfaces discussed before, this is not a keyword-based or query-based search interface. The system detects and analyzes the user’s potential intent automatically while the user is browsing a web page. However, users need to record the actions beforehand, and the process of creating an action is rather complex for users.

3.7 Summary

Table 1 below aggregates the related work according to the criteria. Most of the systems use an underlying ontology rather than a knowledge base (only the goal-oriented web browser uses a knowledge base). Most of the user interfaces in the related work contain both a query interface and a result interface.

Most of the query interfaces are very complicated to understand and use, even though they are claimed to be simple for end users. Some query interfaces assume that the user is technically experienced and has prior knowledge of semantic web technology (e.g., the structured interface). The goal-oriented web browser even requires users to record actions beforehand, and the recording process is complicated as well.

the traditional result interface. Therefore, the search results are simple and clear to read. Although some systems in the related work (e.g., the structured interface) are designed for non-professional users and aim to reduce the complexity of semantic search, the difficulty of building the query still makes the whole search process complicated. For the systems that do not require users to build a query, the quality of the search results is not very satisfactory. The three result interfaces introduced in Section 3.4 offer some new ideas for displaying results.

Table 1: + means the requirement is met, an empty cell means the requirement is not met.

4. Semantic-Enhanced Search & user-driven Disambiguation

Faced with the shortcomings of the two search approaches described in Chapter 2 (keyword-based search and semantic search), we now describe the concept of semantic-enriched search. This third concept draws on the advantages of both earlier approaches. As in keyword search, the user only needs to input keywords. The keywords are then mapped to related concepts by user-guided disambiguation. These concepts are looked up in a concept index, and the documents that match the concepts are displayed to the user. This search approach is very simple for users and improves the efficiency of searching as well.

This chapter introduces the concept of semantic-enhanced search in Section 4.1. It then introduces the prototype system Capisco in Section 4.2 and the requirements on user-guided disambiguation in Section 4.3. Section 4.4 discusses related work on disambiguation, and Section 4.5 gives an overview of the concept of user-driven disambiguation.

4.1 Concept of semantic-enhanced search

Like traditional keyword-based search, the semantic-enriched search proposed in this thesis works on text-based documents rather than semantic web documents in which data are stored as RDF/XML; both the queries and the target documents are text-based. The main idea of semantic-enhanced search is to analyze the documents for keywords indicating concepts, which are then built into a concept index (Hinze et al., 2015). Afterwards, the user’s keywords are mapped to concepts, which are used to find matching documents via the concept index (see Figure 38). This approach thus acknowledges that most documents available online still lack semantic markup and that most users are not familiar with semantic query languages.

Figure 38: Text-based keywords and target documents in semantic-enhanced search (Hinze et al., 2015).

document (see example in Figure 39). Each document refers to a number of concepts, which may be used for the search. To recognize the concepts contained in each document, an ontology or knowledge base, as used in the semantic web, would be employed. Then, instead of the keywords, the search can be performed based on these concepts.

Figure 39: Example list of concepts extracted from a document.

Secondly, the concept index needs to be built. A concept index can be built in a way similar to the keyword-based index (introduced in Section 2.1). The concepts extracted from the documents are listed, and pointers are created to each of their occurrences (see Figure 40). The concept index thus contains all the concepts from the documents together with pointers to each of the documents containing a concept.
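The idea of a concept index can be sketched as an inverted index from concepts to documents. The following is a minimal sketch, assuming that the concepts of each document have already been extracted; the function name build_concept_index and the sample documents are hypothetical and do not reflect the actual Capisco implementation.

```python
from collections import defaultdict


def build_concept_index(doc_concepts):
    """Map each concept to the sorted list of documents that contain it."""
    index = defaultdict(set)
    for doc_id, concepts in doc_concepts.items():
        for concept in concepts:
            index[concept].add(doc_id)
    return {concept: sorted(doc_ids) for concept, doc_ids in index.items()}


# Hypothetical documents with their already-extracted concepts.
docs = {
    "doc1": ["Kiwi (bird)", "New Zealand"],
    "doc2": ["Kiwifruit", "New Zealand"],
}
concept_index = build_concept_index(docs)
```

Looking up a concept in concept_index returns the documents that refer to it, in the same way that a keyword index returns the documents containing a keyword.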

The last step is to deal with the search keywords. For the input keywords, the corresponding concepts are identified and either added to the query (i.e., query expansion) or used instead of the original keyword-based query (i.e., query replacement). The act of identifying the corresponding concepts for each keyword is called disambiguation (Zhang, Deng & Li, 2009). These concepts are then used to identify the matching documents via the concept index.
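The difference between query expansion and query replacement can be illustrated with a small sketch. The function names, the mode parameter and the sample index below are assumptions made for illustration only.

```python
def build_query(keywords, concepts, mode="expansion"):
    """Combine keywords and disambiguated concepts into one list of query terms."""
    if mode == "expansion":
        # Keep the original keywords and add the identified concepts.
        return list(keywords) + list(concepts)
    if mode == "replacement":
        # Use only the identified concepts.
        return list(concepts)
    raise ValueError("mode must be 'expansion' or 'replacement'")


def find_documents(concept_index, concepts):
    """Return all documents that are indexed under any of the given concepts."""
    matches = set()
    for concept in concepts:
        matches.update(concept_index.get(concept, []))
    return sorted(matches)


# Hypothetical concept index: concept -> documents containing it.
sample_index = {"Kiwi (bird)": ["doc1"], "Kiwifruit": ["doc2", "doc3"]}
expanded = build_query(["kiwi"], ["Kiwi (bird)"], mode="expansion")
replaced = build_query(["kiwi"], ["Kiwi (bird)"], mode="replacement")
hits = find_documents(sample_index, replaced)
```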

4.2 Capisco Prototype System

In this section, we describe the prototype of a semantic-enhanced search system called Capisco. Before describing the system structure, we first introduce an important component of Capisco, the Concept-in-Context network (Section 4.2.1). The structure of the Capisco system is then explained in detail in Section 4.2.2.

4.2.1. Concept-in-Context network (CiC)

As explained in Section 4.1, semantic-enriched search systems use an ontology or knowledge base to identify matching concepts. In our semantic-enriched search system Capisco, the semantic concepts and their relationships are identified by the Concept-in-Context (CiC) network, which plays a crucial role in Capisco. In the Concept-in-Context network, the initial knowledge base is built by analyzing Wikipedia articles. Wikipedia is currently the largest and fastest-growing encyclopedia, with two million articles and thousands of contributors. As a massive repository of available knowledge, it is considered an excellent information resource to integrate into this system.

Unlike some other systems, e.g. those based on the Wikipedia category hierarchy, the approach in this system is to use the concepts extracted from the Wikipedia articles and the relationships between symbols and their meanings, which are extracted from Wikipedia links. For each Wikipedia article, the concepts are extracted automatically from the links that the article contains and the links that point to it. Each article contains several or even a substantial number of concepts. Some articles may have one or more concepts in common.

For example, as we can see from Figure 41, both “Dog” and “Cat” contain many links to different concepts, and some of these concepts overlap, which means that the two articles are related. However, each article also contains many links to irrelevant concepts, which do not indicate relatedness. It is therefore sometimes difficult to distinguish the meaningful links from the irrelevant ones.

Figure 41: Comparing the article about dogs with the article about cats to measure the relatedness between them (Milne & Witten, 2013).

Figure 42: Disambiguation of text and articles.

Different texts in different articles are likely to link to the same article. However, sometimes the same text in different articles links to entirely different articles (Figure 42). In this case, the concepts need to be disambiguated. We can use the article that the text belongs to as the context to help disambiguate the text and obtain the disambiguated concept.

4.2.2 Structure of Capisco system

Figure 43 presents the conceptual structure of the Capisco system. The structure contains three main components: 1) the Wikipedia seeding, 2) semantic analysis and indexing, and 3) the semantic-enhanced search. In this section, these three components of the structure are described in detail.

Figure 43: Conceptual structure of the Capisco system, adapted from (Hinze et al., 2015).

CiC network seeding

As explained above, the relationships between symbols (the textual values assigned to links in Wikipedia articles) and their meanings are extracted from the links between Wikipedia articles (see Figure 44). Each Wikipedia article has a unique ID. Every ID points to the title of the article, which is regarded in Capisco as a concept. For a link between articles, the starting side of the link is the IDC (the ID of the context article, i.e. the article in which the symbol is used) and the destination side is the IDM (the ID of the meaning article, i.e. the article whose title names the concept). Thus, a link is represented as a triple of symbol, IDC and IDM: the IDC is the start point of the link, the IDM is the destination of the link, and the symbol is the textual value used for the link.
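The triple of symbol, IDC and IDM can be pictured as a small record type. The sketch below is only an illustration: the class name Link, the article IDs and the sample links are invented and are not taken from the Capisco data.

```python
from typing import NamedTuple


class Link(NamedTuple):
    symbol: str  # textual value used for the link
    idc: int     # ID of the context article (where the link appears)
    idm: int     # ID of the meaning article (where the link points to)


# Hypothetical article IDs and titles; each title is regarded as a concept.
articles = {1: "New Zealand", 2: "Kiwi (bird)", 3: "Fruit", 4: "Kiwifruit"}

# The same symbol used in two different context articles with two meanings.
links = [
    Link(symbol="kiwi", idc=1, idm=2),
    Link(symbol="kiwi", idc=3, idm=4),
]
meanings = [articles[link.idm] for link in links]
```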

Semantic analysis and indexing

The concepts derived from the CiC network are used for the semantic analysis of the documents in the digital library. In a given text, terms that appear together are considered related, so a term can be disambiguated by its related terms. In reality, these terms might be ambiguous themselves and refer to many concepts in the CiC network. However, some of them are unambiguous because they have only one meaning in the given context. These unambiguous terms are therefore treated as anchor points for disambiguation, by which the documents are semantically analyzed. The semantic analysis builds the concept index, which holds the links between the documents in the digital library and the concepts in the CiC network.
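The use of unambiguous terms as anchors can be sketched as follows. This is a simplified illustration under strong assumptions: the relatedness between concepts is given here as a hand-written set of related pairs, whereas the measure actually used in Capisco is not described in this section. The function name disambiguate and all sample data are hypothetical.

```python
def disambiguate(terms, senses, related):
    """Pick one concept per term, using unambiguous terms as anchors."""
    # Terms with exactly one candidate concept act as anchors.
    anchors = {senses[t][0] for t in terms if len(senses.get(t, [])) == 1}
    chosen = {}
    for term in terms:
        candidates = senses.get(term, [])
        if not candidates:
            continue  # term is unknown to the (toy) network
        # Score a candidate by the number of anchors it is related to.
        chosen[term] = max(
            candidates,
            key=lambda c: sum((c, a) in related or (a, c) in related for a in anchors),
        )
    return chosen


# Hypothetical senses and relatedness pairs.
senses = {
    "kiwi": ["Kiwifruit", "Kiwi (bird)"],
    "flightless bird": ["Flightless bird"],
}
related = {("Kiwi (bird)", "Flightless bird")}
result = disambiguate(["kiwi", "flightless bird"], senses, related)
```

In this toy example, “flightless bird” has only one sense and therefore serves as the anchor that selects the bird sense of “kiwi”.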

Semantic-enhanced search

Capisco provides users with a text-based search interface. For the terms entered by the user, the matching concepts are automatically listed, which leads to the user-guided disambiguation (see Section 4.5). Based on the user’s selection, the matching documents are returned.

4.2.3 System access and functionality

To access the Capisco system and carry out the search process, several commands for querying data in Capisco need to be introduced.

The context command (Context(symbol) -> IDC) returns the IDCs of all articles in which a symbol is used (see Figure 45). For example, the symbol 'english' is used in the articles 'Language' and 'United Kingdom', so the context command would return the IDs of at least these two articles. This command can be used to learn in which contexts a symbol is used.

Similarly, the command (Senses(symbol) -> IDM) returns all IDMs that a symbol links to. For example, the symbol 'cancer' might return the IDMs for the animal cancer, the constellation cancer, the astrological sign cancer and so on. This command can be used to learn what different meanings a word can have.

Given an article title, the command (Artid(name) -> ID) returns the ID of the article. Conversely, the command (Artname(id) -> name) returns the title for a given article ID.
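The four commands can be mimicked over a small in-memory set of (symbol, IDC, IDM) triples. The class below is a toy stand-in and not Capisco's implementation; the article IDs, the meaning articles 'English language' and 'English people', and the links are assumptions made for illustration. Only the context articles 'Language' and 'United Kingdom' come from the example in the text.

```python
class CiCNetwork:
    """Toy in-memory stand-in for the CiC network."""

    def __init__(self, articles, links):
        self.articles = dict(articles)  # article ID -> article title
        self.links = list(links)        # (symbol, IDC, IDM) triples

    def context(self, symbol):
        """Context(symbol) -> IDC: IDs of articles in which the symbol is used."""
        return sorted({idc for s, idc, _ in self.links if s == symbol})

    def senses(self, symbol):
        """Senses(symbol) -> IDM: IDs of articles the symbol links to."""
        return sorted({idm for s, _, idm in self.links if s == symbol})

    def artid(self, name):
        """Artid(name) -> ID: ID of the article with the given title."""
        for art_id, title in self.articles.items():
            if title == name:
                return art_id
        raise KeyError(name)

    def artname(self, art_id):
        """Artname(id) -> name: title of the article with the given ID."""
        return self.articles[art_id]


articles = {1: "Language", 2: "United Kingdom", 3: "English language", 4: "English people"}
links = [("english", 1, 3), ("english", 2, 3), ("english", 2, 4)]
cic = CiCNetwork(articles, links)
contexts = [cic.artname(i) for i in cic.context("english")]
meanings = [cic.artname(i) for i in cic.senses("english")]
```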

4.2.4 Search Processing

From the CiC network, all potential meanings of the keyword can be retrieved. In this section, an example is used to demonstrate how the Capisco system supports semantic search. We use “Kiwi” as the given keyword for the example.

The first step is to look up the concepts for the keyword (symbol) that will be used with the concept index. Each term is likely to be related to multiple concepts (polysemy). From the CiC network, all the relevant concepts for the keyword can be identified. Thus, the keyword “Kiwi” is found to refer to Kiwi people (New Zealanders), the kiwi bird (New Zealand’s national bird), kiwi fruit and another 11 concepts (Figure 46).

Figure 46: The keyword “Kiwi” maps to 14 concepts according to the system’s search results.

We face another problem for disambiguation, namely synonymy, in which a concept is known by multiple names. For example, the concept kiwi (bird) is referred to by words such as “Kiwi”, “kiwi”, “kiwi bird” or “Kiwi bird” (Figure 47). Thus, both “kiwi bird” and “Kiwi bird” should be linked to the same article.

Figure 47: The words that are used to refer to the concept “kiwi (bird)”.

Although the Capisco CiC network allows the lookup of concepts for a keyword, the actual intent of the user is still ambiguous at this stage. Therefore, further refinement by the user is necessary. With the participation of users, we can move from the possible concepts to the intended ones. The next section explores the requirements of this process.

4.3 Requirements on user-guided disambiguation

From the example above (Section 4.2.4), it is obvious that the same symbol might refer to different concepts (“Kiwi” refers to the kiwi bird, New Zealanders and so on). In many cases, a symbol refers to a number of concepts. The way to distinguish between these concepts is then to refine the user’s information need. User-guided disambiguation is an effective way to understand the user’s intent and provide users with more accurate search results.

In semantic search, the input query needs to follow a strictly defined semantic query language such as SPARQL. The documents are also stored as semantic web data, such as RDF. Thus, a semantic search engine is able to supply accurate search results by analyzing the semantic query. In Capisco, however, the semantic-enhanced search deals only with the given keyword. In order to obtain the accurate concepts for the keyword, further disambiguation by the user is required to identify the intended meaning of the keyword.

As explained in Section 4.2, the Capisco system is able to identify the related concepts for a given input keyword. These possible meanings of a keyword are displayed to the user to support manual semantic disambiguation. The user-driven disambiguation helps to constrain the set of possible target concepts and improves the efficiency of semantic-enhanced search. The user-guided disambiguation is described in detail in Section 4.5, after a discussion of related approaches to disambiguation.

4.4 Related work on disambiguation

There are several approaches to disambiguation. In this section, we introduce and analyze the disambiguation approaches in the following related works according to our requirements. Our analysis concentrates on the process, the performance and any involvement of users in the approach.

4.4.1 Concept based query expansion using WordNet

Concept based query expansion using WordNet is a query expansion technique (Zhang et al., 2009). It utilizes WordNet, which is a large manually-created thesaurus, and its semantic relatedness module. Zhang et al. observe that in a query sentence, the meaning of each word is not only determined by its own definition, but also affected by the context of the whole query sentence (based on the work of Hoeber & Xue (2007)). They pre-process sentence queries by splitting sentences at punctuation and other elements into separate queries (so-called “clean” sentences).

Disambiguation Process

They process each of the queries through the WordNet::SenseRelate module. For each term in the query, all synonymous concepts are identified in WordNet and stored in a synset. All synsets related to a query are combined into a 2-dimensional concept array. This array is then used for query expansion in the following way: new queries are formed by selecting a synonymous term for each of the original terms (i.e., a query is a combination of terms in which each term is from a different synset). Instead of using all combinations of terms, suitable queries are selected using the following process: for each of the newly created candidate queries, another process of synonym identification is started and another concept array is identified. Then the original concept array and the newly created concept arrays are compared. Only candidate queries that have the same concept array as the original query are accepted as valid query expansions.
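The selection of valid query expansions can be sketched with a toy thesaurus in place of WordNet. The dictionary THESAURUS, the function names and the sample query below are assumptions made for illustration; the sketch does not call WordNet or WordNet::SenseRelate and only mirrors the comparison of concept arrays described above.

```python
from itertools import product

# Toy thesaurus standing in for WordNet: term -> set of synonymous terms.
THESAURUS = {
    "car": {"car", "automobile"},
    "automobile": {"car", "automobile"},
    "fast": {"fast", "quick"},
    "quick": {"fast", "quick", "prompt"},
    "prompt": {"quick", "prompt"},
}


def concept_array(query):
    """Return one synonym set per query term (the 2-dimensional concept array)."""
    return [frozenset(THESAURUS.get(term, {term})) for term in query]


def valid_expansions(query):
    """Keep candidate queries whose concept array equals that of the original."""
    original = concept_array(query)
    # A candidate takes one term from each synonym set of the original query.
    candidates = product(*(sorted(synset) for synset in original))
    return [
        list(candidate)
        for candidate in candidates
        if list(candidate) != list(query) and concept_array(candidate) == original
    ]


expansions = valid_expansions(["fast", "car"])
```

In this toy data, the candidate queries that contain “quick” are rejected because “quick” has the additional synonym “prompt”, so their concept array differs from that of the original query.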

Disambiguation Performance

Zhang et al. conducted experiments in which the top 10 results of the original queries were compared with those of their valid query expansion queries (using Google). Shorter queries were excluded from the experiment as it was found that this query expansion does not work on short queries. The authors report an improved query precision of 7% between expanded and original queries, judged by human participants. The authors note the slow performance of the disambiguation process.

User Disambiguation

There is no user disambiguation introduced in this article.

Analysis

Zhang et al. follow a similar approach to our project in identifying synonyms from a given knowledge base (WordNet). Their approach then uses an automatic second step to identify appropriate synonyms. Their approach does not directly involve users and their information need. Furthermore, as they observed, their method is not appropriate for short queries (no information about the number of query terms was given). It also appears as if Zhang et al. assumed that queries would be sentences, not collections of keywords; however, the implications of this assumption are unclear. The results achieved seem poor for an approach that has slow performance.

4.4.2 SemSearch

SemSearch is a search engine designed by Lei, Uren and Motta (Lei et al., 2006) that aims to reduce the complexity of semantic search so that ordinary users without knowledge of ontologies can use it as well. A user query contains two parts: the queried subject, which indicates the intended type of the results, and a combination of keywords.

Disambiguation Processing

In the search process, the first step in SemSearch is to determine the semantic meaning of the user query. The Text Search Layer of the engine finds the explicit semantic meaning of the keywords. To find all the semantic entities that match each keyword, Lei et al. use the labels of semantic entities as the main search source, since these represent the human-understandable meaning of the semantic entities. All the semantic entities in the data repository are indexed first; the engine then searches the indexed repository and retrieves all the matched entities. These matched semantic entities are the candidate meanings for the keywords.

Next, the engine translates user queries into formal queries by classifying the entered queries into two types: simple queries (containing two keywords) and complex queries (containing more than two keywords). For simple queries, Lei et al. build a set of templates that help retrieve the relations between the two entities of a semantic entity combination; the relations between entities are also needed for the search results. A template is instantiated as a formal query according to the corresponding combination of matched entities. A keyword is likely to match more than one semantic entity, so more than one query needs to be constructed. Specifically, if the subject keyword matches N semantic entities and the other keyword has K matches, then N*K queries need to be constructed in total. For complex queries, the matched semantic entities of each keyword need to be combined, and a query is constructed for each of the combinations. In some cases, the

Disambiguation Performance

Lei et al. note that, in reality, the number of combinations for a complex query could be huge, so it could take quite a long time to get the results. Hence, Lei et al. use only simple queries to find the matches for keywords in order to get quick results.
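The growth in the number of queries can be illustrated by enumerating the combinations of matched entities. The sketch below only counts and lists entity combinations; the function name entity_combinations and the entity names are hypothetical, and the instantiation of SemSearch's query templates is not modelled.

```python
from itertools import product


def entity_combinations(*matches_per_keyword):
    """Return every combination that takes one matched entity per keyword."""
    return list(product(*matches_per_keyword))


# Simple query: subject keyword with N = 2 matches, other keyword with K = 3.
subject_matches = ["News", "NewsItem"]
keyword_matches = ["PhDStudent", "Student", "Person"]
simple = entity_combinations(subject_matches, keyword_matches)

# Complex query: a third keyword with 4 matches multiplies the count again.
third_matches = ["a", "b", "c", "d"]
complex_ = entity_combinations(subject_matches, keyword_matches, third_matches)
```

With two keywords the number of combinations is N*K; each additional keyword multiplies the number by its own count of matches, which is why complex queries quickly become expensive.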

User Disambiguation

No user disambiguation is involved in this approach.

Analysis

It is not clear how long it would take to get the results for complex queries if a large number of queries need to be constructed. Also, there is no experiment section in this article, so we cannot assess the performance of this engine for simple and complex queries. Moreover, no user participation is involved.

4.4.3 Librarian Agent Query Refinement

Librarian Agent Query Refinement (Stojanovic, Studer & Stojanovic, 2004) is a comprehensive approach to the refinement of ontology-based queries. It adjusts a query to the demands of the user step by step and includes interaction with the users by analyzing their behavior. This approach is intended to reduce the number of irrelevant results and increase the relevance of the result list.

Disambiguation Process

and their queries are always not matched, which is defined as the query ambiguity. Query ambiguity is divided into two types: semantic ambiguity and content-related ambiguity. For semantic ambiguity, a specific formula is applied to the query to determine the query variable that causes the highest ambiguity in the query. This variable is refined in the next step of the process. For content-related ambiguity, the results of the queries can be used to discover potential ambiguity. For example, if two queries return the same results, they are defined as equivalent queries. A list of equivalent queries is therefore treated as an indicator of content-related ambiguity. In this way, content-related ambiguity can be detected by comparing the results of the queries.
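The detection of equivalent queries can be sketched by grouping queries according to their result sets. This is a minimal illustration of the comparison described above, not the formalism of Stojanovic et al.; the function name equivalent_query_groups and the sample queries and results are hypothetical.

```python
from collections import defaultdict


def equivalent_query_groups(query_results):
    """Group queries that return exactly the same set of results."""
    groups = defaultdict(list)
    for query, results in query_results.items():
        groups[frozenset(results)].append(query)
    # Only groups with more than one query indicate equivalent queries.
    return [sorted(queries) for queries in groups.values() if len(queries) > 1]


# Hypothetical queries with the IDs of the resources they return.
query_results = {
    "researcher": ["r1", "r2"],
    "researcher in project": ["r2", "r1"],
    "student": ["r3"],
}
equivalent = equivalent_query_groups(query_results)
```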

Then, the meaning of the query needs to be determined according to the user’s needs. Stojanovic et al. use an approach called implicit relevance feedback to obtain the user’s preferences. Specifically, the resource that the user selects from the retrieved results is treated as corresponding to the user’s information need. Consequently, more relevant information is considered as the intent of the current query. Given the user’s information need, the ambiguities collected in the previous step can be interpreted.

Lastly, the recommended query refinements are ranked by their relevance to the user’s information need and presented to the user.

Disambiguation Performance

No user experiments are reported in this article; therefore, feedback on this query refinement is not available.

User Disambiguation

The query refinement method introduced in this article involves user participation in disambiguation (see step 2 of the disambiguation process). The users are able to select a resource from the retrieved results. The resources selected by the user reflect, to some extent, the user’s intent. By analyzing the common attributes of the results that the user selects to view, the intent of the user is discovered.

Analysis

Stojanovic et al. (2004) do not include an experiment section in the article, so the actual performance of this new approach to query refinement is not clear. Also, no result lists are displayed in the article, so we do not know what they look like.

4.4.4 Summary of related work on disambiguation

The disambiguation approaches introduced in this section use query expansion, combination of semantic entities and query refinement, respectively, to disambiguate the user’s input queries. In all three approaches, the user queries are translated into specifically generated queries. By analyzing these constructed queries, more accurate results or concepts are identified from the candidates. However, the performance of these disambiguation approaches is not very impressive. Concept based query expansion improves precision by only 7%, and the performance of the other two approaches is not clear from the given articles.

Only in the Librarian Agent Query Refinement is user refinement involved in the disambiguation. The users are provided with a list of refinements, and the ambiguities of the queries are then interpreted according to the users’ preferences.

4.5 Concept of user-driven Disambiguation

The related work introduced in the last section identifies one approach that refines user queries with the help of user feedback. As explained in Section 4.3, automatic disambiguation is not suitable for keyword-based search in Capisco, as there is no context available. We suggest using user-guided manual disambiguation instead. In this section, we explain our user-guided disambiguation approach in detail.

The principle of user-guided disambiguation is to use the searcher’s information need as the source for disambiguating the candidate concepts of the given keyword. We explained before how the concepts of a keyword are identified by the Capisco system’s CiC network. However, it is rare that a keyword refers to only one concept; a keyword potentially refers to a number of concepts (that is, it is ambiguous). Thus, user-driven disambiguation is the process of finding the correct concept according to the user’s preference among the concepts and the user’s information need.

Using Capisco, a user could use the keyword “Kiwi” in a search. The information about possible senses can be retrieved from the CiC network in Capisco. This requires looking up all senses for the term “Kiwi” (using the command “Senses Kiwi”, as shown in Figure 48), which retrieves the IDs of the senses (Figure 49). The command “artname artid” (e.g. artname 509080) then retrieves the names of the senses one by one.

Figure 49: The optional senses returned by the Capisco system.

To support this process, we designed a user interface for the Capisco system. With the help of our search interface, users without knowledge of the semantic web or of concepts can operate the system visually. The search steps are simplified and clear. Users type the keyword directly into the box, as in any other search engine. The names of all the optional senses are displayed to the users, rather than the ID numbers.
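The lookup that the interface performs on behalf of the user can be sketched as follows. The sketch assumes that the senses of a keyword and the article titles are available as plain dictionaries; the sense IDs, the case-insensitive lookup and the function names are hypothetical and do not reflect the actual Capisco interface code.

```python
def sense_options(keyword, senses, titles):
    """Return (sense ID, title) pairs for a keyword, in lookup order."""
    return [(sense_id, titles[sense_id]) for sense_id in senses.get(keyword.lower(), [])]


def selected_concepts(options, chosen_positions):
    """Translate the list positions picked by the user back into sense IDs."""
    return [options[position][0] for position in chosen_positions]


# Hypothetical sense IDs and titles for the keyword "Kiwi".
senses = {"kiwi": [101, 102, 103]}
titles = {101: "Kiwi (bird)", 102: "Kiwi (people)", 103: "Kiwifruit"}

options = sense_options("Kiwi", senses, titles)
for position, (_, title) in enumerate(options):
    print(position, title)  # the user sees names rather than ID numbers

# Suppose the user picks the first and the third entry in the list.
chosen = selected_concepts(options, [0, 2])
```

The selected sense IDs would then be passed on to the search engine to identify the matching documents.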

Obviously, both automatic semantic disambiguation and user-driven disambiguation are complicated to carry out using text commands, especially for scholars who do not have a technical background. Therefore, a simple and user-friendly interface for this system is necessary.

4.6 Summary

In this chapter, we presented the concept of semantic-enhanced search, which takes advantage of both traditional text-based search and semantic search. The Capisco system and its structure were explained in detail. With the functionality and search processing introduced here, users who have Linux knowledge are able to carry out semantic-enhanced search effectively using Capisco. However, our aim is to enable ordinary users without specific technical skills to use the semantic-enriched search. Thus, a user-friendly interface needs to be designed for end users.


5 User-guided disambiguation: interfaces

In this chapter, we present the user-guided disambiguation strategy through three interface prototypes that we designed. The next section describes the general layout of the user interface with the help of screenshots. Section 5.2 introduces the differences between the three variations of the user-guided disambiguation interface. Section 5.3 demonstrates an example search using the interface.

5.1 Interface for user-guided disambiguation

The users of the semantic-enhanced search are assumed to be scholars who are not technically versed, rather than experts with technical knowledge of semantic web technology. Thus, one task is to create a user-friendly interface that feels familiar to the users and is easy for them to understand and use. To increase the efficiency and accuracy of searching, the interface for user-driven disambiguation is also crucial, in addition to the system disambiguation.

Within the Capisco architecture, our disambiguation interface is part of the semantic-enhanced search component. The user interface and its interactions support users in disambiguating the search keyword step by step. In the end, the matched documents are displayed to the users according to the disambiguated keyword.

The three disambiguation interfaces offered to the users follow the same basic layout; the differences between the three variations are explained in detail later. Here, we first introduce the general layout. Our semantic-enhanced search interface contains two main parts: the keyword input interface and the user-guided disambiguation interface.

Figure 50 and Figure 51 show an overview of the layout of the user interface. As the Capisco project was carried out in collaboration with the HathiTrust, the interface uses the HathiTrust logo and name. The interface contains the following four main elements:

1) Search box (number 1 in Figure 50): the input box for the keyword. The users type the keyword that they want to search for here and then click the search icon to start the search. This is the query interface for the semantic-enhanced search.

2) Senses result area (number 2 in Figure 51): the list of all the optional senses. All the possible senses are displayed in this area. Users can go to the next page and view other options by clicking the pager (number 3 in Figure 51). This is the area where user-guided
