
Semantic Document Architecture - SDArch

4.2 The SDArch Services

As I mentioned in the introduction of this chapter, the SDArch functionalities fall into two groups: i) the functionalities that are related exclusively to semantic document management and ii) the functionalities that SDArch shares with other services of SSD. The first group is implemented by two SDArch services: 1) the semantic document authoring service and 2) the semantic document search and navigation service. The second group comprises the functionalities necessary for managing domain ontologies, which are used for semantic document annotation, linking and indexing; the functionalities necessary for managing the SDArch user profile data; and the functionalities necessary for managing the SDArch social network data. This second group is implemented by three SDArch services: 3) the user profile management service, 4) the social network management service, and 5) the ontology management service.

My main focus regarding the SDArch design, in the scope of this thesis, was on the overall architecture design and the detailed design of the two SDArch services related to semantic document management (i.e., services 1 and 2). I dedicate the next chapter (Chapter 5) to a detailed description of the functionalities provided by these two services and the processes they realize; in this chapter I give only a brief overview of them. The design of the other three services (i.e., services 3, 4, and 5) concentrated mainly on those functionalities that interact with the semantic document management processes.

4.2.1 Semantic Document Authoring Service

The semantic document authoring service provides the functionalities necessary for the realization of the semantic document authoring process. Figure 4.2 shows a functional model of the service, including the service’s functional modules, the module interdependencies and the service’s interface.

The service’s interface provides two methods:

Transform() Takes an existing, conventional document (e.g., Word and PowerPoint) and transforms it to a semantic document.

Conceptualize() Takes a document unit and retrieves a set of ontological concepts whose instances appear in the document unit’s content.


Figure 4.2. Functional model of the semantic document authoring service

While the Transform() method is invoked by applications from the presentation layer, the Conceptualize() method is invoked by the semantic document search and navigation service in the process of generating the semantic query (Section 5.2.1). The service’s functionalities are implemented by five functional modules: the SemanticDoc RDFizer, Annotation, Indexing, Linking, and Knowledge Extraction and Conceptualization modules. Each module handles one or more of the internal processes that the semantic document authoring is composed of. A detailed description of the semantic document authoring process and the service’s functional modules is given in Chapter 5.
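The two interface methods described above can be sketched as a small Python interface. This is only an illustration under stated assumptions, not the SDArch implementation: the `DocumentUnit` and `SemanticDocument` shapes, the `ToyAuthoringService` class, the `ex:` concept identifiers and the keyword lexicon are all hypothetical, and the toy `transform` takes already-extracted paragraphs instead of parsing a Word or PowerPoint file.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class DocumentUnit:
    """A uniquely identified piece of document content (hypothetical shape)."""
    unit_id: str
    content: str


@dataclass
class SemanticDocument:
    """A semantic document as a collection of document units (hypothetical shape)."""
    doc_id: str
    units: list[DocumentUnit] = field(default_factory=list)


class SemanticDocumentAuthoring(Protocol):
    """The two methods named in the text, with assumed Python signatures."""

    def transform(self, doc_id: str, paragraphs: list[str]) -> SemanticDocument: ...

    def conceptualize(self, unit: DocumentUnit) -> set[str]: ...


class ToyAuthoringService:
    """Stand-in implementation: keyword lookup replaces real knowledge extraction."""

    def __init__(self, lexicon: dict[str, str]) -> None:
        # Maps a lowercase surface term to a concept identifier (assumed data).
        self.lexicon = lexicon

    def transform(self, doc_id: str, paragraphs: list[str]) -> SemanticDocument:
        # Each non-empty paragraph becomes one document unit with its own identifier.
        texts = [text for text in paragraphs if text.strip()]
        units = [DocumentUnit(f"{doc_id}#unit{i}", text) for i, text in enumerate(texts)]
        return SemanticDocument(doc_id=doc_id, units=units)

    def conceptualize(self, unit: DocumentUnit) -> set[str]:
        # Return every concept whose term occurs in the unit's content.
        text = unit.content.lower()
        return {concept for term, concept in self.lexicon.items() if term in text}
```

In this sketch a presentation-layer application would call `transform`, while a search component would call `conceptualize` on a unit built from the user query, mirroring the division of callers described above.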

4.2.2 Semantic Document Search and Navigation Service

The semantic document search and navigation service provides the functionalities necessary for the realization of the semantic document search, the full-text search and the semantic document navigation processes. Figure 4.3 illustrates the functional model of the service. The service’s functionalities responsible for the semantic document search are implemented by two functional modules: the Search and the Search Personalization modules. The service’s functionalities responsible for the semantic document navigation process are implemented by the Semantic Navigation module. Both processes and their corresponding functional modules are described in detail in Chapter 5. In addition, the full-text search is provided as a complementary search to be used when the semantic document search does not retrieve any results. It is realized by the Search module.

Figure 4.3. Functional model of the semantic document search and navigation service

The service’s interface exposes three methods: two correspond to the two main processes that the service realizes, and one provides the complementary full-text search:

SemanticSearch() Takes the initial free-text user query, transforms it into a semantic query (Section 5.2.1) and executes the semantic query against the concept index of the SDArch semantic document repository.

TextSearch() Executes the initial free-text user query against the text index of the SDArch semantic document repository.

Navigate() Executes a navigational query (Section 5.2.3) against the SDArch RDF repository. The query is generated when the user clicks on a semantic link of a document unit retrieved by the semantic search.
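The relation between the semantic search and the complementary full-text search can be sketched as follows. This is a minimal illustration, assuming toy in-memory indexes; the function names, the `ex:` concept identifier and the index contents are hypothetical and do not reproduce the SDArch Search module.

```python
from typing import Callable

# Toy data (assumed): term -> concept, concept -> unit ids, term -> unit ids.
CONCEPTS = {"ontology": "ex:Ontology"}
CONCEPT_INDEX = {"ex:Ontology": ["unit1", "unit4"]}
TEXT_INDEX = {"ontology": ["unit1"], "deadline": ["unit7"]}


def toy_semantic_search(query: str) -> list[str]:
    """Map query terms to concepts, then look the concepts up in the concept index."""
    hits: list[str] = []
    for term in query.lower().split():
        concept = CONCEPTS.get(term)
        if concept is not None:
            hits.extend(CONCEPT_INDEX.get(concept, []))
    return hits


def toy_text_search(query: str) -> list[str]:
    """Look the raw query terms up in the text index."""
    hits: list[str] = []
    for term in query.lower().split():
        hits.extend(TEXT_INDEX.get(term, []))
    return hits


def search_with_fallback(
    query: str,
    semantic_search: Callable[[str], list[str]],
    text_search: Callable[[str], list[str]],
) -> tuple[str, list[str]]:
    """Run the semantic search first; use full-text search only if it finds nothing."""
    hits = semantic_search(query)
    if hits:
        return "semantic", hits
    return "text", text_search(query)
```

Passing the two search functions as parameters keeps the fallback logic independent of how either index is implemented, which is a design choice of this sketch rather than something stated in the text.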

4.2.3 User Profile Management Service

The SDArch users are described by the SDArch user profiles and uniquely identified by an OpenID¹. I chose OpenID for the identification of the SDArch users, since it is an open, decentralized standard for the authentication of online users. OpenID can be used for access control, allowing users to log on to different services with the same digital identity. The SDArch user profiles are specified by the SDArch user model. Like the machine-processable representation of the semantic documents, the SDArch user profiles are represented in the RDF data model and stored in the SDArch RDF repository. Being uniquely identified and described by machine-processable descriptions, the SDArch users are represented in accordance with the envisioned user representation of SSD and the Semantic Web.

¹ http://openid.net/foundation/

The user profile management service provides the functionalities necessary for managing the information kept in the user profile. Moreover, it serves the user information to the other SDArch services that need it (e.g., the semantic document search service in the search personalization process). Finally, the service provides access to some of the user information to the other users from the same SDArch social network.

In the remainder of this section, I first describe the SDArch user model, and then describe the functional model of the service, including the service’s interface and the service’s functional modules.

The SDArch User Modeling

The SDArch user model is influenced by the notion of semantic documents as completely open resources, composed of easily accessible and reusable data/information units (i.e., document units). It intends to replace today’s application-centered user model [102, 79] with a document-centered user model. In other words, the SDArch user should focus on authoring a document or performing an individual task, rather than on using any particular application. Moreover, instead of authoring documents completely from scratch, SDArch encourages document authors to reuse existing, well-defined document data and potentially modify them to serve their purposes. Reuse of appropriate existing document data not only saves authors’ time, but also has the potential to improve the quality of the authored documents. In particular, if the author does not possess adequate knowledge about the topic of a document to be authored, reusing data from documents created by experts on that topic can lead to a better-quality document. In contrast to conventional documents, where the reuse of document data is accompanied by the loss of the proprietary information, the reuse of data from semantic documents (in the form of semantic document units) preserves the proprietary information of the reused data.

Semantic document authoring is one of the main processes that the SDArch users are involved in. Moreover, the new document architecture prioritizes the reuse of existing, well-defined document content over the creation of new content while authoring new documents. Therefore, modeling user preferences regarding the document contents to be reused was one of the main aspects that I focused on while designing the SDArch user model. The user preferences are important for the personalization of the semantic document search. Based on the values of the user preferences, the document units retrieved by the semantic document search are re-ranked to better correspond to the user preferences. Similar to the semantic document model, the SDArch user model is specified by an ontology, namely the SDArch user-model ontology. Figure 4.4 gives a simplified, graphical representation of the ontology. The full specification of all of the ontology’s classes and properties is given in Appendix A.5.

Figure 4.4. Illustration of the user model ontology

The two main classes in the ontology are umo:User and umo:Preference. The umo:User class is derived from the foaf:Person class of the friend-of-a-friend (FOAF) ontology². The FOAF ontology contains classes and properties for describing people, the links between them and the things they create and do. As a subclass of the foaf:Person class, the umo:User class inherits properties that model the personal information of the SDArch user, such as name, title, age and e-mail address. Moreover, the class inherits the foaf:knows property of the foaf:Person class, which was of particular interest for this thesis. This property enables the SDArch users to be connected to other persons that they know. In addition to the properties inherited from the foaf:Person class, the umo:User class provides a set of properties that are introduced in order to model some specific characteristics of the SDArch user. As mentioned above, each SDArch user is uniquely identified by an OpenID, which is kept as a value of the umo:hasOpenID property. The property umo:isMemberOf holds the information about a social network group that the user is a member of. I give a detailed discussion of social network modeling in Section 4.2.4. Moreover, umo:interestedIn, umo:isExpertIn and umo:isComunityExpertIn are properties which hold, respectively, the topics that the user is interested in, the topics in which the user self-asserts expertise, and the topics that are determined to be part of the user’s expertise based on the amount of the user’s document content that has been reused by other members of the same SDArch social network. Finally, the umo:User class has the umo:userPreference property, which holds instances of the umo:Preference class that I describe next.

² http://xmlns.com/foaf/spec/
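To make the umo:User description more concrete, the following sketch builds a user profile as a plain list of RDF-style (subject, predicate, object) triples. It is an illustration only: the `UMO` namespace URI, the example user URIs and the helper names are placeholders, since the actual namespace is specified in Appendix A.5 and is not reproduced here. The FOAF and rdf:type URIs are the standard ones.

```python
FOAF = "http://xmlns.com/foaf/0.1/"  # standard FOAF namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
UMO = "http://example.org/umo#"  # placeholder; the real namespace is an assumption

Triple = tuple[str, str, str]


def user_profile_triples(
    user_uri: str,
    openid: str,
    name: str,
    knows: tuple[str, ...] = (),
    interested_in: tuple[str, ...] = (),
) -> list[Triple]:
    """Describe one SDArch user with inherited FOAF and user-model properties."""
    triples: list[Triple] = [
        (user_uri, RDF_TYPE, UMO + "User"),
        (user_uri, UMO + "hasOpenID", openid),  # unique identification
        (user_uri, FOAF + "name", name),        # inherited from foaf:Person
    ]
    # foaf:knows connects the user to other persons.
    triples += [(user_uri, FOAF + "knows", other) for other in knows]
    # umo:interestedIn lists topics the user is interested in.
    triples += [(user_uri, UMO + "interestedIn", topic) for topic in interested_in]
    return triples


def objects(triples: list[Triple], subject: str, predicate: str) -> list[str]:
    """Return all objects of the triples matching a subject and a predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]
```

A call such as `objects(profile, user_uri, FOAF + "knows")` then returns the persons a user is connected to, which is the kind of lookup a social network service would need.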

The umo:Preference class is a generic class introduced to specify the user’s preferences regarding the choice of document units to be reused. The information modeled by this class plays one of the key roles in the personalization of the semantic document search discussed in Section 5.2.3. The class has the following properties: umo:hasId, umo:hasLabel, umo:hasImportance, umo:hasNumValue and umo:hasEnumValue. Instances of the class, that is, user preferences, are uniquely identified by the preference ID and the preference label. The value of the umo:hasImportance property determines the importance of the preference for the user. Different preferences have different importance for different users. The preference importance is specified by the user. Depending on the nature of the preference, the preference value can be expressed by a numerical or an enumerated (i.e., list of entities) value. The umo:hasNumValue and umo:hasEnumValue properties hold the preference’s numerical and enumerated values, respectively. In contrast to the preference importance, the preference value (whether numerical or enumerated) is not specified by the user, but learned automatically over time by monitoring the user’s activities.
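The structure of a preference, and one way it could influence re-ranking, can be sketched as below. The field names mirror the umo:Preference properties, but everything else is an assumption made for illustration: in particular, the boost formula in `rerank_by_authors` is hypothetical and is not the personalization method of this thesis, which is described in Chapter 5.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Preference:
    pref_id: str                                   # umo:hasId
    label: str                                     # umo:hasLabel
    importance: float                              # umo:hasImportance, set by the user
    num_value: Optional[float] = None              # umo:hasNumValue, learned over time
    enum_value: Optional[tuple[str, ...]] = None   # umo:hasEnumValue, learned over time

    def __post_init__(self) -> None:
        # The value is either numerical or enumerated, never both and never neither.
        if (self.num_value is None) == (self.enum_value is None):
            raise ValueError("exactly one of num_value or enum_value must be set")


def rerank_by_authors(
    results: list[tuple[str, str, float]],
    authors_pref: Preference,
) -> list[str]:
    """Re-rank (unit_id, author, base_score) results using an ordered author list.

    Hypothetical scoring: an author earlier in the list receives a larger boost,
    scaled by the importance the user assigned to the preference.
    """
    ranked_authors = authors_pref.enum_value or ()

    def boost(author: str) -> float:
        if author not in ranked_authors:
            return 0.0
        position = ranked_authors.index(author)
        weight = (len(ranked_authors) - position) / len(ranked_authors)
        return authors_pref.importance * weight

    ordered = sorted(results, key=lambda r: r[2] + boost(r[1]), reverse=True)
    return [unit_id for unit_id, _, _ in ordered]
```

Keeping the importance (user-specified) separate from the value (learned) in the data structure reflects the distinction drawn in the paragraph above.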

The preferences that I introduced in the user model are correlated with the set of information that can be extracted from the social-context annotations (Section 3.2.2). I now briefly describe each of them.

1. Preferred Authors (Pref_1): this preference specifies an ordered list of SDArch