
NOTATIONS AND BACKGROUND

Before delving into algorithmic details, this section provides the theoretical background and definitions that are relevant to this research and used in subsequent chapters. An overview of concept taxonomy, the basic notation and the concept hierarchy model are outlined in the following subsections.

3.2.1 An Overview of Concept Taxonomy

Item taxonomic information is based on a set of categories or topics that can be used to classify and describe items in a hierarchical structure, from coarse-grained to fine-grained classes. Item taxonomies often appear in product descriptions, which are provided by domain experts [46, 67] and designed to help users find their preferred items or products easily and quickly. One of the main advantages of a taxonomy is that the category correlations within it capture the hierarchical relationships between categories. Other advantages include the availability of implicit feedback data, standardised vocabularies and a lack of ambiguity [67]. In addition, a taxonomy's hierarchical structure can reflect users' topics of interest, from general topics at the root node to specific topics at the leaf nodes [52]. This allows a user's interest in items to be linked with the taxonomic information of those items. In short, a user's preferences can be learned from the taxonomic information of the items he or she has interacted with.

Figure 3.1 illustrates an example of a concept taxonomy in which two users have rated a set of items. Each item can be described or classified by multiple descriptors, each consisting of a set of categories (or concepts) that form a path in the concept taxonomy (see the right side of Figure 3.1). In general, product categories can be naturally organised into hierarchies, where the root category of a hierarchy (e.g., a tree) is the most general and the categories become more specific towards the leaves.

Figure 3.1: Taxonomic information for item representation concepts (or categories)
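To make the link between users and item taxonomies concrete, the Python sketch below represents each item by its descriptor paths and counts, for each user, how often each concept appears in the descriptors of the items that user rated. The users, items, ratings and category names are hypothetical placeholders loosely modelled on Figure 3.1. Simple concept counting is only one assumed way of reading interest off the taxonomy and is not the profiling method proposed in this thesis.

```python
from collections import Counter

# Hypothetical item descriptors: each descriptor is a root-to-leaf path of concepts.
ITEM_DESCRIPTORS = {
    "b1": [["Books", "Business", "Marketing"]],
    "b2": [["Books", "Computer Technology", "Programming"],
           ["Books", "Computer Technology", "Web Application"]],
    "b3": [["Books", "Business", "Management"]],
}

# Hypothetical explicit ratings: user -> {item: rating}.
RATINGS = {
    "u1": {"b1": 8, "b3": 6},
    "u2": {"b2": 9, "b3": 4},
}


def concept_profile(user: str) -> Counter:
    """Count concept occurrences in the descriptors of the items a user rated.

    Rating values are ignored in this sketch; only which items were rated
    matters. Shared ancestors are counted once per descriptor.
    """
    profile = Counter()
    for item in RATINGS[user]:
        for path in ITEM_DESCRIPTORS[item]:
            profile.update(path)
    return profile


if __name__ == "__main__":
    for user in RATINGS:
        print(user, concept_profile(user).most_common(3))
```

General concepts near the root (such as Books) accumulate the highest counts for every user, while more specific concepts (such as Marketing or Programming) are what distinguish the two users, mirroring the general-to-specific reading of the hierarchy.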

In a tree structure, for example, one branch may have computer technology as the concept, with the child categories programming, database and web application as its sub-concepts, while another branch may have business as the concept, with the sub-concepts marketing and management. To present the proposed concept hierarchy approach clearly, we assume that each item represents a product and each concept represents a category. The nodes in the concept taxonomy therefore represent concepts (or product categories), starting from the most general concept at the root of the hierarchy and becoming more specific towards the leaves. Consequently, a user's preference for one item can be linked, through the concept taxonomy, to the concepts connected to that item and to other items that share those concepts.
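The two branches described above can be stored as a child-to-parent map, from which the super-concepts of any concept follow directly. In this minimal Python sketch, the root label "Books" is borrowed from the Figure 3.3 example as an assumption, and the helper names `ancestors`, `is_a` and `depth` are hypothetical. The is-a test treats the relation as transitive, so indirect super-concepts also count.

```python
from typing import Dict, List, Optional

# Child -> parent links for the two example branches; "Books" is an assumed root.
PARENT: Dict[str, Optional[str]] = {
    "Books": None,
    "Computer Technology": "Books",
    "Programming": "Computer Technology",
    "Database": "Computer Technology",
    "Web Application": "Computer Technology",
    "Business": "Books",
    "Marketing": "Business",
    "Management": "Business",
}


def ancestors(concept: str) -> List[str]:
    """Return the super-concepts of `concept`, nearest first, ending at the root."""
    result = []
    parent = PARENT[concept]
    while parent is not None:
        result.append(parent)
        parent = PARENT[parent]
    return result


def is_a(cx: str, cy: str) -> bool:
    """True if cy is a direct or indirect super-concept of cx (cy > cx)."""
    return cy in ancestors(cx)


def depth(concept: str) -> int:
    """Depth 0 is the root; larger depths denote more specific concepts."""
    return len(ancestors(concept))


if __name__ == "__main__":
    print(ancestors("Database"))                     # ['Computer Technology', 'Books']
    print(is_a("Marketing", "Business"))             # True
    print(is_a("Marketing", "Computer Technology"))  # False
```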

3.2.2 Basic Notation

To present the proposed concept hierarchy clearly, formal definitions of the main concepts and entities relating to item taxonomy are given below:

• Users: U = {u1, u2, . . . , um} is a set of users, where each ui ∈ U is a user who has either browsed or rated at least one item. The aim of the proposed recommender system is to generate recommendations for a target user ua, whom we call an active user or a new user.

• Items or products: B = {b1, b2, . . . , bn} is a set of items or products (e.g., books and music tracks) that have already been rated by users ui ∈ U. Each item bk ∈ B is represented by a set of taxonomic descriptors.

• Explicit ratings: Rik denotes the rating with which user ui expresses his or her opinion about item bk. Rik indicates the strength of ui's preference for bk, where higher values mean stronger preferences. Users express these preferences in numeric form: a value of 0 indicates dissatisfaction with the item, and a value of 1 or greater indicates satisfaction with it. In this thesis, explicit ratings Rik with values between 1 and 10 are used to conduct the experiments.

• Item preferences: A user's preferred items can be classified into two groups: explicit item preferences and implicit item preferences. Explicit item preferences are collected from users when they directly express how much they like an item on a numeric scale (i.e., explicit item ratings). Implicit item preferences are obtained automatically from each user's behaviour or navigation and are usually represented as binary values (0 or 1), called implicit ratings. Implicit item preferences do not give a clear indication of a user's tastes, opinions or emotional involvement with items in the system. However, it is assumed that if a user clicks on an item, he or she has some interest in it, even if he or she does not like it.

• Concept taxonomy: T is a pair (C, 'is-a'), where C = {c1, c2, . . . , cw} is a set of concepts (or categories), and the concept correlations within item taxonomies are organised in a tree (hierarchical) structure. The 'is-a' relationship represents the hierarchical relationship between concepts. For example, cx is-a cy (or cy > cx) means that cy is a super-concept of cx and cx is a sub-concept of cy. Typically, concepts express either broad (super or general) categories or narrow (sub or specific) categories. The root concept node is the most general concept, and concepts become more specific towards the leaf nodes of the concept taxonomy.

Figure 3.2: The concept correlations within the item taxonomy represent the hierarchical relationship between concepts

Figure 3.2 shows an example of concept correlations within an item taxonomy that represent the hierarchical relationship between concepts. Suppose that the book b2 in Figure 3.2 is associated with three item taxonomic descriptors: d1 = {c0, c1, c2, c4}, d2 = {c0, c1, c3, c5} and d3 = {c0, c1, c3, c6, c7}. The item b2 can then be described or classified by eight taxonomic concepts: c0, c1, c2, c3, c4, c5, c6 and c7. In the item taxonomy tree for b2, Books is the root category (or concept) and a super-concept of Computers&Technology, while Java, one of the leaf nodes, is among the most specific concepts in the tree.

• Item taxonomic descriptors: Dbk = {d1, d2, . . . , dv} is a set of item descriptors, where each descriptor is a sequence of concepts based on the concept taxonomy relation T. As shown in Figure 3.3, an item bk is usually described or classified using item taxonomic descriptors, and a single item can have multiple descriptors. For example, book b2 (Java Programming) in Figure 3.3 has the following three descriptors:

d1 = Books > Computers&Technology > Programming > Introductory

d2 = Books > Computers&Technology > Programming > Languages&Tools > Java

d3 = Books > Computers&Technology > Web&Design > Programming Languages

That is, Db2 = {d1, d2, d3}.
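Because each descriptor is written as a ">"-separated path, the descriptors of b2 listed above can be parsed mechanically. The Python sketch below, using the hypothetical helper names `parse_descriptor`, `item_concepts` and `super_concept_pairs`, recovers the distinct concepts that describe b2 and the super-concept/sub-concept pairs implied by the paths. For illustration it assumes that a concept is identified by its name alone; a real taxonomy would normally use node identifiers so that identically named categories in different branches remain distinct.

```python
from typing import List, Set, Tuple

# The three descriptors of book b2 (Java Programming), as listed in the text.
D_B2 = [
    "Books > Computers&Technology > Programming > Introductory",
    "Books > Computers&Technology > Programming > Languages&Tools > Java",
    "Books > Computers&Technology > Web&Design > Programming Languages",
]


def parse_descriptor(descriptor: str) -> List[str]:
    """Split a '>'-separated descriptor into its ordered concepts, root first."""
    return [concept.strip() for concept in descriptor.split(">")]


def item_concepts(descriptors: List[str]) -> Set[str]:
    """Union of all concepts appearing in an item's descriptors."""
    return {c for d in descriptors for c in parse_descriptor(d)}


def super_concept_pairs(descriptors: List[str]) -> Set[Tuple[str, str]]:
    """(cy, cx) pairs where cy precedes cx on some path, i.e. cx is-a cy."""
    pairs = set()
    for d in descriptors:
        path = parse_descriptor(d)
        for i, cy in enumerate(path):
            for cx in path[i + 1:]:
                pairs.add((cy, cx))
    return pairs


if __name__ == "__main__":
    print(len(item_concepts(D_B2)))                              # 8
    print(("Programming", "Java") in super_concept_pairs(D_B2))  # True
```

Shared prefixes such as Books > Computers&Technology appear in all three descriptors but contribute each concept only once to the set, which is why three descriptors with thirteen concept occurrences in total yield eight distinct concepts.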

Figure 3.3: The example list of items with their taxonomic descriptors
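The distinction between explicit and implicit item preferences drawn in the definitions above can be illustrated with a small Python sketch. All data are hypothetical: explicit ratings use the 1 to 10 scale of the experiments, and, as an assumption made only for this illustration, any item a user has rated or browsed receives an implicit rating of 1 and every other item receives 0.

```python
from typing import Dict, Set

ITEMS = ["b1", "b2", "b3", "b4"]

# Hypothetical explicit ratings Rik on the 1-10 scale: user -> {item: rating}.
EXPLICIT: Dict[str, Dict[str, int]] = {
    "u1": {"b1": 9, "b2": 3},
    "u2": {"b3": 7},
}

# Hypothetical browsing (click) logs, including items that were never rated.
BROWSED: Dict[str, Set[str]] = {
    "u1": {"b4"},
    "u2": {"b1", "b3"},
}


def implicit_ratings(user: str) -> Dict[str, int]:
    """Binary implicit ratings: 1 if the user rated or browsed the item, else 0."""
    seen = set(EXPLICIT.get(user, {})) | BROWSED.get(user, set())
    return {item: int(item in seen) for item in ITEMS}


if __name__ == "__main__":
    print(implicit_ratings("u1"))  # {'b1': 1, 'b2': 1, 'b3': 0, 'b4': 1}
    print(implicit_ratings("u2"))  # {'b1': 1, 'b2': 0, 'b3': 1, 'b4': 0}
```

Item b2 receives an implicit rating of 1 for u1 even though u1's low explicit rating of 3 suggests little liking for it, which reflects the point that implicit feedback signals interest rather than satisfaction.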