Recently, user-generated content has enjoyed enormous growth as a new mode of web content production. We have seen a shift among web content publishers from creating online content themselves to providing the facilities and playgrounds for end users to publish their self-produced content, such as bookmarks (del.icio.us), photographs (flickr.com), research papers (CiteULike.org) and video clips (YouTube.com).
Collaborative tagging systems have emerged to facilitate tagging (annotating), sharing and exploring content over the established social networks. Fig. 4.1 illustrates the workflow of a common collaborative tagging system. In general, the process has two phases: the indexing phase and the exploratory search phase. In the indexing phase, users tag content that they are interested in. The tagged content may be contributed by the users themselves, for instance the photos in Flickr and videos in YouTube, or may come from other sources, for instance the web URLs in del.icio.us and the scientific papers in CiteULike. Aggregating the contributions of millions of users creates a large amount of user-generated content and associated tags. In the exploratory search phase, these systems allow users to search and explore relevant content using these associated tags.
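The two phases can be pictured with a toy data structure. The sketch below is only an illustration, not part of the proposed models: the class `TagStore` and its method names are hypothetical, and a real system would persist the (user, item, tag) assignments rather than keep them in memory.

```python
from collections import defaultdict


class TagStore:
    """Toy in-memory store of (user, item, tag) assignments (hypothetical)."""

    def __init__(self):
        self.by_tag = defaultdict(set)    # tag -> items carrying it
        self.by_user = defaultdict(list)  # user -> [(item, tag), ...]

    def index(self, user, item, tag):
        # Indexing phase: a user attaches a tag to an item.
        self.by_tag[tag].add(item)
        self.by_user[user].append((item, tag))

    def search(self, tag):
        # Exploratory search phase: look up items through a tag.
        return sorted(self.by_tag.get(tag, set()))


store = TagStore()
store.index("alice", "photo-1", "sunset")
store.index("bob", "photo-2", "sunset")
store.index("bob", "paper-7", "retrieval")
print(store.search("sunset"))  # ['photo-1', 'photo-2']
```

The per-user list `by_user` plays the role of the user profile that the personalization tasks draw on.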
The amount of user-generated content is increasing far more quickly than our capability to digest it. Unlike systems built on professionally produced content and metadata, collaborative tagging systems face the challenge that end users assign tags in an uncontrolled manner, resulting in unsystematic and inconsistent metadata. This calls for extensive support for suggesting tags in the indexing phase, and for steering users towards their personal interests in the exploratory search phase. We identify three opportunities for personalization in collaborative tagging systems:
In the indexing phase:
1. Collaborative Indexing: personalizing the tagging process when a user assigns tags to label (index) certain content. Tags act as an indication of “aboutness” towards items. However, most users are not trained to describe content precisely with tags, and are insufficiently aware of the tags in use by others. For instance, users might tag the same content using “computer game”, “computer-game” or “computer games”. Ideally, the system should suggest tags from the common vocabulary that fit the user’s intention or taste while remaining consistent with other users (shown in Fig. 4.1). As a result, users discover suitable tagging keywords more easily and, more importantly, inconsistent tagging behavior is reduced. This way, every user benefits from and builds upon the information contributed by others; it has even been claimed that this support for suggesting tags when a user is asked to label a certain item would lead to a true “folksonomy” (e.g. coherent categorization schemes) [32, 67]. We shall see shortly how our tag suggestion model (Section 126.96.36.199) reinforces tags that have been frequently used by the target user as well as by other users.
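To make the idea of reinforcing both personal and community usage concrete, here is a minimal sketch. It is not the tag suggestion model of this chapter: the linear blend, the mixing weight `lam`, and the helper names are all assumptions made for illustration. The normalization step only merges case and hyphen variants; plural forms such as “computer games” would need stemming.

```python
from collections import Counter


def normalize(tag):
    # Collapse case and spacing/hyphen variants: "Computer-Game" -> "computer game".
    return " ".join(tag.lower().replace("-", " ").split())


def suggest_tags(user_tags, all_tags, k=3, lam=0.5):
    """Rank tags by a blend of the user's own usage and global usage.

    lam is a hypothetical mixing weight, not a parameter from the paper.
    """
    user = Counter(normalize(t) for t in user_tags)
    glob = Counter(normalize(t) for t in all_tags)
    n_user = sum(user.values()) or 1  # avoid division by zero for new users
    n_glob = sum(glob.values()) or 1
    scores = {
        t: lam * user[t] / n_user + (1 - lam) * glob[t] / n_glob
        for t in glob.keys() | user.keys()
    }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


user_tags = ["computer-game", "rpg"]
all_tags = ["Computer Game", "computer game", "fun", "rpg", "computer-game"]
print(suggest_tags(user_tags, all_tags, k=2))  # ['computer game', 'rpg']
```

With `lam=0` the ranking falls back to pure popularity; with `lam=1` it uses only the user's own history.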
In the exploratory search phase:
2. Collaborative Browsing: personalizing tag exploration when a user starts to browse for relevant content. Navigation through tags provides an effective way to explore and discover relevant content. To initiate the navigation, current collaborative tagging systems make use of “tag clouds”, a visual representation of the set of most popular tags [28, 70]. Popularity-based exploration is limited, however, as it does not necessarily fit an individual user’s need: different users may have very different preferences. Personalizing tag exploration could reduce the search cost and improve the retrieval performance. The formulation of the proposed collaborative browsing model is given in Section 188.8.131.52.
3. Collaborative Item Search: personalizing item search when a user chooses a tag (as a query). After identifying interesting tags, users click or issue these tags to further explore the related content (items). To rank items, most existing collaborative tagging systems rely merely on an item’s association with the query tag, usually employing a combination of the item’s popularity and “freshness” as ranking criteria. However, it is well known in text retrieval that, due to its ambiguity, a term (tag) alone is not semantically and contextually expressive enough to represent the needs of a particular user. For example, the term “apple” can refer to a type of fruit, a computer brand or even a city. In this regard, an optimal relevance ranking should utilize as much extra information as possible to clarify the user’s needs. It is therefore worthwhile to investigate the use of user preferences to support personalized item search. The formulation of the personalized item search model can be found in Section 184.108.40.206.
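The “apple” ambiguity can be illustrated with a small sketch that contrasts a popularity-only ranking with one that also looks at the user's profile. This is a toy heuristic under stated assumptions, not the personalized item search model of this chapter: the function `rank_items`, the tag-overlap boost, and the sample data are all hypothetical.

```python
def rank_items(items, query_tag, profile=None):
    """Rank items carrying query_tag.

    items: dict mapping item id -> (set of tags, popularity count).
    profile: optional set of tags the user has used before (a hypothetical
    stand-in for a user preference model).
    """
    def score(item_id):
        tags, popularity = items[item_id]
        if not profile:
            return popularity  # non-personalized baseline
        # Boost items whose other tags overlap with the user's profile.
        overlap = len((tags - {query_tag}) & profile)
        return popularity * (1 + overlap)

    matches = [i for i, (tags, _) in items.items() if query_tag in tags]
    return sorted(matches, key=lambda i: (-score(i), i))


items = {
    "pie-recipe": ({"apple", "fruit", "baking"}, 90),
    "macbook-review": ({"apple", "laptop", "mac"}, 60),
    "nyc-guide": ({"apple", "travel", "nyc"}, 40),
}
print(rank_items(items, "apple"))
# ['pie-recipe', 'macbook-review', 'nyc-guide']
print(rank_items(items, "apple", profile={"mac", "laptop"}))
# ['macbook-review', 'pie-recipe', 'nyc-guide']
```

The same query tag yields a different top result once the profile disambiguates which sense of “apple” the user is likely to mean.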
Considering two generative processes in the tagging data, this paper proposes three types of task-focussed ranking models to solve the three aforementioned personalization problems in a unified framework. We show how the underlying personalized ranking score for a given candidate (an item or a tag, depending on the task) consists of the popularity of the candidate and its likelihood with respect to the user’s preferences. For probability estimation, we consider different types of generative processes in the tagging data, into which the smoothing methods are naturally integrated. We then choose an optimal candidate model for each of the three problems introduced above, and estimate the probability of the user preferences being generated from that candidate model. Our experiments on two real data sets demonstrate the effectiveness of the methods, showing that all three personalized models perform significantly better than the non-personalized ones, while the collaborative browsing model outperforms the ranking-based collaborative filtering approaches when more user preference data is available.
[Figure: diagram with two panels, Indexing and Exploratory Searching. Labels: User Profiles; Items; Tags; New Item; A Chosen Tag; A Chosen Item; Indexing/Tagging; Exploring Tag Clouds; Searching Item; Suggested Tagging Keywords Given an Item; Suggested Relevant Tags; Suggested Relevant Items Given a Tag.]
Figure 4.1: Personalized Collaborative Tagging.
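One way to read the decomposition of the ranking score into popularity and preference likelihood is through Bayes' rule. The notation below is an assumption introduced here for illustration (c for a candidate item or tag, u for the target user's preferences); the chapter's own formulation may differ in its symbols and estimation details.

```latex
\begin{align*}
P(c \mid u) &= \frac{P(u \mid c)\, P(c)}{P(u)}
             \;\propto\; \underbrace{P(c)}_{\text{popularity}}
             \cdot \underbrace{P(u \mid c)}_{\text{preference likelihood}}, \\
\log P(c \mid u) &\stackrel{\text{rank}}{=} \log P(c) + \log P(u \mid c).
\end{align*}
```

Since P(u) is the same for every candidate, it can be dropped without changing the ranking for a given user.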
The remainder of the paper is organized as follows. We first summarize related work. We then introduce the generative processes of the tagging data and the derived ranking models for the three suggestion problems of indexing, browsing and search. We provide an empirical evaluation of the performance of our models for the three ranking tasks and the impact of parameters, and finally conclude our work.
Collaborative tagging systems have recently emerged as tools that help to find structure in online databases and user-generated content. As an example, Golder and Huberman conducted an investigation of del.icio.us, a web bookmarking system. Their results have been confirmed by measurements on the online photo album Flickr. These works have also investigated the incentives for users to collaborate in a social tagging system: although users mostly tag items for their personal use, these tags can still be a great contribution to social exploratory search. Halpin et al. studied the dynamics of collaborative tagging systems, showing that tagging distributions tend to stabilize into power law distributions. We think that providing tag suggestions that carefully examine both the user’s personal behavior and other users’ behavior could accelerate the stabilization process, leading to a coherent categorization