4.5 Collaborative Ensemble Learning with Support Vector Machines
4.6.3 Experiments with the Online Survey Data
Although collaborative ensemble learning performs very well in simulation, simulation cannot replace real-world cases. In this section, we examine the performance of the three approaches on the preference data of 190 users over 642 painting images, gathered from the online survey. Again, we use top-N accuracy to evaluate the performance. Since we cannot require a user to rate all 642 painting images in the survey, for each user we only partially know the “ground truth” of preferences. As a result, the true precision cannot be computed. We thus adopt an accuracy measure defined as the fraction of known liked images among the top-N ranked images. This quantity is smaller than the true accuracy because unknown liked images are missing from the measurement. However, in our survey the images are presented to users completely at random, so the distribution of rated/unrated images in both the unranked and the ranked lists is also random. This randomness changes only the absolute values of the compared methods, not their relative values. Thus in our following experiment it still makes sense to use the adopted accuracy measure to compare the three retrieval methods.
Figure 4.5: Accuracy with various numbers of returned images. (a) For each active user, we assume that 5 examples are given; (b) for each active user, we assume that 20 examples are given.
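The adopted measure can be written down in a few lines. The following is a minimal sketch, assuming that a ranked result list and the set of images the user is known to like are available as plain Python collections; the function name `top_n_accuracy` and the toy identifiers are hypothetical and only serve as illustration.

```python
def top_n_accuracy(ranked_items, known_liked, n):
    """Fraction of the top-n ranked items that the user is KNOWN to like.

    Unrated items count as misses, so this value is a lower bound on the
    true top-n accuracy (liked-but-unrated items cannot be credited).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    liked = set(known_liked)
    hits = sum(1 for item in ranked_items[:n] if item in liked)
    return hits / n


# Toy example with hypothetical image ids: 2 of the top 4 are known liked.
ranking = ["a", "b", "c", "d", "e"]
print(top_n_accuracy(ranking, {"a", "c", "z"}, 4))  # 0.5
```

Because the denominator is always N, two methods evaluated on the same users are penalized in the same way by unrated images, which is why the measure remains usable for comparing methods.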
Our experiment again follows the leave-one-out scheme, in which we pick each user in turn as the active user and treat all other users as the collected advisory users. We fix the number of given examples for each active user to 5 and 20 respectively, and examine the retrieval accuracy when returning various numbers N of top-ranked images. We follow the same methodology as in Fig. 4.3 and show the results in Fig. 4.5-(a) (given 5 examples) and Fig. 4.5-(b) (given 20 examples). We find that collaborative ensemble learning achieves the best accuracy in both cases. Since user ratings in this data set are much denser than in the simulation, collaborative filtering outperforms the SVM content-based method. Interestingly, the accuracy improvement of collaborative ensemble learning over the other two approaches is more pronounced in the given-5 case. This is a very desirable property for art image retrieval, because users are normally not patient at the initial information-gathering stage and satisfactory accuracy should be reached with only a few examples. Theoretically, this property can be explained from the Bayesian perspective, where we use “an informative prior” learned from all the users to constrain the Bayesian inference. Such prior knowledge gained from the population promises good accuracy even when only limited examples are fed to the learning system.
Next, we take a closer look at a case study. As shown in Fig. 4.6, we let a user input one positive and one negative example to run the collaborative ensemble learning algorithm. The returned top 20 results look quite diverse and, at the same time, very different from the positive example. Surprisingly, the user loves 18 out of the 20 images and there is no strongly disliked image. As a comparison, we present the results of the SVM content-based approach trained on the same examples in Fig. 4.7. We find that 8 results are actually from the same artist as the positive example. The user told us that he strongly dislikes the images (1,4), (3,2), (3,5), (4,1), (4,3), (4,4) and (4,5).⁷ This case study is quite interesting: it demonstrates that, when a user gives examples that only partially convey his preferences, collaborative ensemble learning effectively infers the user’s comprehensive interests, while the SVM content-based approach only returns images that are similar to the positive example(s). In the art image retrieval application, presenting interesting but novel images to active users is a very desirable property, because a user can easily find images from the same artist (by category-based search) but has difficulty locating potentially interesting images that are currently unknown to the user.
Figure 4.6: Case study: Two images on the top are examples given by a user. The lower 20 images are the top-20 results returned by collaborative ensemble learning.
Figure 4.7: Top-20 results returned by SVM content-based retrieval. Examples are the same as the ones shown in Fig. 4.6.
4.7 Conclusions
This chapter describes a theoretical framework: nonparametric hierarchical Bayesian approaches to hybrid information filtering. Traditionally, most information retrieval and filtering systems apply non-hierarchical content-based models. These methods ignore the connections between different users’ information needs, so a session of information service cannot inherit knowledge from other sessions. In our work, each user is modelled by a parametric content-based profile model, whose parameters θ are generated from a common prior distribution p(θ) shared by all users. Users are thus connected to each other statistically via the common prior.
To learn the common prior from data, we assume the common prior is a sample generated from a hyper prior, i.e. “a prior distribution of the common prior distribution”.
Since the high complexity of the common prior can hardly be captured by any parametric distribution, we describe a nonparametric form for the common prior, an infinite multinomial distribution, which is a sample generated from a Dirichlet process, i.e. the hyper prior.
We derive effective EM algorithms to learn the common prior from data annotated by users. In particular, various approximations are developed to handle analytically intractable computations. The resulting predictive approaches are surprisingly simple and intuitively understandable.
• If a very strong hyper prior is assigned, then the learned common prior distribution can hardly be influenced by our empirical observations and remains the same as the base distribution. Therefore different users’ information needs cannot be connected to each other via the common prior distribution. In this case the hierarchical modelling degenerates to conventional non-hierarchical modelling, which is in fact pure content-based filtering, assuming users are independent.
• If a very weak hyper prior is assigned, then the impact of the base distribution vanishes and the learned common prior is completely adapted to the empirical data. As a direct result, predictions for an active user are made by a committee of other users’ profile models (ML estimates). Users who are more like-minded to the active user have more influence in the committee. Here a principled hybrid filtering algorithm is derived, since many content-based models are combined in a collaborative way. Interestingly, this method also leads to the pure collaborative filtering algorithm described in Ch. 2.
• If a hyper prior of moderate strength is assigned, the learned common prior is a trade-off between the base distribution and the empirical distribution. When existing profile models cannot explain the active user’s data well, the model automatically gives a high chance to other model settings. This is a very general framework for hybrid information filtering, which explains a large family of existing hybrid filtering algorithms and suggests further improvements.
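The three regimes above can be illustrated with a small sketch. It is not the EM algorithm or the approximations derived in this chapter. It assumes binary like/dislike labels, summarizes each advisory user's profile model by its predicted like-probabilities, and replaces the learned common prior by a mixture of the base distribution (weight proportional to a strength parameter `alpha`) and point masses at the other users' profile models. All function and parameter names are hypothetical.

```python
import math


def bernoulli_log_likelihood(probs, labels):
    """Log-likelihood of binary labels (1 = like, 0 = dislike) under
    predicted like-probabilities."""
    eps = 1e-12
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def committee_predict(advisor_probs_on_examples, advisor_probs_on_item, labels,
                      alpha=0.0, base_log_marginal=None, base_prob_on_item=None):
    """Like-probability of a new item for the active user.

    advisor_probs_on_examples[i]: advisor i's like-probabilities on the
        active user's rated examples.
    advisor_probs_on_item[i]: advisor i's like-probability on the new item.
    alpha: assumed strength of the base distribution; 0 gives a pure
        committee, a very large value gives the base model alone.
    base_log_marginal, base_prob_on_item: log marginal likelihood of the
        examples and the prediction of a content-based model under the
        base distribution (needed only when alpha > 0).
    """
    log_w = [bernoulli_log_likelihood(p, labels)
             for p in advisor_probs_on_examples]
    preds = list(advisor_probs_on_item)
    if alpha > 0.0:
        if base_log_marginal is None or base_prob_on_item is None:
            raise ValueError("base model terms are required when alpha > 0")
        log_w.append(math.log(alpha) + base_log_marginal)
        preds.append(base_prob_on_item)
    top = max(log_w)  # subtract the maximum for numerical stability
    weights = [math.exp(lw - top) for lw in log_w]
    return sum(w * p for w, p in zip(weights, preds)) / sum(weights)


# Advisor A agrees with the active user's two ratings, advisor B does not,
# so the committee prediction stays close to A's opinion on the new item.
pred = committee_predict([[0.9, 0.1], [0.1, 0.9]], [0.8, 0.2], labels=[1, 0])
print(round(pred, 3))
```

With `alpha = 0` the prediction is a likelihood-weighted vote of like-minded users, with a very large `alpha` it collapses to the base content-based model, and intermediate values interpolate between the two.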
Finally, we design the collaborative ensemble learning algorithm with SVMs, which is a realization of the hybrid approach combining the basic ideas of content-based filtering and collaborative filtering. The performance of collaborative ensemble learning has been extensively tested. Compared to pure content-based and pure collaborative filtering, collaborative ensemble learning achieved excellent performance on various data sets.
Our work not only presents a hybrid information filtering solution, but also unifies pure content-based filtering, pure collaborative filtering, and hybrid filtering in a single theoretical framework. Most existing information filtering algorithms can also be explained in the framework, and principled improvements are suggested. To the best of our knowledge, no similar work has appeared in the literature.
Moreover, the nonparametric hierarchical model provides a general methodology for modelling a population of related objects, like customers in marketing analysis, hospitals in clinical analysis, patients in health care, automachines for banks. We believe the work is a strong contribution to a wide range of data modelling tasks.
Chapter 5
Conclusions and Future Work
This thesis has focused on an important technology, information filtering, which aims to understand people’s information needs and find desired information items. The work extensively studies the major branches of information filtering approaches, including collaborative filtering, content-based filtering, and hybrid filtering. Our emphasis is not only on exploring novel and effective algorithms for solving various real-world problems, but also on building a theoretical framework that provides a unifying view of information filtering. Besides showing the technical soundness of our solutions, the unifying view gives us a deeper understanding of the relations between people in a community and paves the way for further developments. In particular, probability theory and Bayesian theory have been used intensively throughout the thesis, as the natural language to build flexible models, encode the uncertainty of user profiles, model their intrinsic dependence, and integrate our prior knowledge into learning processes. In the following we summarize the major aspects of this thesis and point out some future directions.
5.1 Probabilistic Memory-Based Collaborative Filtering
Collaborative filtering has recently been widely applied in recommender systems. It maintains a database of user ratings (or annotations) and exploits the similarity between users’ opinions to make predictions. Due to its simplicity, the memory-based approach is the most popular collaborative filtering technique. Various heuristics for memory-based methods have been proposed in the literature. In contrast, our work focuses on a probabilistic version of memory-based collaborative filtering (PMCF). The model describes connections between users’ interests probabilistically. More importantly, various principled extensions are then readily supported.
One such extension is active learning. To make predictions for an active user, a recommender system must first know something about this user’s interests. The easiest way is of course to present some information items to the user for feedback. Conventional approaches do this passively. Instead, our work makes one of the earliest attempts to actively present unrated items to users. The choice of unrated items is made to maximize an expected gain according to our current knowledge about the user.
The other extension aims to reduce the computational cost of making predictions by working with a small subset of the stored users. By choosing typical preference prototypes, predictions can be made accurately and efficiently. The selection procedure is derived directly from the PMCF framework and aims to minimize the loss of data density (measured by K-L divergence). The derived data selection algorithm focuses on data that are novel with respect to our current knowledge but are in fact typical in the real world, which is intuitively understandable.
With the development of the PMCF framework, we demonstrate a learning system that actively queries the objects we want to know about (e.g. asks users questions) and samples the data we have gathered (e.g. chooses a subset of the data). In general, one wishes learning systems to know what to learn, where to learn and how to learn. We make one step in this direction, but there is still a long way to go.
In memory-based collaborative filtering, predictions for an active user are made by taking data from like-minded users, who themselves are often not completely known to us (as we typically observe only limited data from each user). Obviously, a better understanding of other users will improve our predictions for the current active user, while a better understanding of the current user will in turn benefit the others. In future work we should consider this propagation (or mutual enhancement) effect in collaborative filtering.