CHAPTER 3 : CASE-BASED REASONING
3.9 CASE-BASED USER PROFILING
3.9.4 Perspective on CBR in the Music Domain
CBR’s major advantage over rule-based or model-based systems is that the information contained in the case knowledge is not generalised to form rules, but is instead invoked lazily at query time. However, if developing the case and similarity knowledge is itself an intensive process, this advantage is reduced. In the music domain, our problem is fundamental: a lack of inexpensive vocabulary knowledge. Without this, case knowledge and similarity knowledge cannot be developed. The usual way of creating such a vocabulary is for domain experts to establish the fundamental concepts, objects, relations, etc., that exist in a given domain, and this is an expensive process.
However, it is implicitly acknowledged in the literature that CBR systems vary in strength. A strong CBR system makes use of a clear domain model to develop a case representation that may have surface features, which are used for indexing, and deeper, structural features, which are used for similarity matching. This type of CBR is close to the cognitive model of CBR (Leake 1996). A weak CBR system, on the other hand, may calculate similarity between cases without reference to the semantics of the case representation. In this situation there are no structural features, or they are not easily available, and similarity is calculated from the surface features available to the developer, using a ‘one size fits all’ metric such as the Euclidean or the City Block measure. Most applied CBR systems lie somewhere between the two, depending on the difficulty of modelling the knowledge containers. In Chapter 5 we will describe our implementation of a weak CBR system to augment our ACF recommendation process.
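As a minimal sketch of the weak end of this spectrum, assume that a case is nothing more than a fixed-length numeric vector of surface features paired with a solution label. Retrieval then reduces to ranking the case base by a generic distance. The case base, feature values and function names below are hypothetical illustrations and are not taken from any of the systems discussed in this chapter.

```python
from math import sqrt
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Case = Tuple[Vector, str]  # (surface features, solution label); hypothetical layout


def _check_lengths(a: Vector, b: Vector) -> None:
    # A generic metric only makes sense over vectors of the same dimensionality.
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean (L2) distance between two surface-feature vectors."""
    _check_lengths(a, b)
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def city_block(a: Vector, b: Vector) -> float:
    """City Block (L1, Manhattan) distance between two surface-feature vectors."""
    _check_lengths(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))


def retrieve(
    query: Vector,
    case_base: List[Case],
    k: int = 1,
    distance: Callable[[Vector, Vector], float] = euclidean,
) -> List[Case]:
    """Return the k cases nearest to the query under the chosen metric.

    No domain semantics are consulted: every feature is treated alike,
    which is what makes this a 'weak' form of similarity assessment.
    """
    ranked = sorted(case_base, key=lambda case: distance(query, case[0]))
    return ranked[:k]


# Hypothetical case base with two surface features per case.
case_base: List[Case] = [
    ((0.0, 0.0), "case A"),
    ((3.0, 4.0), "case B"),
    ((6.0, 8.0), "case C"),
]

nearest = retrieve((2.5, 3.5), case_base, k=1, distance=city_block)
```

Because the metric is passed in as a parameter, swapping the Euclidean measure for the City Block measure changes nothing else in the retrieval step. A stronger CBR system would replace the generic distance with one informed by the domain model.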
3.10 Conclusion
In this chapter we reviewed case-based reasoning, an applied AI methodology which is suitable for problem solving in domains where it is difficult to elicit first-principles rules. Unlike rule-based or model-based systems, CBR systems reason by drawing upon the knowledge stored in previously solved cases. While CBR generally has lower knowledge engineering overheads than systems that use first-principles reasoning, it still relies upon the availability of four sources of knowledge: vocabulary knowledge, case knowledge, similarity knowledge and adaptation knowledge. CBR’s major advantage over rule-based systems is that the case knowledge is deployed at query time, and does not have to be generalised a priori. However, if the other sources of knowledge are not available, CBR systems are forced to confront the knowledge acquisition bottleneck. For instance, the PTV system, like the Smart Radio system, has problems acquiring vocabulary knowledge: the basic attributes and relations within a domain that allow useful cases and similarity metrics to be developed. The CASPER system faces a knowledge bottleneck in developing the similarity knowledge required in its concept trees.
We described how the Reuse phase of the CBR cycle, which uses adaptation knowledge, also poses serious problems for general CBR development, since it forces developers to acquire some measure of first-principles reasoning, which CBR systems are supposed to circumvent.
With CBR increasingly being deployed in domains with large repositories of data, we discussed the scalability issues associated with lazy reasoning. While k-d trees are efficient in terms of retrieval, they are problematic in domains in which only a partial problem description may be available. An example of this is interactive CBR, in which a user may only have a partial query and may need help in homing in on the solution or configuration he/she requires. In this situation, the reasoning system engages the user in a dialogue in order to guide him/her towards a solution. We reviewed Case Retrieval Nets (CRNs), an efficient memory structure suited to retrieval in situations where queries are incomplete. We revisit CRNs later in this thesis, where we apply them as a memory model for collaborative filtering techniques.
CBR is increasingly being used on the Internet as a reasoning agent for product selection, configuration or personalisation. For example, CBR user profiles are used to represent items that users have liked or disliked in the past in order to predict what they may like or dislike in the future. The PTV system builds a generalised case-profile from features of programmes that users have rated, so that the profile can be matched against new programmes appearing in the TV schedule. As mentioned earlier, one of the problems of the case-based approach in this domain is that good content description is hard to obtain. The CASPER system does not generalise the job description cases used to represent a user’s job interests. Instead, these instances are used to classify retrieved job descriptions as relevant or irrelevant using the k-NN algorithm. The Dietorecs system does not use user profiling as such, but employs a case base containing records of other users’ travel plan configurations in order to advise the user interactively of possible itinerary scenarios.
While CBR is a good example of a ‘content-based’ personalisation strategy in which descriptions with clear semantics are used, we suggest that there are varying strengths of CBR. How case and similarity knowledge is implemented in a CBR system depends on the domain knowledge that is available. The next chapter introduces Automated Collaborative Filtering, a technique which is often employed when domain knowledge is not readily available.