3.5 Case-based inference
3.5.3 Maximum likelihood estimation
The probabilistic model outlined above is specified by two parameters: the ideal solution y∗ and the (true) precision parameter β∗ ∈ R+. Depending on the context in which
these parameters are sought, the ideal solution might be unrestricted (i.e., any element of Y is an eligible candidate), or it might be restricted to a certain subset Y0 ⊆ Y of
candidates.
Now, to estimate the parameter vector θ∗ = (y∗, β∗) ∈ Y0 × R+ from a given set $D = \{y^{(i)} \succ z^{(i)}\}_{i=1}^{N}$ of observed preferences, we refer to the maximum likelihood (ML) estimation principle. Assuming independence of the preferences, the log-likelihood of θ = (y, β) is given by

$$\ell(\theta) = \ell(y, \beta) = -\sum_{i=1}^{N} \log\Bigl(1 + \exp\bigl(-\beta\,\bigl(\Delta(z^{(i)}, y) - \Delta(y^{(i)}, y)\bigr)\bigr)\Bigr). \tag{3.5}$$

The maximum likelihood estimate (MLE) $\theta^{ML} = (y^{ML}, \beta^{ML})$ of θ∗ is given by the maximizer of (3.5):

$$\theta^{ML} = \bigl(y^{ML}, \beta^{ML}\bigr) = \arg\max_{y \in Y_0,\; \beta \in \mathbb{R}_+} \ell(y, \beta).$$
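To make the estimation step concrete, the following is a minimal sketch of how (3.5) could be evaluated and maximized when Y0 is a small finite set. The function names, the grid of β values, the toy preferences, and the absolute-difference distance are all assumptions made for illustration; they are not part of the framework itself.

```python
import math

def log_likelihood(y, beta, preferences, delta):
    """Log-likelihood (3.5) of a candidate y with precision beta.

    preferences: list of (preferred, other) pairs, i.e. y_i preferred to z_i.
    delta: distance function on the solution space (chosen by the caller).
    """
    total = 0.0
    for y_i, z_i in preferences:
        margin = beta * (delta(z_i, y) - delta(y_i, y))
        # log(1 + exp(-margin)), written so that exp never overflows
        total -= max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total

def ml_estimate(candidates, betas, preferences, delta):
    """Exhaustive arg max of (3.5) over a finite Y0 and a finite grid of beta values."""
    pairs = [(y, b) for y in candidates for b in betas]
    return max(pairs, key=lambda p: log_likelihood(p[0], p[1], preferences, delta))

# Toy data (hypothetical): real-valued solutions, absolute difference as distance.
abs_dist = lambda a, b: abs(a - b)
prefs = [(2.0, 5.0), (3.0, 0.0), (2.5, 4.0)]  # each pair: first is preferred to second
y_ml, beta_ml = ml_estimate([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], [0.5, 1.0, 2.0], prefs, abs_dist)
print(y_ml, beta_ml)
```

In this toy data all three preferences are consistent with the selected candidate, so the likelihood grows with β and the largest grid value is chosen. An exhaustive search like this is only feasible for small candidate sets.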
It is important to note that our search procedure forms a neighborhood around an initial solution y. This neighborhood can take whichever form suits the structure of the solution space. If the solution space is continuous, we can form, for example, a circular neighborhood or a Gaussian-distributed neighborhood around the initial solution. If the solution space is discrete, the neighborhood consists of candidates that differ from the initial solution by one discrete step (for an item set, the item sets obtained by adding or removing one item; for a permutation, the permutations obtained by switching the order of elements; etc.). Using (3.5) and our set of observed preferences $D = \{y^{(i)} \succ z^{(i)}\}_{i=1}^{N}$, we compute the likelihoods of the neighborhood candidates in the subset Y0 ⊆ Y and thus find the candidate with the maximum likelihood, which serves as our estimate of y∗. This maximum likelihood solution becomes our new candidate solution; we form a suitable neighborhood around it and query the oracle again. At the end of the querying cycle, the last preferred solution is considered the best solution obtained for the problem at hand. This solution is then stored in the case base as our optimal solution for the problem. The preferences generated while querying the oracle during the problem-solving episode are also stored in the case base along with the problem, for later reuse when a new problem is solved. For simplicity, we fix β, so that only y∗ remains to be estimated.
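The search cycle described above can be sketched for a discrete solution space of item sets. This is an illustrative sketch under stated assumptions, not the implementation used in this work: the symmetric-difference distance, the fixed query budget, the tie-breaking by list order, and the oracle (which prefers the solution closer to a hidden target) are all hypothetical.

```python
import math

def set_distance(a, b):
    """Assumed distance on item sets: size of the symmetric difference."""
    return len(a ^ b)

def neighbors(y, items):
    """All item sets obtained from y by adding or removing exactly one item."""
    return [y ^ frozenset([item]) for item in items]

def log_likelihood(y, beta, prefs):
    """Log-likelihood (3.5) for fixed beta; prefs holds (preferred, other) pairs."""
    total = 0.0
    for win, lose in prefs:
        margin = beta * (set_distance(lose, y) - set_distance(win, y))
        total -= max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total

def search(start, items, oracle, beta=1.0, queries=10):
    """Preference-guided local search with a fixed number of oracle queries."""
    current, prefs = frozenset(start), []
    for _ in range(queries):
        # Most likely neighbor under the preferences collected so far
        candidate = max(neighbors(current, items),
                        key=lambda y: log_likelihood(y, beta, prefs))
        if oracle(candidate, current):        # oracle prefers the candidate
            prefs.append((candidate, current))
            current = candidate
        else:                                 # oracle keeps the current solution
            prefs.append((current, candidate))
    return current, prefs

# Hypothetical oracle: prefers whichever solution is closer to a hidden target.
target = frozenset("ace")
oracle = lambda a, b: set_distance(a, target) < set_distance(b, target)

best, prefs = search({"b"}, list("abcdef"), oracle)
print(sorted(best), len(prefs))
```

The returned preference list corresponds to what would be stored in the case base together with the final solution.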
3.6 Conclusion
In this chapter, we have presented a general framework for CBR in which experience is represented in the form of contextualized preferences, and these preferences are used to direct an adaptive problem solving process that is formalized as a search procedure. This kind of preference-based CBR is an interesting alternative to conventional CBR whenever solution quality is a matter of degree and feedback is only provided in an indirect or qualitative way. The effectiveness of our generic framework is illustrated in several concrete case studies, presented in Chapter 7.
The Pref-CBR framework will be generalized and extended in two directions in the next two chapters. First, as already mentioned, the similarity (distance) measure in the solution space has an important influence on the preference relations ⪰_x associated with problems x ∈ X and essentially determines the structure of these relations (cf. Section 3.3). Therefore, we show that a proper specification of this measure enhances the effectiveness of our preference-guided search procedure. Accordingly, it would be desirable to allow for a data-driven adaptation of this measure, that is, to enable
the CBR system to adapt this measure whenever it does not seem to be optimal. We propose a method for learning similarity measures in the solution space from qualitative feedback, which appears to be ideally suited for this purpose. For further optimizing the performance of our Pref-CBR framework, we also propose another method for learning similarity measures in the problem space by learning from examples. The aforementioned methods are described in detail in the next chapter.
As the number of preferences collected over the course of time may become rather large, we also develop effective methods for case base maintenance, which specifically suit the Pref-CBR framework. We propose some case base maintenance strategies which allow us to increase the efficiency of the case base while maintaining its performance. We propose two directions of case base maintenance strategies, inter-case maintenance and intra-case maintenance. The former maintenance methods handle whole cases, while the latter methods handle preferences (parts of cases). In Chapter 5, we will describe some case base maintenance strategies which specifically suit our framework and enhance its efficiency.
After describing in detail how our Pref-CBR framework operates, we will learn how the integrated components of learning similarity measures as well as the case base maintenance strategies are embedded in the framework. We will also show how they affect the performance and efficiency of the whole system. We will also look at other methods and see how they are similar to or different from our framework. In Chapter 6, we discuss some methodologies which are related to our Pref-CBR framework. These methodologies include different search methods and some machine learning approaches. We compare the different approaches with our Pref-CBR framework, discuss the similarities and the differences, and convey the position in which our approach is situated amongst the other approaches which are related to ours.
4 Learning Similarity Measures in Pref-CBR
In our Pref-CBR framework, case-based problem solving is formalized as a preference-guided search process in the space of candidate solutions, which is equipped with a similarity (or, equivalently, a distance) measure. A well-defined similarity measure is crucial for the optimal performance of a case-based reasoning system. In preference-based CBR, the preferences induced during the search procedure can be used to learn the similarity measures and thus lead to improved search performance.
Like many other CBR approaches, Pref-CBR proceeds from a formal framework consisting of a problem space X and a solution space Y. Yet, somewhat less common, it assumes a similarity (or distance) measure to be defined not only on X but also on Y. Moreover, it assumes a strong connection between the notions of preference and similarity. More specifically, for each problem x ∈ X, it assumes the existence of a theoretically ideal solution y∗ ∈ Y (even if this solution might be fictitious and cannot be materialized), and the less another solution y differs from y∗ in the sense of a distance measure ∆Y, the more this solution is preferred.
As a consequence, the performance and effectiveness of Pref-CBR is strongly influenced by the distance measure ∆Y: The better this measure captures the true differences between solutions, the more effective Pref-CBR will be. In this chapter, we therefore extend our framework through the integration of a distance learning module. Thus, the idea is to make use of the experience collected in a problem solving episode, not only to extend the case base through memorization of preferences, but also to adapt
the distance measure ∆Y. Since the efficacy of Pref-CBR is influenced by the adequacy of this measure, we propose a learning method for adapting solution similarity on the basis of experience gathered by the CBR system over the course of time. More specifically, our solution similarity learning method makes use of an underlying probabilistic model and realizes adaptation as Bayesian inference.
The importance of distance metric learning for the performance of many learning and data mining algorithms has been pointed out in [68]. An important challenge we had to consider in formalizing our learning method is the incorporation of prior information from paired comparisons [69] through our search procedure. Existing approaches for learning distance metrics from pairwise comparisons are either unreliable when the number of training examples is small [68] or rely on ad hoc algorithms with little or no formal basis [69]. For these reasons, we adopt a Bayesian approach to learning similarity measures in the solution space. Our aim is to implement a formal and accurate learning method that actively uses prior as well as current preference information to yield a posterior distribution, which increases the probability of choosing an optimal solution for the problem at hand.
To further optimize the performance of our Pref-CBR framework, we also pursue the learning of similarity measures in the problem space. We use the well-known perceptron algorithm to learn how to combine given local similarity measures into a global measure. This method elicits global similarity measures on the basis of feedback in the form of positive and negative examples. The feedback is qualitative: given a reference case and two cases to compare with it, we observe which of the two is more similar to the reference case. This approach essentially reduces the problem of distance learning to a binary classification problem.
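The reduction to binary classification can be illustrated with a small sketch. It assumes that the global similarity is a weighted sum of local similarities and that each piece of feedback is a triplet (reference case, more similar case, less similar case). The local similarity function, the learning rate, the uniform initial weights, and the toy triplets are hypothetical choices for illustration only; the method actually used is described in the sections that follow.

```python
def local_sims(a, b):
    """Hypothetical local similarities, one per attribute, for attributes in [0, 1]."""
    return [1.0 - abs(u - v) for u, v in zip(a, b)]

def global_sim(w, a, b):
    """Global similarity as a weighted sum of the local similarities."""
    return sum(wk * sk for wk, sk in zip(w, local_sims(a, b)))

def train_perceptron(triplets, n_attrs, lr=0.1, epochs=100):
    """Learn weights from triplets (reference, more_similar, less_similar)."""
    w = [1.0 / n_attrs] * n_attrs             # start from uniform weights
    for _ in range(epochs):
        mistakes = 0
        for ref, closer, farther in triplets:
            # Binary classification view: diff should lie on the positive side
            diff = [p - q for p, q in
                    zip(local_sims(ref, closer), local_sims(ref, farther))]
            if sum(wk * dk for wk, dk in zip(w, diff)) <= 0:
                w = [wk + lr * dk for wk, dk in zip(w, diff)]  # perceptron update
                mistakes += 1
        if mistakes == 0:                     # every triplet is ranked correctly
            break
    return w

# Toy feedback (hypothetical): the first attribute is the one that matters.
triplets = [
    ((0.5, 0.5), (0.6, 0.0), (0.9, 0.5)),
    ((0.2, 0.8), (0.3, 0.1), (0.7, 0.8)),
    ((0.1, 0.1), (0.1, 0.9), (0.5, 0.1)),
]
w = train_perceptron(triplets, n_attrs=2)
print(w)
```

This plain perceptron does not constrain the weights, so they may become negative in general; a practical variant might enforce non-negativity.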
4.1 Learning similarity measures in the solution space
The learning and adaptation of similarity or distance measures has been studied intensively in the literature, not only in CBR but also in related fields like machine learning. Yet, our approach has a number of properties that distinguish it from most others: similarity is learned in the solution space, not only in the problem space; training
information is purely qualitative and based on paired comparisons; and learning is done within the framework of Bayesian inference, making use of a probabilistic model. In this section, we explain our proposed method for learning similarity measures in the solution space in further detail.