INTRODUCTION
1.7 CASE ADAPTATION
Case adaptationis the process of transforming a solution retrieved into a solution appropriate for the current problem. It has been argued that adaptation may be the most important step of CBR since it adds intelligence to what would otherwise be simple pattern matchers.
A number of approaches can be taken to carry out case adaptation:
The solution returned (case retrieved) could be used as a solution to the current problem without modification, or with modifications where the solution is not entirely appropriate for the current situation.
The steps or processes that were followed to obtain the earlier solution could be rerun without modification, or with modifications where the steps taken in the previous solution are not fully satisfactory in the current situation.
Where more than one case has been retrieved, a solution could be derived from multiple cases or, alternatively, several alternative solutions could be presented.
Adaptation can use various techniques, including rules or further case-based rea- soning on the finer-grained aspects of the case. When choosing a strategy for case adaptation, it may be helpful to consider the following:
On average, how close will the case retrieved be to the present case?
Generally, how many characteristics will differ between the cases?
Are there commonsense or otherwise known rules that can be used when carrying out the adaptation?
Solution addresses current problem Case
adaptation Case base
Internal assessment of solution Retrieved case interaction Possible Solution doesn’t address current problem FAILURE Output solution
After the adaptation has been completed, it is desirable to check that the solution adapted takes into account the differences between the case retrieved and the current problem (i.e., whether the adaptation specifically addresses these differ- ences). At this point, there is also a need to consider what action is to be taken if this check determines that the solution proposed is unlikely to be successful. At this stage, the solution developed is ready for testing an use in the applicable domain. This stage also concludes our description of all the necessary steps for any CBR system; however, many systems will now enter a learning phase, as shown in Figure 1.7.
1.8 CASE LEARNING AND CASE-BASE MAINTENANCE
1.8.1 Learning in CBR Systems
Once an appropriate solution has been generated and output, there is some expecta- tion that the solution will be tested in reality (see Fig. 1.8). To test a solution, we have to consider both the way it may be tested and how the outcome of the test will be classified as a success or a failure. In other words, some criteria need to be defined for the performance rating of the proffered solution. Using this real-world assessment, a CBR system can be updated to take into account any new information uncovered in the processing of the new solution. This information can be added to a system for two purposes: first, the more information that is stored in a case base, the closer the match found in the case base is likely to be; second, adding information to the case base generally improves the solution that the system is able to create.
Learning may occur in a number of ways. The addition of a new problem, its solution, and the outcome to the case base is a common method. The addition of cases to the case base will increase the range of situations covered by the stored cases and reduce the average distance between an input vector and the closest stored vector (see Fig. 1.9).
Output solution Real-world testing Learning mechanism
Case base Case retriever
Case reasoner
Figure 1.8 Learning mechanism in CBR.
A second method of learning in a CBR system is using the solution’s assessment to modify the indexes of the stored cases or to modify the criteria for case retrieval. If a case has indexes that are not relevant to the specific contexts in which it should be retrieved, adjusting the indexes may increase the correlation between the occa- sions when a case is actually retrieved and the occasions when it ought to have been retrieved. Similarly, assessment of a solution’s performance may lead to an improved understanding of the underlying causal model of the domain that can be used to improve adaptation processing. If better ways can be found to modify cases with respect to the distance between the current and retrieved cases, the out- put solution will probably be improved.
When learning involves adding new cases to the case base, there are a number of considerations:
1. In which situations should a case be added to the case base, and in which situations should it be discarded? To determine an answer, we have to consider the level of success of the solution, how similar it is to other cases in the case base, and whether there are important lessons to be learned from the case.
2. If a case is to be added to the case base, the indexes of the new case must be determined and how that case is to be added to the case base. If the case base’s structure and retrieval method are both highly structured (e.g., an inductively determined hierarchical structure or a set of neural networks), the incorporation of a new case may require significant planning and restructuring of the case base.
1.8.2 Case-Base Maintenance
When applying CBR systems for problem solving, there is always a trade-off between the number of cases to be stored in the case library and retrieval efficiency. The larger the case library, the greater the problem space covered; however, this would also downgrade system performance if the number of cases were to grow unacceptably high. Therefore, removing redundant or less useful cases to attain
Sparse case base Denser case base
Query case Stored case
Distance between cases
an acceptable error level is one of the most important tasks in maintaining CBR systems. Leake and Wilson [22] defined case-base maintenance as the implementa- tion of policies for revising the organization or contents (representation, domain content, accounting information, or implementation) of a case base to facilitate future reasoning for a particular set of performance objectives.
The central idea of CBR maintenance is to develop some measures for case com- petence, which is the range of problems that a CBR system can solve. Various prop- erties may be useful, such as the size, distribution, and density of cases in the case base; the coverage of individual cases; and the similarity and adaptation knowledge of a given system [23].Coveragerefers to the set of problems that each case could solve, andreachabilityrefers to the set of cases that could provide solutions to the current problem [24]. The higher the density of cases, the greater the chances of having redundant cases. By expressing the density of cases as a function of case similarity, a suitable case deletion policy could be formulated for removing cases that are highly reachable from others.
Another reason for CBR maintenance is the possible existence of conflicting cases in the case library due to changes in domain knowledge or specific environ- ments for a given task. For example, more powerful cases may exist that can con- tain inconsistent information, either with other parts of the same case or with original cases that are more primitive. Furthermore, if two cases are considered equivalent (with identical feature values), or if one case subsumes another by hav- ing more feature criteria, a maintenance process may be required to remove the redundant cases.
1.9 EXAMPLE OF BUILDING A CASE-BASED
REASONING SYSTEM
So far, we have outlined the essential components and processes that make up a case-based reasoning system. For the convenience of readers, we provide here an example demonstrating how a CBR system can be built.
With the development of higher living standards and job opportunities in cities, more and more people move from various parts of a country to cities. When people reach a new place, they usually have to look for accommodations. In most cases, the main concern will be the renting of apartments. Some people may search through the World Wide Web (WWW) to look for rental information. However, case match- ing on the WWW is usually done in an exact manner. For example, if a person wants information on the rental of rooms or apartments in Lianhua Road, Pudong District, Shanghai City, it is very likely that no such information (no records) could be found, and therefore that search would fail. There are many reasons for such a failure; it may be because no apartments are available for rent, or the rental information is not available on the Web, or the search engine could not locate the information required. In this situation, CBR will be a suitable framework to provide answers to such queries because it is based on the degree of similarity
(not exactness) in matching with the capability of adaptation. Here we illustrate how to build a CBR system to handle this rental information searching problem.
Specifically, we explain how to perform case representation, case indexing, case similarity assessment, case adaptation, and case-base maintenance using a rental information searching problem. The example case base was taken from the Web site http://sh.soufun.com/asp/rnl/leasecenter on August 27, 2002, and consists of 251 cases. We randomly selected 20 cases (see Table 1.1) to constitute our sample. In the table, information is summarized. For example, ‘‘Mh’’ in the column ‘‘city district’’ represents Minghang; ‘‘11’’ in the column ‘‘type of apartment’’ represents one bedroom and one sitting room; and in the column ‘‘source of information,’’ ‘‘indi’’ represents information obtained from individual landlords, and those marked by ‘‘*’’ represent information obtained from real estate agents.
The following steps are carried out using these 20 cases:
Case representation and indexing. Choose an appropriate representation format for a case, and build a case index to facilitate the retrieval of similar cases in the future.
Case matching and retrieval. Establish an appropriate similarity function, and retrieve cases that are similar to the present case.
TABLE 1.1 Rental Information
Address Rental Cost of
of Type of Source of Apartment
Case City District Apartment Apartment Information (yuan/month)
1 Mh Lx 11 indi 500 2 Mh Lz 21 indi 2,500 3 Mh Ls1 21 indi 1,500 4 Mh Xz 11 indi 500 5 Mh Hm3 11 indi 880 6 Mh Ls2 31 dn* 1,500 7 Mh Dw 22 dn* 2,500 8 Xh Cd 11 indi 1,200 9 Xh Yd 32 indi 6,800 10 Xh Xm 20 indi 1,600 11 Cn Xh 11 indi 1,100 12 Cn Fy 11 indi 1,100 13 Cn Lj 21 dn* 3,500 14 Pd Wl 21 indi 1,600 15 Pd Wl 21 indi 1,600 16 Pd Yg 10 indi 3,500 17 Pd Ls 11 bc* 1,000 18 Yp Xy 10 indi 750 19 Yp Bx 32 indi 2,000 20 Yp Cy 42 ys* 10,000
Case adaptation. Construct a suitable adaptation algorithm to obtain an appropriate solution for the query case.
Case-base maintenance. Maintain consistency among cases (i.e., eliminate redundant cases) in the case base.
1.9.1 Case Representation
Representing cases is a challenging task in CBR. On the one hand, a case represen- tation must be expressive enough for users to describe a case accurately. On the other hand, CBR systems must reason with cases in a computationally tractable fashion. Cases can be represented in a variety of forms, such as prepositional repre- sentations, frame representations, formlike representations, and combinations of these three. In this example we could choose a flat feature-value vector for the representation of cases as shown in Table 1.2.
1.9.2 Case Indexing
Table 1.2 describes the case representation. Case indexingis the method used to store these cases in computer memory. In this example we organize the cases hier- archically so that only a small subset of them needs to be searched during retrieval. Processes are as follows:
Step 1. Select the most important feature attribute, city district, as an index (in general, people think that the city district is the most important feature).
Step 2. Classify cases into five classes,C1;C2;. . .;C5(Fig. 1.10) according to this feature.
When a new caseneis added to this index hierarchy, the following indexing pro- cedure could be used. Classify (ne, CL) (i.e., classifyneas a member of one of the classes of CL), where CL¼ fC1;C2;. . .;C5g. Input a new case described by a set
of featuresne¼ fF1;F2;. . .;F5g.
TABLE 1.2 Case Representation
Case_id: 1 Problem features
City district Mh
Address of apartment Lx
Type of apartment 11
Source of information indi
Solution
Rental cost of apartment 500
Step 1. Starting fromi¼1, select one caseej randomly from classCi.
Step 2. Compare the value of the first feature (i.e., city district) ofneandej.
Step 3. If the two values are identical,ne2Ci, else i¼iþ1; go to step 1.
Step 4. Repeat steps 1, 2, and 3, untilne belongs to some classCi.
1.9.3 Case Retrieval
Case retrievalis the process of finding within the case base the case(s) that are clo- sest to the current case. For example, for a given case^ee¼ ðMh;Lz;11;indi;yÞ, where we want to know the rent informationy, case retrieval is the process of find- ing those cases that are the closest to^ee.
The retrieval algorithm is begun by deciding to which class the present case^ee
belongs: Classify (^ee, CL), where CL¼ fC1;C2;. . .;C5g and^eeis a query case. Step 1. Starting fromi¼1, select one caseej randomly from classCi.
Step 2. Compare the feature value ‘‘district’’ of^eeandej.
Step 3. If the two values are identical,^ee2Ci, elsei¼iþ1; go to step 1.
Step 4. Repeat steps 1, 2, and 3 until^eebelongs to some classCi. Next, search for similar cases.
Step 1. If^ee2Ci, for each caseeicalculate the degree of similarity between^eeandei as
SMð^ee;eiÞ ¼
common commonþdifferent
where ‘‘common’’ represents the number of features whose value is the same between^eeandei, and ‘‘different’’ represents the number of features whose value is different between^eeandei.
Step 2. Rank the cases by similarity measure computed in step 1.
C1 C2 C5
CB
Step 3. Choose cases such ase1;e2;. . .;ekfromCiwhich are most similar to^ee; that is,
SMð^ee;eiÞ ¼maxfSMð^ee;ejÞ;ej2Cig i¼1;2;. . .;k
For the given query case ^ee¼ ðMh;Lz;11;indi;y), after following the algo- rithm above, four similar cases are retrieved from C1: e1;e2;e4, ande5, where
e1¼ ðMh;Lx;11;indi;500Þ e2¼ ðMh;Lz;21;indi;2500Þ
e4¼ ðMh;Xz;11;indi;500Þ e5¼ ðMh;Hm3;11;indi;880Þ and
SMð^ee;e1Þ ¼SMð^ee;e2Þ ¼SMð^ee;e4Þ ¼SMðe^e;e5Þ ¼0:75
1.9.4 Case Adaptation
After a matching case is retrieved, a CBR system adapts the solution stored in the retrieved case to fulfill the needs of the current case. The following is an algorithm used to carry out the task of case adaptation.
Step 1. Group the retrieved cases (which are most similar to the present case^ee) based on their solution value (i.e., rental cost of the apartment) into several categories, C1;C2. . .;Cm, with corresponding values of rental cost of the apartment,a1;a2;. . .;am.
Step 2. Count the number of cases retrieved in each categoryCiði¼1;2;. . .;mÞ, and record them asn1;n2;. . .;nm.
Step 3. Choose categoryCi for whichni¼maxfn1;n2. . .;nmg.
Step 4. If there exists only one category Ci such that ni>nj; j6¼i, take the solution (rental cost of the apartment) stored in a case inCias the solution to the present case; else, if there exist several categories, such as Ci1;Ci2;. . .;Cik satisfying ni1¼ni2¼ ¼nik¼maxfn1;n2;. . .;nmg, take the average value bb defined by
b
b¼ai1þai2þ þaik k
as the solution to the present case, whereai1;ai2;. . .;aik are values of the Rental cost of the apartment corresponding toCi1;Ci2;. . .;Cik.
In Section 1.9.3 we retrieved four cases,e1;e2;e4;e5, which are most similar to the present case^ee, where^ee¼ ðMh;Lz;11;indi;yÞ. According to the case adapta- tion algorithm above, we find a suggested (adapted) solution for this case. First, we
group these four cases into three categories, C1¼ fe1;e4g;C2¼ fe2g, and
C3¼ fe5g, where the corresponding solutions (the rental cost of the apartment stored in a case) are 500, 2500, and 880. We then count the number of cases in each category. For example, the number of cases in categoryC1 is 2, the number of cases in categoryC2 is 1, and the number of cases in categoryC3is 1. Since the number of cases in categoryC1is the greatest, we choose 500 as the suggested solu- tion to the present case.
Note that once we have obtained an adapted solution, its validity should be tested in reality. In the example here, the landlord might be asked if the solution (i.e., the suggested price) is acceptable. If it is acceptable, the performance of this CBR system can then be improved by adding this new case record.
1.9.5 Case-Base Maintenance
The maintenance problem is very important, as it safeguards the stability and accu- racy of CBR systems in real-world applications. However, our example here focuses only on the problem of case redundancy. From Table 1.1 we see that there exist some redundant cases, such as cases 14 and 15, that will downgrade system performance, so we should identify and remove these cases. A way of doing this is as follows:
Step 1. For each class Ci, as in Section 1.9.1, where we have the case base