Knowledge Representation
3.6 Rules involving relations
We have assumed implicitly that the conditions in rules involve testing an attribute value against a constant. Such rules are called propositionalbecause the attribute-value language used to define them has the same power as what logi- cians call the propositional calculus. In many classification tasks, propositional rules are sufficiently expressive for concise, accurate concept descriptions. The weather, contact lens recommendation, iris type, and acceptability of labor con- tract datasets mentioned previously, for example, are well described by propo- sitional rules. However, there are situations in which a more expressive form of rule would provide a more intuitive and concise concept description, and these are situations that involve relationships between examples such as those encoun- tered in Section 2.2.
Suppose, to take a concrete example, we have the set of eight building blocks of the various shapes and sizes illustrated in Figure 3.6, and we wish to learn the concept ofstanding.This is a classic two-class problem with classes stand- ingand lying. The four shaded blocks are positive (standing)examples of the concept, and the unshaded blocks are negative (lying)examples. The only infor-
Shaded: Unshaded:
standing lying
mation the learning algorithm will be given is the width, height,and number of sidesof each block. The training data is shown in Table 3.2.
A propositional rule set that might be produced for this data is:
if width ≥ 3.5 and height < 7.0 then lying
if height ≥ 3.5 then standing
In case you’re wondering, 3.5 is chosen as the breakpoint for widthbecause it is halfway between the width of the thinnest lying block, namely 4, and the width of the fattest standing block whose height is less than 7, namely 3. Also, 7.0 is chosen as the breakpoint for heightbecause it is halfway between the height of the tallest lying block, namely 6, and the shortest standing block whose width is greater than 3.5, namely 8. It is common to place numeric thresholds halfway between the values that delimit the boundaries of a concept.
Although these two rules work well on the examples given, they are not very good. Many new blocks would not be classified by either rule (e.g., one with width 1 and height 2), and it is easy to devise many legitimate blocks that the rules would not fit.
A person classifying the eight blocks would probably notice that “standing blocks are those that are taller than they are wide.” This rule does not compare attribute values with constants, it compares attributes with each other:
if width > height then lying
if height > width then standing
The actual values of the heightand widthattributes are not important; just the result of comparing the two. Rules of this form are called relational, because they express relationships between attributes, rather than propositional,which denotes a fact about just one attribute.
Table 3.2 Training data for the shapes problem.
Width Height Sides Class
2 4 4 standing 3 6 4 standing 4 3 4 lying 7 8 3 standing 7 6 3 lying 2 9 4 standing 9 1 4 lying 10 2 3 lying
Standard relations include equality (and inequality) for nominal attributes and less than and greater than for numeric ones. Although relational nodes could be put into decision trees just as relational conditions can be put into rules, schemes that accommodate relations generally use the rule rather than the tree representation. However, most machine learning methods do not consider relational rules because there is a considerable cost in doing so. One way of allowing a propositional method to make use of relations is to add extra, sec- ondary attributes that say whether two primary attributes are equal or not, or give the difference between them if they are numeric. For example, we might add a binary attribute is width <height?to Table 3.2. Such attributes are often added as part of the data engineering process.
With a seemingly rather small further enhancement, the expressive power of the relational knowledge representation can be extended very greatly. The trick is to express rules in a way that makes the role of the instance explicit:
if width(block) > height(block) then lying(block)
if height(block) > width(block) then standing(block)
Although this does not seem like much of an extension, it is if instances can be decomposed into parts. For example, if a toweris a pile of blocks, one on top of the other, then the fact that the topmost block of the tower is standing can be expressed by:
if height(tower.top) > width(tower.top) then standing(tower.top)
Here,tower.topis used to refer to the topmost block. So far, nothing has been gained. But iftower.restrefers to the rest of the tower, then the fact that the tower is composed entirelyof standing blocks can be expressed by the rules:
if height(tower.top) > width(tower.top) and standing(tower.rest)
then standing(tower)
The apparently minor addition of the condition standing(tower.rest)is a recur- sive expression that will turn out to be true only if the rest of the tower is com- posed of standing blocks. A recursive application of the same rule will test this. Of course, it is necessary to ensure that the recursion “bottoms out” properly by adding a further rule, such as:
if tower = empty then standing(tower.top)
With this addition, relational rules can express concepts that cannot possibly be expressed propositionally, because the recursion can take place over arbitrarily long lists of objects. Sets of rules such as this are called logic programs,and this area of machine learning is called inductive logic programming.We will not be treating it further in this book.