Rule Generation - Random Relational Rules

This section discusses the generation of random relational rules by the Random Relational Rules (Rrr) algorithm. Section 2.1.1 gives the algorithm used by Rrr for rule production and Section 2.1.2 discusses the complexity of rule evaluation, first with respect to a single literal and then for entire rules.

2.1.1 Rule Generation Algorithm

The Rrr algorithm operates on two-class problems, and produces one set of first-order rules for each class. Unlike Bagging, it does not resample the training set. Rules are generated fully randomly by adding literals to a partial clause in a manner similar to the Foil algorithm (pseudocode for which is shown in Algorithm 7).

Algorithm 7 Pseudocode for rule generation in the Foil algorithm

Theory: empty
Remaining: all positive instances
while Remaining is not empty do
    Rule: empty
    while Rule covers negative instances do
        for each literal that could be added to the rule do
            Compute the information gain of the literal
        end for
        Add the best literal(s) to Rule
    end while
    Remove positive instances covered by Rule from Remaining
    Add Rule to Theory
end while
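The covering loop of Algorithm 7 can be sketched in Python as follows. This is a minimal propositional simplification, not Foil itself: instances are plain dictionaries, candidate literals are hypothetical (name, test) pairs with no variables, and `foil_gain` and `learn_theory` are illustrative names. The gain is assumed to take the usual Foil form p1 * (log2(p1/(p1+n1)) - log2(p0/(p0+n0))), with the number of still-covered positives standing in for the number of positive bindings.

```python
import math

def foil_gain(p0, n0, p1, n1):
    """Gain of a literal that narrows coverage from (p0, n0) to (p1, n1)."""
    if p1 == 0:
        return float("-inf")  # a literal that loses every positive is useless
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

def learn_theory(positives, negatives, literals):
    """Sequential covering; literals is a list of (name, test) pairs (assumed form)."""
    theory = []
    remaining = list(positives)
    while remaining:
        rule, used = [], set()
        pos, neg = list(remaining), list(negatives)
        while neg:  # the rule still covers negative instances
            best = None
            for name, test in literals:
                if name in used:
                    continue
                p1 = sum(1 for x in pos if test(x))
                n1 = sum(1 for x in neg if test(x))
                gain = foil_gain(len(pos), len(neg), p1, n1)
                if best is None or gain > best[0]:
                    best = (gain, name, test)
            if best is None or best[0] == float("-inf"):
                break  # no usable literal left: accept an impure rule
            _, name, test = best
            rule.append(name)
            used.add(name)
            pos = [x for x in pos if test(x)]
            neg = [x for x in neg if test(x)]
        # pos now holds exactly the remaining positives covered by the rule
        remaining = [x for x in remaining if x not in pos]
        theory.append(rule)
    return theory

# Toy data, made up for illustration
positives = [{"a": 1, "b": 1}, {"a": 1, "b": 0}]
negatives = [{"a": 0, "b": 1}, {"a": 0, "b": 0}]
literals = [("a = 1", lambda x: x["a"] == 1), ("b = 1", lambda x: x["b"] == 1)]
print(learn_theory(positives, negatives, literals))  # [['a = 1']]
```

On the toy data the literal `a = 1` keeps both positives and removes both negatives, so a single one-literal rule covers every positive instance and the outer loop stops after one pass.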

The algorithm for Rrr is given in Algorithm 8. Where a random choice is made in the algorithm, each possible choice has equal probability. Predicate literals must have exactly one variable already bound. Test literals cannot introduce variables.

The stopping condition of Foil, purity of rule coverage, differs from the rule length limitation of Rrr, and Rrr does not remove the instances covered by each generated rule from the training set (as the rules generated by Rrr are much less likely to have class-pure coverage than those produced by Foil), but literal-by-literal rule generation is common to both. Foil also limits the length of its generated rules indirectly, by rejecting literals that would cause the bits required to encode the rule to exceed those required to indicate the instances covered by the rule. Rrr computes the coverage of literals once the rule is complete, while Foil computes the coverage of possible literals before selecting which one to add.

2.1. RULE GENERATION 33

Algorithm 8 Pseudocode for rule generation in the Rrr algorithm

while Number of literals in rule is less than maximum rule length do
    Randomly select whether to generate a predicate or test literal
    if Generating a predicate literal then
        Randomly select which predicate literal to add
        Add predicate literal, ensuring that exactly one variable is bound, and introducing new variables for each other argument
    else if Generating a test literal then
        Randomly select which variable to test
        Randomly select whether to test against a variable or constant
        if Testing against a variable then
            Randomly select a variable to test against
        else if Testing against a constant then
            Randomly select a constant to test against
        end if
        Randomly select an operator (from {=, ≠} or {=, ≠, <, ≤, >, ≥} as appropriate)
        Add test literal
    end if
end while
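The random construction in Algorithm 8 can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the Rrr implementation: `generate_rule`, the predicate-to-arity dictionary and the string representation of literals are all hypothetical, argument types are not tracked, and the operator set is chosen by a simple stand-in rule (ordered operators only when testing against a numeric constant).

```python
import random

EQUALITY_OPS = ["=", "!="]
ORDERED_OPS = ["=", "!=", "<", "<=", ">", ">="]

def generate_rule(predicates, constants, max_length, rng=None):
    """Return a random rule body as a list of literal strings.

    predicates: dict mapping predicate name to arity (hypothetical schema).
    constants: non-empty list of constants; numbers get the ordered operators.
    """
    rng = rng or random.Random()
    variables = ["A"]  # the head variable is the only one bound initially
    body = []
    while len(body) < max_length:
        if rng.choice(["predicate", "test"]) == "predicate":
            name = rng.choice(sorted(predicates))
            arity = predicates[name]
            bound_position = rng.randrange(arity)   # the one argument already bound
            bound_variable = rng.choice(variables)
            args = []
            for position in range(arity):
                if position == bound_position:
                    args.append(bound_variable)
                else:
                    fresh = f"V{len(variables)}"     # new variable for every other argument
                    variables.append(fresh)
                    args.append(fresh)
            body.append(f"{name}({', '.join(args)})")
        else:
            left = rng.choice(variables)             # tests never introduce variables
            if rng.choice(["variable", "constant"]) == "variable":
                right = rng.choice(variables)
                ops = EQUALITY_OPS  # assumption: types are not tracked, so equality only
            else:
                constant = rng.choice(constants)
                right = str(constant)
                ops = ORDERED_OPS if isinstance(constant, (int, float)) else EQUALITY_OPS
            body.append(f"{left} {rng.choice(ops)} {right}")
    return body

schema = {"atom": 5, "bond": 4}
rule = generate_rule(schema, [-0.121, "c"], max_length=6, rng=random.Random(0))
print("active(A) :- " + ", ".join(rule) + ".")
```

Every choice is uniform over the options available at that point, and no coverage is computed while the rule is being built, which matches the observation that Rrr evaluates literals only once the rule is complete.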

2.1.2 Complexity

Because the rule production method of Rrr is structurally so similar to that of Foil, the two are compared here.

The complexity of constructing and evaluating a rule in Rrr is dominated by the cost of the evaluation, which is exponential with respect to the number of literals in the rule, and influenced by the ‘branching factor’: for a predicate literal, the number of new bindings in the predicate compatible with the current ones, and for a test, the proportion of the current bindings that satisfy the test. This is $O(c^n)$, where the upper bound for $c$ is the maximum branching factor for any single literal and $n$ is the number of literals.

Evaluating a single literal

The cost to evaluate a single literal is the same for Foil as it is for Rrr – both are dependent on the branching factor. Pazzani and Kibler calculate bounds and estimates for Foil’s search [58], and their terminology will be used here.

The branching factor applies to both Rrr and Foil. For a literal that introduces no new variables, let the Density of that literal be the proportion of cases where that literal is true. For a literal that introduces new variables, let the Power of the predicate be the maximum number of solutions for that predicate with exactly one variable bound. Given those definitions, and with L_i for i = 1 to k denoting the literals in a rule of length k, Growth(L_i) is 1 if L_i does not introduce new variables and Power(L_i) if it does (see Equation 2.1).

$$
\mathrm{Growth}(L_i) =
\begin{cases}
1 & \text{if } L_i \text{ introduces no new variables} \\
\mathrm{Power}(L_i) & \text{otherwise}
\end{cases}
\tag{2.1}
$$

This leads to an upper bound for the branching factor of:

$$
\mathrm{BranchingFactor} \le \prod_{i=1}^{k} \mathrm{Growth}(L_i) \tag{2.2}
$$

For an estimate of the branching factor, the AveragePower of a predicate can be defined as the average number of solutions for that predicate when exactly one variable is bound, and the AverageGrowth of a literal as its Density if no new variables are introduced or its AveragePower if new variables are introduced.

$$
\mathrm{AverageGrowth}(L_i) =
\begin{cases}
\mathrm{Density}(L_i) & \text{if } L_i \text{ introduces no new variables} \\
\mathrm{AveragePower}(L_i) & \text{otherwise}
\end{cases}
\tag{2.3}
$$

This allows the branching factor to be approximated with:

$$
\mathrm{BranchingFactor} \approx \prod_{i=1}^{k} \mathrm{AverageGrowth}(L_i) \tag{2.4}
$$

Equations 2.2 and 2.4 show that, for both Rrr and Foil, the branching factor grows exponentially with the number of literals that introduce new variables, at a rate set by the number of possible solutions to their predicates. Later literals in a rule thus often have a higher branching factor than earlier ones and have a correspondingly greater cost to evaluate.
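As a worked illustration, the sketch below computes the bound of Equation 2.2 and the estimate of Equation 2.4, reading both as products over the literals of a rule. The `Literal` class, the function names and all Power, AveragePower and Density values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from dataclasses import dataclass
from math import prod
from typing import Optional

@dataclass
class Literal:
    text: str
    introduces_new_variables: bool
    power: Optional[int] = None            # max solutions with one variable bound
    average_power: Optional[float] = None  # mean solutions with one variable bound
    density: Optional[float] = None        # proportion of cases where a test is true

def growth(lit):
    """Equation 2.1."""
    return lit.power if lit.introduces_new_variables else 1

def average_growth(lit):
    """Equation 2.3."""
    return lit.average_power if lit.introduces_new_variables else lit.density

def branching_bound(rule):
    """Equation 2.2, read as a product over the literals (assumption)."""
    return prod(growth(lit) for lit in rule)

def branching_estimate(rule):
    """Equation 2.4, read as a product over the literals (assumption)."""
    return prod(average_growth(lit) for lit in rule)

# Made-up statistics for a four-literal rule
rule = [
    Literal("atom(A,B,C,D,E)", True, power=40, average_power=26.0),
    Literal("atom(A,F,C,G,H)", True, power=40, average_power=26.0),
    Literal("E > -0.121", False, density=0.5),
    Literal("H <= 0.011", False, density=0.25),
]
print(branching_bound(rule))     # 1600
print(branching_estimate(rule))  # 84.5
```

With these values the two predicate literals alone push the bound to 40 * 40 = 1600, while the tests contribute a factor of 1 to the bound and shrink only the estimate.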

Table 2.1: Literals considered in rule construction

Literal added      Predicates searched   Tests searched   Total searched
atom(A,B,C,D,E)                      2                0                2
atom(A,F,C,G,H)                     38              106              144
E > -0.121                         124              216              340
H <= 0.011                         124              216              340
H > -0.084                         124              216              340
E <= -0.112                        124              216              340
Total                              536              970            1,506

The number of literals evaluated

Foil faces a higher cost than Rrr in rule construction: when determining a literal to add, Foil evaluates all possible literals, whereas Rrr randomly selects one. The number of literals Foil investigates grows exponentially with the arity of the predicates and the number of variables currently in the rule. Thus, as the size of the rule increases, the number of literals Foil evaluates increases, and as the branching factor usually also increases with the number of literals, the cost to evaluate those literals also increases.

For example, the Mutagenesis dataset contains three predicates:

• compound(CompoundID)
• atom(CompoundID, AtomID, Element, Quanta, Charge)
• bond(CompoundID, AtomID, AtomID, BondType)

Table 2.1 gives the number of literals considered for a rule generated by Foil during one of the experiments:

active(A):- atom(A, B, C, D, E), atom(A, F, C, G, H), E > -0.121, H <= 0.011, H > -0.084, E <= -0.112.

Only predicate literals change the number of variables in the rule and thus affect the search space.
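To make the binding counts behind the evaluation cost concrete for this rule, the sketch below evaluates its body against a handful of made-up atom facts and reports how many bindings survive each stage. The facts, the tuple layout and the name `binding_counts` are assumptions for illustration and are not taken from the Mutagenesis data.

```python
# (compound, atom_id, element, quanta, charge); values are made up for illustration
ATOMS = [
    ("d1", "a1", "c", 22, -0.117),
    ("d1", "a2", "c", 22, -0.050),
    ("d1", "a3", "o", 40, -0.388),
    ("d2", "a1", "c", 22, 0.200),
]

def binding_counts(compound):
    """Count bindings after each stage of the rule body for one compound."""
    # atom(A, B, C, D, E): every atom of the compound is one binding
    first = [a for a in ATOMS if a[0] == compound]
    # atom(A, F, C, G, H): same compound, and the shared variable C forces the same element
    pairs = [(x, y) for x in first for y in first if x[2] == y[2]]
    # the four tests on E (charge of the first atom) and H (charge of the second)
    kept = [(x, y) for x, y in pairs
            if -0.121 < x[4] <= -0.112 and -0.084 < y[4] <= 0.011]
    return len(first), len(pairs), len(kept)

for compound in ("d1", "d2"):
    counts = binding_counts(compound)
    print(compound, counts, "covered" if counts[2] > 0 else "not covered")
# d1 (3, 5, 1) covered
# d2 (1, 1, 0) not covered
```

The second predicate literal raises the number of bindings for `d1` from 3 to 5, and the tests then reduce it to 1, which mirrors the distinction between the Power of predicate literals and the Density of tests.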

In constructing the same rule, Rrr would evaluate two predicate literals and five test literals, for a total of seven (Rrr would use an additional test to encode the equality of the element field for the two atoms). Because the cost of evaluating literals changes as the rule grows, it cannot be said that Rrr could produce 1506 / 8 ≈ 188 rules with roughly the same cost as Foil produces one, but it can be observed that Rrr can randomly generate a substantial number of rules without exceeding Foil’s computational cost.

The advantage of Rrr shows more clearly on high-arity predicates such as conformation/168 in the Musk1 dataset. After one conformation predicate is added to the rule, the number of conformation predicates to be examined escalates. For each of the 168 arguments in a conformation literal, Foil must examine a conformation predicate using an existing variable for that argument or introducing a new one. This gives $2^{167}$, or roughly $1.87 \times 10^{50}$, predicate literals for Foil to evaluate. Rrr has to evaluate only the predicate it randomly selects and its execution time is thus unaffected by this explosive increase in the number of possible literals.
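The figures quoted in this comparison can be checked with a few lines of Python. The rows are copied from Table 2.1, and the divisor 8 and the exponent 167 are taken from the text as given. Only the variable names are introduced here.

```python
# Rows of Table 2.1: (literal added, predicates searched, tests searched)
TABLE = [
    ("atom(A,B,C,D,E)", 2, 0),
    ("atom(A,F,C,G,H)", 38, 106),
    ("E > -0.121", 124, 216),
    ("H <= 0.011", 124, 216),
    ("H > -0.084", 124, 216),
    ("E <= -0.112", 124, 216),
]

predicates_total = sum(p for _, p, _ in TABLE)
tests_total = sum(t for _, _, t in TABLE)
foil_total = predicates_total + tests_total
print(predicates_total, tests_total, foil_total)  # 536 970 1506

# Ratio quoted in the text
print(foil_total / 8)  # 188.25

# Number of candidate conformation literals quoted in the text
candidates = 2 ** 167
print(f"{candidates:.2e}")  # 1.87e+50
```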
