This section discusses the generation of random relational rules by the Random Relational Rules (Rrr) algorithm. Section 2.1.1 gives the algorithm used by Rrr for rule production and Section 2.1.2 discusses the complexity of rule evaluation, first with respect to a single literal and then for entire rules.
2.1.1 Rule Generation Algorithm
The Rrr algorithm operates on two-class problems, and produces one set of first-order rules for each class. Unlike Bagging, it does not resample the training set. Rules are generated fully randomly by adding literals to a partial clause in a manner similar to the Foil algorithm (pseudocode for which is shown in Algorithm 7).
Algorithm 7 Pseudocode for rule generation in the Foil algorithm
Theory: empty
Remaining: all positive instances
while Remaining is not empty do
    Rule: empty
    while Rule covers negative instances do
        for each literal that could be added to the rule do
            Compute the information gain of the literal
        end for
        Add the best literal(s) to Rule
    end while
    Remove positive instances covered by Rule from Remaining
    Add Rule to Theory
end while
The algorithm for Rrr is given in Algorithm 8. Where a random choice is made in the algorithm, each possible choice has equal probability. Predicate literals must have exactly one variable already bound. Test literals cannot introduce variables.
The stopping condition of Foil – purity of rule coverage – differs from the rule length limitation of Rrr, and Rrr does not remove the instances covered by each generated rule from the training set (as the rules generated by Rrr are much less likely to have class-pure coverage than those produced by Foil), but literal-by-literal rule generation is common to both. Foil also limits the length of its generated rules indirectly, by rejecting literals that would cause the bits required to encode the rule to exceed those required to indicate the instances covered by the rule. Rrr computes the coverage of literals once the rule is complete, while Foil computes the coverage of possible literals before selecting which one to add.

Algorithm 8 Pseudocode for rule generation in the Rrr algorithm
while Number of literals in rule is less than maximum rule length do
    Randomly select whether to generate a predicate or test literal
    if Generating a predicate literal then
        Randomly select which predicate literal to add
        Add predicate literal, ensuring that exactly one variable is bound, and introducing new variables for each other argument
    else if Generating a test literal then
        Randomly select which variable to test
        Randomly select whether to test against a variable or constant
        if Testing against a variable then
            Randomly select a variable to test against
        else if Testing against a constant then
            Randomly select a constant to test against
        end if
        Randomly select an operator (from {=, ≠} or {=, ≠, <, ≤, >, ≥} as appropriate)
        Add test literal
    end if
end while
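The random construction in Algorithm 8 can be sketched in Python as follows. This is a minimal illustration, not the Rrr implementation: the schema in `PREDICATES`, the typed variables, the constant pool in `CONSTANTS`, and all function names are assumptions introduced here. In particular, restricting bindings and comparisons to arguments of the same type, and drawing again when a randomly chosen test has no valid partner, are choices made for the sketch and are not stated in the algorithm.

```python
import random

# Hypothetical schema: predicate name -> argument types (an assumption of this sketch).
PREDICATES = {
    "atom": ["compound", "atom_id", "element", "quanta", "charge"],
    "bond": ["compound", "atom_id", "atom_id", "bond_type"],
}
NUMERIC_TYPES = {"charge"}
# Hypothetical constants that a test literal may compare against.
CONSTANTS = {"element": ["c", "h", "o"], "charge": [-0.1, 0.0, 0.1]}


def random_predicate_literal(variables, rng):
    """Pick a predicate, bind exactly one argument, use new variables elsewhere."""
    name = rng.choice(sorted(PREDICATES))
    arg_types = PREDICATES[name]
    # (position, existing variable) pairs whose types agree.
    bindable = [(pos, var) for pos, arg_type in enumerate(arg_types)
                for var, var_type in variables.items() if var_type == arg_type]
    if not bindable:
        return None
    bound_pos, bound_var = rng.choice(bindable)
    args = []
    for pos, arg_type in enumerate(arg_types):
        if pos == bound_pos:
            args.append(bound_var)
        else:
            new_var = f"V{len(variables)}"  # fresh name, never used before
            variables[new_var] = arg_type
            args.append(new_var)
    return (name, tuple(args))


def random_test_literal(variables, rng):
    """Pick a variable, a variable or constant to compare with, and an operator."""
    left = rng.choice(list(variables))
    left_type = variables[left]
    if rng.choice(["variable", "constant"]) == "variable":
        pool = [v for v, t in variables.items() if v != left and t == left_type]
    else:
        pool = CONSTANTS.get(left_type, [])
    if not pool:
        return None  # no valid partner; the caller draws again
    numeric = left_type in NUMERIC_TYPES
    operators = ["=", "!=", "<", "<=", ">", ">="] if numeric else ["=", "!="]
    return (left, rng.choice(operators), rng.choice(pool))


def generate_rule(max_length, rng):
    """Build a rule body literal by literal, with no evaluation during construction."""
    variables = {"A": "compound"}  # head variable, as in active(A)
    body = []
    while len(body) < max_length:
        if rng.choice(["predicate", "test"]) == "predicate":
            literal = random_predicate_literal(variables, rng)
        else:
            literal = random_test_literal(variables, rng)
        if literal is not None:
            body.append(literal)
    return body


print(generate_rule(4, random.Random(0)))
```

Note that nothing in the loop inspects the training data: coverage is computed only after the rule is complete, which is the point of contrast with Foil made above.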
2.1.2 Complexity
Because the rule production method of Rrr is structurally so similar to that of Foil, the two are compared here.
The complexity of constructing and evaluating a rule in Rrr is dominated by the cost of the evaluation, which is exponential with respect to the number of literals in the rule, and influenced by the ‘branching factor’ – for a predicate literal, the number of new bindings in the predicate compatible with the current ones and for a test, the proportion of the current bindings that satisfy the test. This is $O(c^n)$, where the upper bound for $c$ is the maximum branching factor for any single literal and $n$ is the number of literals.

Evaluating a single literal
The cost to evaluate a single literal is the same for Foil as it is for Rrr – both are dependent on the branching factor. Pazzani and Kibler calculate bounds and estimates for Foil’s search [58], and their terminology will be used here.
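To make the role of the branching factor concrete, the sketch below evaluates a rule one literal at a time and records how many variable bindings survive after each literal. It is an illustration under stated assumptions, not the evaluation code of either system: the tuple encoding of literals, the function names `extend` and `binding_counts`, and the four `atom` facts are all invented for the example.

```python
import operator

OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def extend(bindings, literal, facts):
    """Return the bindings that remain after applying one literal."""
    if len(literal) == 2:  # predicate literal: (name, argument variables)
        name, args = literal
        extended = []
        for binding in bindings:
            for fact in facts[name]:
                candidate = dict(binding)
                # setdefault binds a new variable or checks an existing one.
                if all(candidate.setdefault(var, value) == value
                       for var, value in zip(args, fact)):
                    extended.append(candidate)
        return extended
    left, op, right = literal  # test literal: (variable, operator, variable or constant)
    # A right-hand side that is not a bound variable is treated as a constant.
    return [b for b in bindings if OPS[op](b[left], b.get(right, right))]


def binding_counts(rule, facts, instance):
    """Number of live bindings after each literal, starting from the head variable A."""
    bindings = [{"A": instance}]
    counts = []
    for literal in rule:
        bindings = extend(bindings, literal, facts)
        counts.append(len(bindings))
    return counts


# Invented toy facts: atom(CompoundID, AtomID, Element, Quanta, Charge).
facts = {"atom": [
    ("m1", "a1", "c", 22, 0.10),
    ("m1", "a2", "c", 22, -0.20),
    ("m1", "a3", "h", 3, 0.05),
    ("m2", "a4", "o", 40, -0.30),
]}
rule = [
    ("atom", ("A", "B", "C", "D", "E")),
    ("atom", ("A", "F", "C", "G", "H")),
    ("E", ">", 0.0),
]
print(binding_counts(rule, facts, "m1"))  # [3, 5, 3]
```

Each predicate literal multiplies the set of bindings and each test filters it, so the work done for a literal depends on how many bindings the earlier literals left behind.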
The branching factor applies to both Rrr and Foil. For a literal that introduces no new variables, let the Density of that literal be the proportion of cases where that literal is true. For a literal that introduces new variables, let the Power of the predicate be the maximum number of solutions for that predicate with exactly one variable bound. Given those definitions, and letting $L_i$ for $i = 1, \dots, k$ denote the literals in a rule of length $k$, $\mathrm{Growth}(L_i)$ is 1 if $L_i$ does not introduce new variables and $\mathrm{Power}(L_i)$ if it does (see Equation 2.1).

\[
\mathrm{Growth}(L_i) =
\begin{cases}
1 & \text{if } L_i \text{ introduces no new variables} \\
\mathrm{Power}(L_i) & \text{otherwise}
\end{cases}
\tag{2.1}
\]
This leads to an upper bound for the branching factor of:
\[
\mathrm{BranchingFactor} \le \prod_{i=1}^{k} \mathrm{Growth}(L_i)
\tag{2.2}
\]
For an estimate of the branching factor, the AveragePower of a predicate can be defined as the average number of solutions for that predicate when exactly one variable is bound, and the AverageGrowth of a literal as its Density if no new variables are introduced or its AveragePower if new variables are introduced.
\[
\mathrm{AverageGrowth}(L_i) =
\begin{cases}
\mathrm{Density}(L_i) & \text{if } L_i \text{ introduces no new variables} \\
\mathrm{AveragePower}(L_i) & \text{otherwise}
\end{cases}
\tag{2.3}
\]

This allows the branching factor to be approximated with:
\[
\mathrm{BranchingFactor} \approx \prod_{i=1}^{k} \mathrm{AverageGrowth}(L_i)
\tag{2.4}
\]
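The bound of Equation 2.2 and the estimate of Equation 2.4 can be computed side by side, as in the following sketch. The `LiteralStats` record, the function names, and the toy `atom` facts are assumptions made for illustration. The sketch also assumes that AveragePower is averaged only over values that occur at least once in the bound position, and that Density is measured over the stored facts; the text does not fix either detail.

```python
from collections import Counter
from dataclasses import dataclass
from math import prod


@dataclass
class LiteralStats:
    introduces_vars: bool
    power: int = 1               # used only when new variables are introduced
    average_power: float = 1.0   # used only when new variables are introduced
    density: float = 1.0         # used only when no new variables are introduced


def power_stats(facts, bound_pos):
    """Power and AveragePower of a predicate when argument bound_pos is bound."""
    solutions = Counter(fact[bound_pos] for fact in facts)
    return max(solutions.values()), sum(solutions.values()) / len(solutions)


def growth(lit):          # Equation 2.1
    return lit.power if lit.introduces_vars else 1


def average_growth(lit):  # Equation 2.3
    return lit.average_power if lit.introduces_vars else lit.density


def branching_bound(rule):     # Equation 2.2
    return prod(growth(lit) for lit in rule)


def branching_estimate(rule):  # Equation 2.4
    return prod(average_growth(lit) for lit in rule)


# Invented toy facts: atom(CompoundID, AtomID, Element, Quanta, Charge).
atoms = [
    ("m1", "a1", "c", 22, 0.10),
    ("m1", "a2", "c", 22, -0.20),
    ("m1", "a3", "h", 3, 0.05),
    ("m2", "a4", "o", 40, -0.30),
]
power, average_power = power_stats(atoms, bound_pos=0)          # 3 and 2.0
density = sum(1 for fact in atoms if fact[4] > 0.0) / len(atoms)  # 0.5

# Two literals that introduce variables, followed by one test.
rule = [
    LiteralStats(True, power=power, average_power=average_power),
    LiteralStats(True, power=power, average_power=average_power),
    LiteralStats(False, density=density),
]
print(branching_bound(rule), branching_estimate(rule))  # 9 2.0
```

The test contributes a factor of 1 to the bound but a factor below 1 to the estimate, which is why the estimate can fall well under the bound for rules with many tests.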
The branching factors given in Equations 2.2 and 2.4 are products over the literals of a rule, so for both Rrr and Foil they grow multiplicatively with the number of solutions to each predicate and exponentially with the number of literals that introduce new variables. Later literals in a rule thus often have a higher branching factor than earlier ones and have a correspondingly greater cost to evaluate.
Table 2.1: Literals considered in rule construction

Literal added      Predicates searched   Tests searched   Total searched
atom(A,B,C,D,E)                      2                0                2
atom(A,F,C,G,H)                     38              106              144
E > -0.121                         124              216              340
H <= 0.011                         124              216              340
H > -0.084                         124              216              340
E <= -0.112                        124              216              340
Total                              536              970            1,506
The number of literals evaluated
Foil faces a higher cost than Rrr in rule construction: when determining which literal to add, Foil evaluates all possible literals, whereas Rrr randomly selects one. The number of literals Foil investigates grows exponentially with the arity of the predicates and the number of variables currently in the rule. Thus, as the size of the rule increases, the number of literals Foil evaluates increases, and as the branching factor usually also increases with the number of literals, the cost to evaluate those literals also increases.
For example, the Mutagenesis dataset contains three predicates:
• compound(CompoundID)
• atom(CompoundID, AtomID, Element, Quanta, Charge)
• bond(CompoundID, AtomID, AtomID, BondType)
Table 2.1 gives the number of literals considered for a rule generated by Foil during one of the experiments:
active(A):- atom(A, B, C, D, E), atom(A, F, C, G, H), E > -0.121, H <= 0.011, H > -0.084, E <= -0.112.
Only predicate literals change the number of variables in the rule and thus affect the search space.
In constructing the same rule, Rrr would evaluate two predicate literals and five test literals, for a total of seven (Rrr would use an additional test to encode the equality of the element field for the two atoms). Because the cost of evaluating literals changes as the rule grows, it cannot be said that Rrr could produce 1506 / 7 ≈ 215 rules with roughly the same cost as Foil produces one, but it can be observed that Rrr can randomly generate a substantial number of rules without exceeding Foil’s computational cost. The advantage of Rrr shows more clearly on high-arity predicates such as conformation/168 in the Musk1 dataset. After one conformation predicate is added to the rule, the
number of conformation predicates to be examined escalates. For each of the 168 arguments in a conformation literal, Foil must examine a conformation predicate using an existing variable for that argument or introducing a new one. This gives $2^{167}$, or roughly $1.87 \times 10^{50}$, predicate literals for Foil to evaluate.
Rrr has to evaluate only the predicate it randomly selects and its execution time is thus unaffected by this explosive increase in the number of possible literals.
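The size of that figure is easy to check with exact integer arithmetic. The short sketch below treats each free argument position as a binary choice between reusing a variable and introducing a new one; the function name is hypothetical, and the exponent is taken as a parameter because the text gives 167 binary choices for the 168-argument predicate without spelling out how the remaining argument is handled.

```python
def binary_choice_patterns(free_positions: int) -> int:
    """Number of argument patterns when each free position has two options."""
    return 2 ** free_positions


foil_candidates = binary_choice_patterns(167)  # exponent as given in the text
rrr_candidates = 1  # Rrr evaluates only the literal it randomly selected

print(f"{foil_candidates:.2e}")  # 1.87e+50
print(foil_candidates // rrr_candidates == foil_candidates)  # True
```

Even at a billion literal evaluations per second, enumerating that many candidates is far beyond reach, whereas the cost to Rrr of adding one literal does not depend on the count at all.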