Implementation - Ruleset Production - Random Relational Rules

2.2 Ruleset Production

2.2.3 Implementation

This section discusses some implementation details which are essential for more efficient computation.

Algorithm 9 is a pseudo-code description of the Rrr-sd algorithm, prior to the optimisations presented below. The basic stochastic discrimination algorithm requires arbitrary selection of a target class, which could exclude usefully discriminatory rules that are enriched for the class not selected. Rrr-sd, while similarly operating only on two-class problems, therefore generates two sets of relational rules. Each of these rulesets is enriched for one of the classes in the dataset and contains at least a user-specified number of rules; because of the batch mechanic for uniformity, slightly more than the minimum number of rules may be produced.

Algorithm 9 Pseudocode for the Rrr-sd algorithm

for Each class do
    while Number of rules for current class is less than the minimum do
        while Number of rules in batch is less than the minimum do
            Generate a rule
            if Rule is enriched for current class then
                Add rule to rule batch
            end if
        end while
        Calculate the most uniformity-preserving non-zero subset of rules in the rule batch
        Add those rules to the ruleset for the current class
    end while
end for

The following optimisations were implemented:

• Existence tests: Predicates that introduce variables that are never used in subsequent literals are ‘existence tests’ which either succeed or fail, without the need to enumerate all possible solutions. Consequently, such predicates will be treated like tests, and are cheap to evaluate.

Example:


    atom(CompoundID, AtomID1, Element1, Quanta1, Charge1),
    Charge1 > 0.1,
    atom(CompoundID, AtomID2, Element2, Quanta2, Charge2).

In the example, the second atom literal introduces four new variables, none of which are used for tests, so the existence of an atom in the Compound is sufficient for this literal to succeed.
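The difference between an existence test and an ordinary predicate literal can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory list of atom facts (the `Atom` tuple, the `ATOMS` data and all function names are invented for illustration; the storage representation used by Rrr-sd is not described in this section).

```python
from typing import Iterator, NamedTuple

class Atom(NamedTuple):
    compound_id: str
    atom_id: str
    element: str
    quanta: int
    charge: float

# Hypothetical facts, invented for illustration only.
ATOMS = [
    Atom("d1", "d1_1", "c", 22, -0.117),
    Atom("d1", "d1_2", "h", 3, 0.142),
    Atom("d2", "d2_1", "o", 40, -0.388),
]

def atom_bindings(compound_id: str) -> Iterator[Atom]:
    """Lazily enumerate the atoms of one compound."""
    return (a for a in ATOMS if a.compound_id == compound_id)

def atom_exists(compound_id: str) -> bool:
    """Existence test: succeed as soon as one matching atom is found."""
    return any(True for _ in atom_bindings(compound_id))

def rule_covers(compound_id: str) -> bool:
    # First literal: Charge1 is tested afterwards, so bindings are tried
    # one by one until an atom passes the test.
    has_charged_atom = any(a.charge > 0.1 for a in atom_bindings(compound_id))
    # Second literal: none of its new variables are used again, so it is
    # enough to know that some atom of the compound exists.
    return has_charged_atom and atom_exists(compound_id)
```

In this sketch `atom_exists` never looks beyond the first matching fact, which is what makes such a literal as cheap as a test.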

• Re-ordering and Separation: Literals in rules are re-ordered to minimise the branching factor encountered in evaluation of the rule. Because of the random rule generation process, re-ordering is both more useful and cheaper to compute than in systems like Foil, as it has to be done only once, after the final literal has been included, and it has the potential to significantly speed up coverage computations.

As predicates can introduce variables, they have the potential to increase branching, while tests have only two possible outcomes: success or failure. As the impact of the branching caused by a predicate literal, or the decrease in branches caused by a test literal, is greater the earlier in the rule it appears, the test literals should optimally appear as early in the rule as possible. The earliest the tests can appear in the rule is immediately after the predicate literal that introduces the variable or variables being tested, so tests are moved as close as possible to the predicates introducing their variables. Predicates are moved as far to the right as possible.

For example, this rule:

    rule(CompoundID):-
        atom(CompoundID, _, _, _, Charge1),
        atom(CompoundID, _, Element2, _, _),
        atom(CompoundID, _, _, Quanta3, _),
        Element2 = 'h',
        Quanta3 = '3',
        Charge1 > 0.1.

would become:

    rule(CompoundID):-
        atom(CompoundID, _, _, _, Charge1),
        Charge1 > 0.1,
        atom(CompoundID, _, Element2, _, _),
        Element2 = 'h',
        atom(CompoundID, _, _, Quanta3, _),
        Quanta3 = '3'.

For clarity, the Prolog syntax that describes unused variables as underscores is used here. The tests on Charge1 and Element2 have been moved to be adjacent to the predicates that introduced those variables.
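One way to sketch this re-ordering in Python is shown below. It is an illustration under stated assumptions rather than the Rrr-sd implementation: the `Literal` class and `reorder` function are hypothetical names, predicates are assumed to keep their original relative order, and each test is emitted directly after the first predicate that binds all of its variables.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Literal:
    text: str
    is_test: bool
    variables: frozenset  # variables mentioned, excluding the head variable

def reorder(literals):
    """Place every test immediately after the predicate that binds its variables."""
    predicates = [lit for lit in literals if not lit.is_test]
    pending = [lit for lit in literals if lit.is_test]
    bound = set()
    ordered = []
    for pred in predicates:
        ordered.append(pred)
        bound |= pred.variables
        # Emit all tests whose variables are now fully bound.
        ready = [t for t in pending if t.variables <= bound]
        ordered.extend(ready)
        pending = [t for t in pending if t not in ready]
    # Tests on variables that are never bound (if any) stay at the end.
    return ordered + pending

rule = [
    Literal("atom(CompoundID, _, _, _, Charge1)", False, frozenset({"Charge1"})),
    Literal("atom(CompoundID, _, Element2, _, _)", False, frozenset({"Element2"})),
    Literal("atom(CompoundID, _, _, Quanta3, _)", False, frozenset({"Quanta3"})),
    Literal("Element2 = 'h'", True, frozenset({"Element2"})),
    Literal("Quanta3 = '3'", True, frozenset({"Quanta3"})),
    Literal("Charge1 > 0.1", True, frozenset({"Charge1"})),
]
reordered = [lit.text for lit in reorder(rule)]
```

Because the literals of a random rule are fixed once generation has finished, a single pass of this kind is sufficient; no re-ordering is needed while the rule is being built.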

To reduce the branching factor even further, some rules can be split into independent subrules, such that no subrule depends on variables introduced in another subrule. The rule is true for an instance (a particular binding of CompoundID, in the example below) if all of its subrules are true for that instance.

Example:

Subrule 1:

    rule(CompoundID):-
        atom(CompoundID, _, _, _, Charge1),
        Charge1 > 0.1.

Subrule 2:

    rule(CompoundID):-
        atom(CompoundID, _, Element2, _, _),
        Element2 = 'h'.

Subrule 3:

    rule(CompoundID):-
        atom(CompoundID, _, _, Quanta3, _),
        Quanta3 = '3'.
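The separation into subrules amounts to grouping literals that share variables other than the head variable. The following Python sketch illustrates this under the assumption that each literal is given as a pair of its text and its set of non-head variables; `split_subrules` is a hypothetical helper, not a function taken from Rrr-sd.

```python
def split_subrules(literals):
    """Group literals into independent subrules.

    literals: list of (text, variables) pairs, where variables is the set of
    variables the literal mentions, excluding the head variable.
    Returns a list of subrules, each a list of literal texts in rule order.
    """
    groups = []  # list of (variables, literal indices); variable sets are disjoint
    for index, (_text, variables) in enumerate(literals):
        merged_vars, merged_idx = set(variables), [index]
        untouched = []
        for g_vars, g_idx in groups:
            if g_vars & merged_vars:
                # The new literal links to this group, so merge them.
                merged_vars |= g_vars
                merged_idx = g_idx + merged_idx
            else:
                untouched.append((g_vars, g_idx))
        groups = untouched + [(merged_vars, merged_idx)]
    groups.sort(key=lambda g: min(g[1]))
    return [[literals[i][0] for i in sorted(idx)] for _vars, idx in groups]

body = [
    ("atom(CompoundID, _, _, _, Charge1)", {"Charge1"}),
    ("Charge1 > 0.1", {"Charge1"}),
    ("atom(CompoundID, _, Element2, _, _)", {"Element2"}),
    ("Element2 = 'h'", {"Element2"}),
    ("atom(CompoundID, _, _, Quanta3, _)", {"Quanta3"}),
    ("Quanta3 = '3'", {"Quanta3"}),
]
subrules = split_subrules(body)
```

Each subrule can then be evaluated on its own for a given CompoundID, and the rule covers the instance only if every subrule succeeds, so the branching of one subrule is never multiplied by that of another.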

• Enrichment: Each rule belongs to exactly one of three disjoint sets: rules enriched for one class, rules enriched for the other class, and rules enriched for neither class (usually because they cover either no instances or all instances).

Rather than generating rules until the ruleset enriched for one class is complete, then repeating the process for the other class, Rrr-sd generates rules and adds each enriched rule to its appropriate ruleset. By interleaving the rule generation process in this way, no enriched rule will be wasted.
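A small Python sketch of this interleaving follows. It assumes that a rule counts as enriched for a class when it covers a larger proportion of that class's instances than of the other class's instances; this reading is an assumption made for illustration, as are the names `enrichment` and `route_rules`.

```python
def enrichment(covered, labels, class_a, class_b):
    """Return the class the rule is enriched for, or None if neither.

    covered: set of instance ids covered by the rule.
    labels:  dict mapping instance id -> class label.
    Assumes both classes have at least one instance.
    """
    def proportion(cls):
        members = [i for i, c in labels.items() if c == cls]
        return sum(i in covered for i in members) / len(members)

    pa, pb = proportion(class_a), proportion(class_b)
    if pa > pb:
        return class_a
    if pb > pa:
        return class_b
    return None  # e.g. the rule covers no instances or all instances

def route_rules(coverages, labels, class_a, class_b):
    """Add every enriched rule to the ruleset of the class it is enriched for."""
    rulesets = {class_a: [], class_b: []}
    for covered in coverages:
        cls = enrichment(covered, labels, class_a, class_b)
        if cls is not None:
            rulesets[cls].append(covered)
    return rulesets

# Hypothetical data: four instances, two per class.
labels = {1: "pos", 2: "pos", 3: "neg", 4: "neg"}
rulesets = route_rules([{1, 2, 3}, {3}, set(), {1, 2, 3, 4}], labels, "pos", "neg")
```

In this toy run the first rule goes to the "pos" ruleset, the second to the "neg" ruleset, and the last two are enriched for neither class.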

• Negation: If enough rules have already been generated for one class, additional enriched rules for that class are irrelevant. However, inverting the coverage results for a rule enriched for one class will yield the coverage for a rule enriched for the other class in a binary class setting, and therefore such rules do not have to be discarded. This inversion is accomplished by treating any instances not covered by the rule as being covered, and any instances covered by the rule as being not covered, with the result that if the rule was enriched for one of the two classes, the negation must now be enriched for the other.

For most datasets it was found that the distribution of enriched random rules was slightly skewed, with a larger number of enriched rules being generated for one class than the other. Thus, negation helped to reduce redundant rule creation and therefore also to reduce computing times.

• Prefixes: Every prefix of a random rule is another random rule; some may be enriched, some may not. As, in the course of evaluating the full rule, all the prefixes are also evaluated, there is very little computational cost in making use of this additional information. Rrr-sd selects the ‘most enriched’ prefix for each rule, that is, the prefix for which the ratio between the proportions covered of instances of each class by the rule is the greatest. This also has the advantage that, even if the full-length rule is not enriched, one of its prefixes may be, increasing the proportion of possible useful rules.
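Prefix selection can be sketched in Python as follows, assuming that the class-wise coverage proportions of every prefix are already available from evaluating the full rule. The function names, the treatment of a prefix that covers none of one class as maximally enriched, and the choice of the shortest prefix on ties are all assumptions made for illustration.

```python
import math

def enrichment_ratio(pa, pb):
    """Ratio between the larger and the smaller class-wise coverage proportion."""
    high, low = max(pa, pb), min(pa, pb)
    if high == low:
        return 1.0       # not enriched (includes the case where nothing is covered)
    if low == 0:
        return math.inf  # assumption: covering none of one class is maximally enriched
    return high / low

def best_prefix(prefix_proportions):
    """Pick the most enriched prefix.

    prefix_proportions[k] holds (pa, pb) for the prefix of length k + 1, the
    last entry being the full-length rule. Returns (length, ratio), or None
    if no prefix is enriched. Ties go to the shortest prefix.
    """
    best = None
    for length, (pa, pb) in enumerate(prefix_proportions, start=1):
        ratio = enrichment_ratio(pa, pb)
        if ratio > 1.0 and (best is None or ratio > best[1]):
            best = (length, ratio)
    return best

# Hypothetical proportions: the full rule (length 3) is not enriched,
# but its prefix of length 2 is.
choice = best_prefix([(0.75, 0.5), (0.5, 0.25), (0.25, 0.25)])
```

Here the second prefix is selected even though the full-length rule covers both classes equally, which mirrors the advantage described above.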

• ID Elements: When generating predicate literals for appropriate datasets, the single variable that each predicate must already have bound can be required to be its ‘ID element’, with the remaining variables in the predicate being newly introduced. The ‘ID element’ is the argument in the predicate that identifies which instance it belongs to. For example, in the Mutagenesis predicate atom(CompoundID, AtomID, Element, Quanta, Charge), the CompoundID argument is the ‘ID element’. This reduces the amount of evaluation required, because only the atoms of a particular compound are considered when determining the mutagenicity of that compound. For datasets where the instances are not interdependent, it does not make sense to generate rules that predict the class of one instance based on properties of another instance. (If the ‘ID element’ were not required to be bound, rules could be generated containing Atom and Bond predicates that could be instantiated to come from different compounds.)
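As an illustration of the ID-element restriction, the Python sketch below generates predicate literals whose ID argument is always the already-bound head variable, while every other argument receives a fresh variable. The function `make_predicate_literal` and its parameters are hypothetical and only mirror the naming pattern used in the examples of this section.

```python
import itertools

def make_predicate_literal(name, arg_names, id_position, id_variable, counter):
    """Build a predicate literal whose ID element is the bound head variable.

    All other arguments become new variables, numbered with the next value
    drawn from `counter` so that they are distinct across literals.
    """
    n = next(counter)
    args = [
        id_variable if position == id_position else f"{arg}{n}"
        for position, arg in enumerate(arg_names)
    ]
    return f"{name}({', '.join(args)})"

atom_args = ["CompoundID", "AtomID", "Element", "Quanta", "Charge"]
counter = itertools.count(1)
first = make_predicate_literal("atom", atom_args, 0, "CompoundID", counter)
second = make_predicate_literal("atom", atom_args, 0, "CompoundID", counter)
```

Every literal produced this way is tied to the compound being classified, so no literal can be instantiated with facts from a different compound.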

Algorithm 10 is a pseudo-code description of the optimised Rrr-sd algorithm.

Algorithm 10 Pseudocode for the optimised Rrr-sd algorithm

while Number of rules for either class is less than the minimum do
    while Number of rules in batch for either class is less than the minimum do
        Generate a rule
        Select an enriched prefix of that rule (including the full-length rule)
        if Rule is enriched for class A then
            if Rule batch for class A is not yet full then
                Add rule to rule batch for class A
            else
                Negate rule and add it to rule batch for class B
            end if
        else if Rule is enriched for class B then
            if Rule batch for class B is not yet full then
                Add rule to rule batch for class B
            else
                Negate rule and add it to rule batch for class A
            end if
        end if
    end while
    Calculate the most uniformity-preserving non-zero subset of rules in each rule batch
    Add those rules to their corresponding rulesets
end while
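The control flow of Algorithm 10 can be sketched in Python as below. This is an illustration under several assumptions, not the Rrr-sd implementation: a rule is represented only by the set of instances it covers, `generate_rule` stands in for both rule generation and prefix selection, `batch_size` is a hypothetical parameter for the point at which a batch is full, enrichment is taken to mean covering a larger proportion of one class than of the other, and `keep_all` is a placeholder for the uniformity-preserving subset selection, which is not specified in this section.

```python
import random

def enriched_for(covered, labels, a, b):
    """Class whose instances are covered in the larger proportion, else None."""
    def proportion(cls):
        members = {i for i in labels if labels[i] == cls}
        return len(covered & members) / len(members)
    pa, pb = proportion(a), proportion(b)
    return a if pa > pb else b if pb > pa else None

def negate(covered, labels):
    """Negation: covered instances become uncovered and vice versa."""
    return frozenset(labels) - covered

def keep_all(batch):
    """Placeholder for the uniformity-preserving subset selection."""
    return list(batch)

def rrr_sd(labels, a, b, min_rules, batch_size, generate_rule, select_subset=keep_all):
    rulesets = {a: [], b: []}
    other = {a: b, b: a}
    while min(len(r) for r in rulesets.values()) < min_rules:
        batches = {a: [], b: []}
        while min(len(x) for x in batches.values()) < batch_size:
            covered = generate_rule()  # rule generation plus prefix selection
            cls = enriched_for(covered, labels, a, b)
            if cls is None:
                continue  # enriched for neither class
            if len(batches[cls]) < batch_size:
                batches[cls].append(covered)
            else:
                # Batch is full: the negated rule is enriched for the other class.
                batches[other[cls]].append(negate(covered, labels))
        for cls in (a, b):
            rulesets[cls].extend(select_subset(batches[cls]))
    return rulesets

# Hypothetical usage with random coverage sets over ten instances.
rng = random.Random(0)
labels = {i: ("pos" if i < 5 else "neg") for i in range(10)}
def random_rule():
    return frozenset(i for i in labels if rng.random() < 0.5)
rulesets = rrr_sd(labels, "pos", "neg", min_rules=6, batch_size=4, generate_rule=random_rule)
```

With the placeholder selection every batch is kept whole, so the example needs two batches per class and ends with eight rules in each ruleset, which shows how slightly more than the minimum number of rules can be produced.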
