Genetic Algorithms for Trade Filtering

4.4 Genetic Algorithm Rule Induction

It is extremely unlikely that a complete model of market behaviour can be built from the 1375 training examples that are available. Instead, the system can search for pockets of predictability: rather than attempting to build a complete model, it can simply remain out of the market until the market exhibits mechanics that bear a significant relation to the partial model. Positions are entered when the current market behaviour is recognised as having some deterministic component.

One method of finding these pockets of predictability is through the use of genetic algorithm rule induction. This has already been covered in Chapter 2, so only an overview of the system will be given here.

4.4.1 Representation

In order to evolve rules describing successful charting trades, it is clearly necessary to devise an encoding that will allow the GA to explore the space of chart trades. In the previous sections, the effects of individual conditions upon the probability of success were examined: for example, a trade is much more likely to work if the trade is a very good example. To enable the GA to explore multivariate conditions, a framework must be provided that will allow the GA to formulate such queries. If it is acceptable to restrict the GA to a fixed number of variables or conditions on any given execution, then the action of the genetic operators is as already described in Chapter 2, and the encoding scheme is straightforward:

Figure 4.3: GA Encoding Scheme

Variable 1 id      Variable 2 id      ...   Variable n id
Variable 1 value   Variable 2 value   ...   Variable n value

The number of parameters for each rule is specified in advance for each run of the GA. Each gene has a pair of values:

1. Variable ID: This is a symbolic representation of a particular observable. Possible values would decode to Type, Pattern, Pattern Duration, Good Example, Days to Confirmed Entry Point etc.

2. Variable value: This is the condition that the variable above must have for the rule to fire. Some variables, such as Days to Confirmed Entry Point, are real-valued, and so these cannot be coded in the same way as an observable that has symbolic values like “Bull” or “Bear”. Observables that are real-valued are clustered into 5 bins using the k-means algorithm [Hart75].
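The binning of a real-valued observable can be sketched as follows. This is a minimal illustration only: it uses a plain Lloyd-style iteration, which may differ from the variant in [Hart75], and the names `kmeans_1d` and `bin_index` are hypothetical. It assumes the observable takes at least k distinct values.

```python
import random

def kmeans_1d(values, k=5, iterations=50, seed=0):
    """Cluster real values into k bins (Lloyd-style); return the sorted centres."""
    rng = random.Random(seed)
    # Assumption: initial centres are k distinct observed values chosen at random.
    centres = sorted(rng.sample(sorted(set(values)), k))
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centre.
        updated = sorted(
            sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)
        )
        if updated == centres:
            break  # converged
        centres = updated
    return centres

def bin_index(value, centres):
    """Return the index of the nearest centre, usable as a symbolic gene value."""
    return min(range(len(centres)), key=lambda i: abs(value - centres[i]))
```

The bin index (0 to 4) then plays the same role for a real-valued observable as “Bull” or “Bear” does for a symbolic one. The result depends on the initial centres, so different seeds may give different bins.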

This encoding scheme restricts the GA to searching for rules that have a prespecified number of conditions. This is not a problem, as the GA can simply be executed several times, each time searching for rules with a different number of parameters.
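The encoding of Figure 4.3 can be illustrated with a short sketch. The observable names are taken from the text, but the value sets, the `Gene` class, and the helper functions are hypothetical. It is also an assumption here that the conditions of one rule refer to distinct observables.

```python
import random
from dataclasses import dataclass

# Hypothetical observables and their allowed symbolic values (illustrative only).
OBSERVABLES = {
    "Type": ["Bull", "Bear"],
    "Good Example": [1, 2, 3, 4, 5],
    "Days to Confirmed Entry Point": [0, 1, 2, 3, 4],  # indices of the 5 bins
}

@dataclass(frozen=True)
class Gene:
    variable_id: str   # which observable the condition tests
    value: object      # the value the observable must take for the rule to fire

def random_rule(n_conditions, rng):
    """Build a rule with a fixed number of conditions on distinct observables."""
    variables = rng.sample(sorted(OBSERVABLES), n_conditions)
    return tuple(Gene(v, rng.choice(OBSERVABLES[v])) for v in variables)

def describe(rule):
    """Render a rule in the 'Variable = value; ...' form used in the text."""
    return "; ".join(f"{g.variable_id} = {g.value}" for g in rule)
```

Calling `random_rule(2, random.Random(1))` yields a two-condition rule. Rules with other numbers of conditions come from changing `n_conditions` between runs.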

4.4.2 Rule Evaluation

Once the GA has assembled a rule, it needs to be evaluated in order for the selection process in the evolutionary cycle to operate. This is achieved by examining the data for past trades whose description meets all of the rule’s contingent clauses. When a match is found, the success rate of the GA’s rule is updated from the success or failure of the matching trade. If no trades can be found where all the conditions are met, the rule’s fitness is undefined and the rule is eliminated. Otherwise, the fitness of the rule is given by the Z-score, which is a combination of the number of activations and the improved probability of a successful trade outcome. In this way, the GA searches for rules that both fire frequently and have a high conditional probability of success.
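The evaluation step can be sketched as below. The exact Z-score formula is not stated in this section, so the sketch assumes a binomial form: the number of successes among matching trades is compared with the number expected at the base success rate of all trades. The trade representation (a dictionary with a `success` flag) and the function names are likewise hypothetical.

```python
import math

def rule_matches(rule, trade):
    """A rule fires only when every (variable, value) condition holds for the trade."""
    return all(trade.get(variable) == value for variable, value in rule)

def z_score(rule, trades):
    """Return the assumed binomial Z-score, or None when the fitness is undefined."""
    base_rate = sum(t["success"] for t in trades) / len(trades)
    matched = [t for t in trades if rule_matches(rule, t)]
    n = len(matched)  # number of activations
    if n == 0 or base_rate in (0.0, 1.0):
        return None  # the rule never fires (or Z is degenerate): eliminate it
    wins = sum(t["success"] for t in matched)
    # Assumed form: (observed - expected successes) / binomial standard deviation.
    return (wins - n * base_rate) / math.sqrt(n * base_rate * (1.0 - base_rate))
```

Under this assumed form, the score grows both with the number of activations and with the gap between the rule's success rate and the base rate. This matches the qualitative description above.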

4.4.3 Genetic Algorithm Execution

The GA used to evolve charting rules had the following properties:

1. A “simple” GA was used with tournament selection, to search for rules with a randomly chosen number of conditions.

2. A population of 50 was used for 20 generations.

3. Elitism was used to ensure that the fittest individual always survives through to the next round.

4. A level of mutation (probability per cycle per individual of 0.1) was used to assist the GA in exploring the space.

5. After the 20 passes through the evolutionary cycle, the fittest individual was output. The GA had usually converged (save for mutation) or was nearing convergence by this stage.

6. The GA would then re-execute itself. The net result of this was that a number of rules would be evolved with varying numbers of clauses.

7. Rules were then tested on the validation set, and any that failed to fire were removed. The remaining rules were then tested on the out-sample data.
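A single execution of the evolutionary cycle described in items 1 to 5 can be sketched as follows. The population size, number of generations, elitism, and mutation probability follow the list above. The tournament size of 2 and the single-point crossover are assumptions, and `fitness`, `random_rule` and `mutate` are hypothetical callables supplied by the caller.

```python
import random

def evolve(fitness, random_rule, mutate, rng, pop_size=50, generations=20,
           tournament_size=2, mutation_rate=0.1):
    """Evolve one rule; fitness may return None for rules that never fire."""
    def score(rule):
        value = fitness(rule)
        return float("-inf") if value is None else value  # undefined fitness loses

    def tournament(population):
        return max(rng.sample(population, tournament_size), key=score)

    population = [random_rule(rng) for _ in range(pop_size)]
    for _ in range(generations):
        elite = max(population, key=score)
        children = [elite]  # elitism: the fittest individual always survives
        while len(children) < pop_size:
            a, b = tournament(population), tournament(population)
            cut = rng.randrange(1, len(a)) if len(a) > 1 else 0
            child = a[:cut] + b[cut:]  # single-point crossover (assumed operator)
            if rng.random() < mutation_rate:
                child = mutate(child, rng)
            children.append(child)
        population = children
    return max(population, key=score)  # fittest individual after the final pass
```

Items 6 and 7 correspond to calling `evolve` repeatedly with different numbers of conditions, then discarding any returned rule that fails to fire on the validation set.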

4.4.4 Results

The rules that have been found by the GA on the in-sample and validation sets are tested on out-sample data that is disjoint from data used in the construction of the rules. Again, a table of rules of the form “Pattern: TTR; Subtype: 4; Context: D ES” etc., presents little useful information. The most useful way of presenting the results is to show the distributions of Z-scores for rules with a specific number of parameters (Figure 4.4). Rules that failed to fire out-of-sample have been ignored.

Figure 4.4: Distribution of Z-scores for rules on out-sample data (axes: Z-Score, No. of Parameters)

Figure 4.4 and Table 4.8 clearly show that larger numbers of higher Z-scores are obtained with the rules with the fewest variables. The mean Z-score for 5-parameter rules is greater than the mean Z-scores for the 3- and 4-parameter rules, but the sample size is only 2 and the significance of this result is minimal.

Table 4.8: Mean out-sample Z-scores

No. of Parameters   No. of Rules   Mean Z-Score
2                   30             1.43
3                   18             0.83
4                   9              0.71
5                   2              1.01

The mean Z-score for 2-parameter rules is quite high, but not high enough to reject the null hypothesis at the 95% confidence level. One of the best intelligible rules found by the system was "Long Term Trend = FOR; Good Example = 4", which has a certain intuitive appeal.
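The comparison with the 95% level can be checked numerically. The sketch below treats the mean Z-score of 1.43 as a standard normal statistic and uses a one-sided test; both are simplifying assumptions, as the text does not state which test was applied. The function name is hypothetical.

```python
import math

def one_sided_p_value(z):
    """Upper-tail probability that a standard normal variate exceeds z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Mean out-sample Z-score for 2-parameter rules, from Table 4.8.
p = one_sided_p_value(1.43)
significant_at_95 = p < 0.05
```

Under these assumptions the tail probability is roughly 0.076, which is above 0.05, so the null hypothesis is not rejected. A two-sided test, with a critical value of about 1.96, leads to the same conclusion.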