Symbolic Regression—Error-Driven Evolution
10.7 Differential Equations
Genetic programming can be used to solve an equation whose solution consists of a function that satisfies the given equation. In particular, genetic programming can be used to solve differential equations (with given initial conditions), integral equations, general functional equations, and inverse problems. In each case, the result produced by genetic programming is a mathematical expression in symbolic form. A differential equation is an equation involving one or more derivatives (of some order) of an unknown function. The solution to a differential equation is a function that, when substituted into the given equation, satisfies the equation and any given initial conditions. Differential equations are the most familiar functional equations.
It is possible, using exact analytic methods, to find the exact function that solves some differential equations. However, for most differential equations, only numerical approximations are available.
The problem of solving a differential equation may be viewed as the search in a space of compositions of functions and terminals for a particular composition that satisfies the equation and its initial conditions. Once the problem of solving differential equations is reformulated in this way, the problem is an immediate candidate for solution by genetic programming.
The approach involves an extension of the already-described techniques for symbolic integration and differentiation (which are, of course, based on symbolic regression).
Without loss of generality, we will assume that every equation in the remainder of this chapter has been transformed so that its right-hand side is 0.
10.7.1 Example 1
Consider the simple differential equation
$$\frac{dy}{dx} + y\cos x = 0,$$
where $y_{\text{initial}} = 1.0$ for $x_{\text{initial}} = 0.0$.
The terminal set and the function set for this problem are chosen in the same way as for symbolic integration.
We start by generating 200 random values of the independent variable $x_i$ over some appropriate domain, such as the unit interval [0, 1]. We then sort these values into ascending order.
We are seeking a function $f(x)$ such that, for every one of the 200 values $x_i$ of the variable $x$, we get 0 when we perform the following computation: for each $i$, add the derivative $f'(x_i)$ at the point $x_i$ (i.e., $dy/dx$) to the product of $f(x_i)$ at the point $x_i$ (i.e., $y$) and the cosine of $x_i$. This rewording of the problem immediately suggests an orderly general procedure for genetically finding the function $f(x)$ that satisfies the given differential equation.
Given the set of 200 ascending values of $x_i$, we define the "curve resulting from applying the function $g$" to be the 200 pairs $(x_i, g(x_i))$, where $g$ is some function.
When the $j$th genetically produced function $f_j$ in the population (i.e., S-expression) is generated by genetic programming, we apply this function (i.e., S-expression) $f_j$ to generate a curve. Specifically, we obtain 200 values of $f_j(x_i)$ corresponding to the 200 values of $x_i$. We call these 200 pairs $(x_i, f_j(x_i))$ the "curve resulting from applying the genetically produced function $f_j$" or "the $f_j$ curve."
We then numerically differentiate this curve $(x_i, f_j(x_i))$ with respect to the independent variable $x_i$. That is, we apply the function of differentiation to obtain a new curve. Specifically, we obtain a new set of 200 pairs $(x_i, f_j'(x_i))$, which we call the "curve resulting from applying the differentiation function" or "the derivative curve."
We then apply the cosine function to obtain yet another curve. Specifically, we take the cosine of the 200 random values of $x_i$ to obtain a new set of 200 pairs $(x_i, \cos x_i)$, which we call the "curve resulting from applying the cosine function" or "the cosine curve."
We then apply the multiplication function to the cosine curve and the $f_j$ curve to obtain still another curve. In particular, we multiply the curve consisting of the set of 200 pairs $(x_i, \cos x_i)$ by $f_j(x_i)$ so as to obtain a new curve, called "the product curve," consisting of the set of 200 pairs $(x_i, f_j(x_i)\cos x_i)$.
We then apply the addition function to the derivative curve and the product curve to obtain a curve consisting of the set of 200 pairs $(x_i, f_j'(x_i) + f_j(x_i)\cos x_i)$, which we call "the sum curve."
To the extent that the sum curve is close to the "zero curve" consisting of the 200 pairs $(x_i, 0)$ (i.e., the right-hand side of the differential equation) for the 200 values of $x_i$, the genetically produced function $f_j$ is a good approximation to the solution of the given differential equation.
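The curve-by-curve procedure above can be sketched in a few lines of Python. This is a minimal sketch, not the original implementation. The helper names `sample_points`, `derivative_curve` and `sum_curve` are hypothetical. The sketch assumes that the 200 random points use the same differencing rule that the text later applies to its five-point tables: the average of the left and right slopes at interior points, and the slope to the nearest point at the endpoints.

```python
import math
import random


def sample_points(n=200, lo=0.0, hi=1.0, seed=0):
    """Draw n random values of x over [lo, hi] and sort them into ascending order."""
    rng = random.Random(seed)
    return sorted(rng.uniform(lo, hi) for _ in range(n))


def derivative_curve(xs, ys):
    """Numerically differentiate the curve (xs[i], ys[i]).

    Interior points get the average of the slopes to the left and to the right;
    each endpoint gets the slope to its single neighbour.
    """
    slopes = [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]
    interior = [0.5 * (slopes[i - 1] + slopes[i]) for i in range(1, len(xs) - 1)]
    return [slopes[0]] + interior + [slopes[-1]]


def sum_curve(f, xs):
    """Return f'(x_i) + f(x_i) * cos(x_i) at every sample point (the sum curve)."""
    f_curve = [f(x) for x in xs]                              # the f_j curve
    deriv = derivative_curve(xs, f_curve)                     # the derivative curve
    product = [y * math.cos(x) for x, y in zip(xs, f_curve)]  # the product curve
    return [d + p for d, p in zip(deriv, product)]            # the sum curve


xs = sample_points()
exact = sum_curve(lambda x: math.exp(-math.sin(x)), xs)      # best-of-run individual
gen0 = sum_curve(lambda x: math.exp(1.0 - math.exp(x)), xs)  # generation-0 individual
print(sum(abs(v) for v in exact) / len(xs), sum(abs(v) for v in gen0) / len(xs))
```

The mean magnitude of the sum curve is small for the exact solution, but far from zero for the generation-0 individual. That gap is what the first fitness component measures.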
The problem of solving the given differential equation is now equivalent, except for the matter of initial conditions, to a symbolic regression problem in which the sum curve, the set of points $(x_i, f_j'(x_i) + f_j(x_i)\cos x_i)$, must be fitted to the zero curve.
In solving differential equations, the fitness of a particular genetically produced function should be expressed in terms of two components. The first component is how well the function satisfies the differential equation as just described above. The second component is how well the function satisfies the initial condition of the differential equation.
Since a mere linear function passing through the initial condition point will score perfectly on this second component, it seems reasonable that the first component should receive the majority of the weight in calculating fitness. Therefore, we arbitrarily assign it 75% of the weight in the examples below. Specifically, the raw fitness of a genetically produced function $f_j$ is 75% of the first component plus 25% of the second component. The closer this overall sum is to 0, the better. This division of weights creates a tension between the two factors that can be fully satisfied only by a correct solution to the differential equation that also satisfies the initial condition. One can view the initial condition as a constraint with $(25/75) \times 200$ as the penalty coefficient for the penalty function used to handle the constraint.
The first component used in computing the raw fitness of a genetically produced function $f_j$ is the sum, for $i$ between 0 and 199, of the absolute values of the differences between the zero function (i.e., the right-hand side of the equation) and $f_j'(x_i) + f_j(x_i)\cos x_i$, namely
$$\sum_{i=0}^{199} \left| 0 - \left( f_j'(x_i) + f_j(x_i)\cos x_i \right) \right|.$$
Since the difference is taken with respect to the zero function, this sum of differences is merely the sum of the absolute values of the left-hand side of the equation. The closer this sum is to 0, the better.
The second component used in computing the raw fitness of a genetically produced function $f_j$ is based on the absolute value of the difference between the given value $y_{\text{initial}}$ for the initial condition and the value of the genetically produced function $f_j(x_{\text{initial}})$ for the particular given initial condition point $x_{\text{initial}}$. Since this difference is constant over all 200 points, we can simply multiply any one of these uniform differences by 200 to obtain this second component. The closer this value is to 0, the better.
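Putting the two components together with the 75/25 weights gives the raw fitness. Factoring out the 0.75 weight then shows where the penalty coefficient $(25/75) \times 200$ mentioned above comes from:

$$
\text{raw fitness}(f_j)
= 0.75 \sum_{i=0}^{199} \left| f_j'(x_i) + f_j(x_i)\cos x_i \right|
+ 0.25 \cdot 200 \left| f_j(x_{\text{initial}}) - y_{\text{initial}} \right|
$$
$$
= 0.75 \left( \sum_{i=0}^{199} \left| f_j'(x_i) + f_j(x_i)\cos x_i \right|
+ \frac{25}{75} \cdot 200 \left| f_j(x_{\text{initial}}) - y_{\text{initial}} \right| \right).
$$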
Note that the initial condition should be chosen so that the zero function does not satisfy the differential equation and the initial condition; otherwise, the zero function will likely be produced as the solution by genetic programming.
A hit is defined as a fitness case for which the standardized fitness is less than 0.01. Since numerical differentiation is relatively inaccurate for the endpoints of an interval, attainment of a hit for 198 of the 200 fitness cases is one of the termination criteria for this problem.
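The fitness and hit definitions can be written as small functions. This is a hedged sketch, and `raw_fitness`, `hits` and `is_success` are hypothetical names. `residuals` stands for the 200 values of the left-hand side (the sum curve). The sketch assumes that a hit means the absolute value of the left-hand side at that fitness case is below 0.01, which is our reading of the definition above.

```python
def raw_fitness(residuals, f_at_x_initial, y_initial, w_equation=0.75, w_initial=0.25):
    """Weighted raw fitness; smaller is better and 0 is perfect.

    residuals: left-hand side f_j'(x_i) + f_j(x_i) cos(x_i) at each fitness case.
    The initial-condition error is multiplied by the number of fitness cases (200 in the text).
    """
    first = sum(abs(r) for r in residuals)                     # how well the equation holds
    second = len(residuals) * abs(f_at_x_initial - y_initial)  # how well the initial condition holds
    return w_equation * first + w_initial * second


def hits(residuals, threshold=0.01):
    """Count fitness cases whose left-hand side is within threshold of 0 (assumed reading)."""
    return sum(1 for r in residuals if abs(r) < threshold)


def is_success(residuals, required_hits=198):
    """Success predicate: at least 198 hits, tolerating the two inaccurate endpoints."""
    return hits(residuals) >= required_hits


# The zero function satisfies the equation at every point but misses y(0) = 1:
print(raw_fitness([0.0] * 200, f_at_x_initial=0.0, y_initial=1.0))
```

Consider the zero function in example 1. Every residual is 0, but the initial-condition term still contributes 0.25 × 200 × |0 − 1| = 50. This is why the initial condition must be chosen so that it rules out the zero function.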
Table 10.8 summarizes the key features of example 1 of the differential equations problem.
We now apply the above method to solving the given differential equation. In one run, the best-of-generation individual in the initial random population (generation 0) was, when simplified, equivalent to $e^{1 - e^{x}}$. Its raw fitness was 58.09. Only 3 of the 200 points were hits.
By generation 2, the best-of-generation S-expression in the population was, when simplified, equivalent to $e^{1 - e^{\sin x}}$. Its raw fitness was 44.23. Only 6 of the 200 points were hits.
Table 10.8 Tableau for differential equations.
Objective: Find a function, in symbolic form, which, when substituted into the given differential equation, satisfies the differential equation and which also satisfies the initial conditions.
Terminal set: X.
Function set: +, -, *, %, SIN, COS, EXP, RLOG.
Fitness cases: Randomly selected sample of 200 values of the independent variable $x_i$ in some interval of interest.
Raw fitness: 75% of the sum, taken over the 200 fitness cases, of the absolute value of the left-hand side of the differential equation evaluated for the genetically produced function $f_j$ at domain point $x_i$, plus 25% of 200 times the absolute value of the difference between $f_j(x_{\text{initial}})$ and the given value $y_{\text{initial}}$.
Standardized fitness: Same as raw fitness for this problem.
Hits: Number of fitness cases for which the standardized fitness is less than 0.01.
Wrapper: None.
Parameters: M = 500. G = 51.
Success predicate: An S-expression scores 198 or more hits.
By generation 6, the best-of-generation S-expression in the population was, when simplified, equivalent to $e^{-\sin x}$.
The raw fitness of this best-of-generation individual is a mere 0.057. As it happens, this individual scores 199 hits, thus terminating the run. This best-of-run individual is, in fact, the exact solution to the differential equation.
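A direct check by hand confirms this claim. We differentiate the best-of-run individual and substitute it into the left-hand side, which is the derivative plus $y$ times the cosine of $x$. The result is exactly 0, and the initial condition also holds:

$$
y = e^{-\sin x}, \qquad \frac{dy}{dx} = -\cos x \, e^{-\sin x},
$$
$$
\frac{dy}{dx} + y\cos x = -\cos x \, e^{-\sin x} + e^{-\sin x}\cos x = 0, \qquad y(0) = e^{-\sin 0} = 1.
$$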
The following three abbreviated tabulations of intermediate values for the best-of-generation individuals from generations 0, 2, and 6 will further clarify the above process.
In each simplified calculation, we use only five equally spaced $x_i$ points in the interval [0, 1], instead of 200 randomly generated points. These five values of $x_i$ are shown in row 1.
Table 10.9 shows this simplified calculation as applied to the best-of-generation individual from generation 0, namely $e^{1 - e^{x}}$.
Row 2 shows the value of this best-of-generation individual from generation 0 for the five values of $x_i$. Row 3 shows the cosine of each of the five values of $x_i$. Row 4 is the product of row 2 and row 3 and equals $y \cos x_i$ for each of the five values of $x_i$.
Table 10.9 Simplified calculation for the best-of-generation individual from generation 0 for example 1 of the differential equations problem.
Row 1 ($x_i$): 0.0, 0.25, 0.50, 0.75, 1.0
Row 2 ($y = e^{1 - e^{x}}$): 1.00, 0.753, 0.523, 0.327, 0.179
Row 3 ($\cos x_i$): 1.00, 0.969, 0.878, 0.732, 0.540
Row 4 ($y \cos x_i$): 1.00, 0.729, 0.459, 0.239, 0.097
Row 5 (numerical $dy/dx$): -0.989, -0.955, -0.851, -0.687, -0.592
Row 6 ($dy/dx + y \cos x_i$): 0.011, -0.225, -0.392, -0.447, -0.495
Table 10.10 Simplified calculation for the best-of-generation individual from generation 2 for example 1 of the differential equations problem.
Row 1 ($x_i$): 0.0, 0.25, 0.50, 0.75, 1.0
Row 2 ($y = e^{1 - e^{\sin x}}$): 1.00, 0.755, 0.541, 0.376, 0.267
Row 3 ($\cos x_i$): 1.00, 0.969, 0.878, 0.732, 0.540
Row 4 ($y \cos x_i$): 1.00, 0.732, 0.474, 0.275, 0.144
Row 5 (numerical $dy/dx$): -0.979, -0.919, -0.758, -0.547, -0.437
Row 6 ($dy/dx + y \cos x_i$): 0.021, -0.187, -0.283, -0.271, -0.292
Row 5 shows the numerical approximation to the derivative $dy/dx$ for each of the five values of $x_i$. For the three $x_i$ points that are not endpoints of the interval [0, 1], this numerical approximation to the derivative is the average of the slope to the left of the point $x_i$ and the slope to the right of the point $x_i$. For the two endpoints of the interval [0, 1], the derivative is the slope to the nearest point.
Row 6 is the sum of row 4 and row 5 and is an approximation to the value of the left-hand side of the differential equation for the five values of $x_i$. Recall that if the S-expression were a solution to the differential equation, every entry in row 6 would be 0 or approximately 0 (to match the right-hand side of the equation). Of course, this best-of-generation individual from generation 0 is not a solution to the differential equation, and therefore the entries in row 6 are all nonzero.
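The five-point calculation behind Tables 10.9 through 10.11 is mechanical enough to script. The sketch below follows the differencing rule just described, and the function name `five_point_table` is hypothetical. It recomputes rows 2 through 6 for the generation-0 individual $e^{1 - e^{x}}$. The tables were built from rounded intermediate values, so agreement is expected only to about the third decimal place.

```python
import math

XS = (0.0, 0.25, 0.50, 0.75, 1.0)  # row 1: five equally spaced points in [0, 1]


def five_point_table(f, xs=XS):
    """Return rows 2-6 of the simplified calculation for candidate solution f."""
    y = [f(x) for x in xs]                     # row 2
    cos_x = [math.cos(x) for x in xs]          # row 3
    y_cos = [a * b for a, b in zip(y, cos_x)]  # row 4
    slopes = [(y[i + 1] - y[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]
    # Row 5: endpoints use the slope to the nearest point; interior points
    # average the slopes to the left and to the right.
    dydx = ([slopes[0]]
            + [0.5 * (slopes[i - 1] + slopes[i]) for i in range(1, len(xs) - 1)]
            + [slopes[-1]])
    lhs = [d + p for d, p in zip(dydx, y_cos)]  # row 6 = row 4 + row 5
    return {"y": y, "cos x": cos_x, "y cos x": y_cos, "dy/dx": dydx, "lhs": lhs}


table = five_point_table(lambda x: math.exp(1.0 - math.exp(x)))
for name, row in table.items():
    print(f"{name:>8}: " + "  ".join(f"{v:7.3f}" for v in row))
```

Passing `lambda x: math.exp(-math.sin(x))` instead gives interior row 6 values of about -0.005, -0.007 and -0.006, in line with Table 10.11.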
Table 10.10 shows this simplified calculation as applied to the best-of-generation individual from generation 2, namely $e^{1 - e^{\sin x}}$.
Table 10.11 Simplified calculation for the best-of-generation individual from generation 6 for example 1 of the differential equations problem.
Row 1 ($x_i$): 0.0, 0.25, 0.50, 0.75, 1.0
Row 2 ($y = e^{-\sin x}$): 1.0, 0.781, 0.619, 0.506, 0.431
Row 3 ($\cos x_i$): 1.0, 0.969, 0.878, 0.732, 0.540
Row 4 ($y \cos x_i$): 1.0, 0.757, 0.543, 0.370, 0.233
Row 5 (numerical $dy/dx$): -0.877, -0.762, -0.550, -0.376, -0.299
Row 6 ($dy/dx + y \cos x_i$): 0.123, -0.005, -0.007, -0.006, -0.067
Rows 1 through 5 of Table 10.10 are calculated using this best-of-generation individual from generation 2 in the same manner as above. Again, row 6 is an approximation to the value of the left-hand side of the differential equation for the five values of $x_i$. The sum of the absolute values of the three non-endpoint values of row 6 is 0.74. Their average magnitude is 0.247. If we multiply this number by 200, we get 49.4. This value is close to the more accurate raw fitness of 44.23 obtained above with 200 points, even though we are using only five $x_i$ points here (instead of 200) and the $\Delta x$ here is 0.25 (instead of an average of only 0.005). Of course, this best-of-generation individual from generation 2 is not a solution to the differential equation, and therefore the entries in row 6 of Table 10.10 are not close to 0.
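The arithmetic behind the 49.4 estimate uses only the three non-endpoint entries of row 6 in Table 10.10. The endpoint entries are dropped because numerical differentiation is least accurate there. A minimal check:

```python
row6_gen2 = [0.021, -0.187, -0.283, -0.271, -0.292]  # Table 10.10, row 6

interior = row6_gen2[1:-1]             # drop the two endpoint entries
total = sum(abs(v) for v in interior)  # sum of magnitudes, 0.741
mean = total / len(interior)           # average magnitude, 0.247
estimate = 200 * mean                  # scaled to 200 fitness cases, 49.4
print(f"{total:.2f} {mean:.3f} {estimate:.1f}")
```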
Table 10.11 shows this simplified calculation as applied to the best-of-generation individual from generation 6, namely $e^{-\sin x}$.
Row 6 is an approximation to the value of the left-hand side of the differential equation for the five values of $x_i$. The three non-endpoint values in row 6 are -0.005, -0.007, and -0.006, respectively (i.e., these three non-endpoint values are each very close to 0). The appearance of these three near-zero numbers for the non-endpoint entries in row 6 indicates that the function $y$ on row 2 of Table 10.11 is a good approximation to a solution of the differential equation. When we use the full 200 points (instead of just five), the 200 values on row 6 average a mere 0.0003 for generation 6.
Note that the three non-endpoint values of row 6 for tables 10.9 and 10.10 were not close to 0 because the functions y shown on row 2 of those two tabulations were not solutions to the differential equation.
10.7.2 Example 2
A second example of a differential equation is
$$\frac{dy}{dx} - 2y + 4x = 0,$$
with an initial condition such that $y_{\text{initial}} = 4$ when $x_{\text{initial}} = 1$.
Figure 10.19 Performance curves for example 2 of the differential equations problem.
In generation 28 of one run, the S-expression
(+ (* (EXP (- X 1)) (EXP (- X 1))) (+ (+ X X) 1))
emerged. This individual is equivalent to $e^{-2}e^{2x} + 2x + 1$, which is the exact solution to the differential equation.
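The claimed equivalence can be checked numerically by evaluating the S-expression directly. The small parser and evaluator below are a hypothetical sketch. They handle only the operators that appear in this individual (+, -, *, EXP) and the terminal X. The sketch compares the result with $e^{-2}e^{2x} + 2x + 1$ and checks the initial condition $y(1) = 4$.

```python
import math


def parse(text):
    """Parse a Lisp-style S-expression string into nested Python lists."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok
        expr = []
        while tokens[pos] != ")":
            expr.append(read())
        pos += 1  # consume the closing ")"
        return expr

    return read()


OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "EXP": math.exp,
}


def evaluate(expr, x):
    """Evaluate a parsed S-expression with the terminal X bound to x."""
    if isinstance(expr, list):
        op, *args = expr
        return OPS[op.upper()](*(evaluate(a, x) for a in args))
    return x if expr.upper() == "X" else float(expr)


tree = parse("(+ (* (EXP (- X 1)) (EXP (- X 1))) (+ (+ X X) 1))")
for x in (0.0, 0.5, 1.0, 2.0):
    print(x, evaluate(tree, x), math.exp(-2) * math.exp(2 * x) + 2 * x + 1)
```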
Figure 10.19 presents the performance curves showing, by generation, the cumulative probability of success P(M, i) and the number of individuals that must be processed I(M, i, z) to guarantee, with 99% probability, that the left-hand side of the equation has an absolute value of less than 0.03 for all 200 fitness cases for some S-expression. The graph is based on 68 runs and a population size of 500. The cumulative probability of success P(M, i) is 48% by generation 40 and 56% by generation 50. The numbers in the oval indicate that, if this problem is run through to generation 40, processing a total of 143,500 (i.e., 500 x 41 generations x 7 runs) individuals is sufficient to guarantee solution of this problem with 99% probability.
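The 143,500 figure follows the usual computational-effort arithmetic for performance curves. The number of independent runs needed is $R(z) = \lceil \log(1 - z) / \log(1 - P(M, i)) \rceil$, and each run through generation $i$ processes $M(i + 1)$ individuals. The sketch below uses hypothetical function names:

```python
import math


def runs_required(p_success, z=0.99):
    """R(z): independent runs needed so that at least one succeeds with probability z."""
    if p_success >= 1.0:
        return 1
    if p_success <= 0.0:
        raise ValueError("p_success must be positive")
    return math.ceil(math.log(1.0 - z) / math.log(1.0 - p_success))


def individuals_to_process(m, generation, runs):
    """I(M, i, z) = M * (i + 1) * R(z): generations 0..i in each of R runs."""
    return m * (generation + 1) * runs


print(individuals_to_process(500, 40, 7))  # 500 x 41 generations x 7 runs = 143500
```

R(z) is a ceiling, so it is sensitive to how $P(M, i)$ is rounded. For example, `runs_required(0.48)` returns 8 and `runs_required(0.50)` returns 7. The 7 runs reported in the text therefore presumably reflect the unrounded success rate.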
10.7.3 Example 3
A third example of a differential equation is
$$\frac{dy}{dx} - \frac{2 + \sin x}{3(y - 1)^2} = 0,$$
with an initial condition such that $y_{\text{initial}} = 2$ when $x_{\text{initial}} = 0$.
This problem was run with a function set that included the cube root function CUBRT. In generation 13 of one run, the S-expression
(- (CUBRT (CUBRT 1))
   (CUBRT (- (- (- (COS X) (+ 1 (CUBRT 1))) X) X)))
emerged. This individual is equivalent to $1 + (2 + 2x - \cos x)^{1/3}$, which is the exact solution to the differential equation.
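The same kind of numerical check works for this individual, with one important assumption. The inner argument $\cos x - 2 - 2x$ is negative on $[0, 1]$, so the equivalence holds only if CUBRT returns the real cube root of a negative number. The sketch below assumes exactly that, using a hypothetical helper `cubrt`, and translates the S-expression by hand.

```python
import math


def cubrt(v):
    """Real cube root, also defined for negative arguments (assumed CUBRT semantics)."""
    return math.copysign(abs(v) ** (1.0 / 3.0), v)


def best_of_run(x):
    # (- (CUBRT (CUBRT 1)) (CUBRT (- (- (- (COS X) (+ 1 (CUBRT 1))) X) X)))
    return cubrt(cubrt(1.0)) - cubrt(((math.cos(x) - (1.0 + cubrt(1.0))) - x) - x)


def closed_form(x):
    # 1 + (2 + 2x - cos x)^(1/3)
    return 1.0 + cubrt(2.0 + 2.0 * x - math.cos(x))


for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(x, best_of_run(x), closed_form(x))
```

At $x = 0$ both functions give exactly 2, which matches the initial condition.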
When the initial condition of the differential equation involves only a value of the function itself (as is typically the case when the differential equation involves only a first derivative), any point in the domain of the independent variable $x$ may be used for the initial condition. On the other hand, when the initial condition involves a value of a derivative of the function (as may be the case when the differential equation involves second or higher derivatives), the value of the independent variable $x$ involved in the initial condition must be one of the points in the random set of points $x_i$, so that the first derivative (and any required higher derivative) of the genetically produced function is evaluated at the initial condition point. In addition, it is preferable that the point $x_{\text{initial}}$ be an internal point rather than an endpoint of the domain, since numerical differentiation is usually more accurate for the internal points of an interval.
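When the initial condition involves a derivative, the sampling step has to guarantee that $x_{\text{initial}}$ is one of the points and is not an endpoint. The minimal sketch below assumes the same left/right slope averaging used for the tables above, and its helper names are hypothetical. The stand-in candidate $y = x^2$ is chosen only for illustration.

```python
import random


def sample_with_initial_point(x_initial, n=200, lo=0.0, hi=1.0, seed=0):
    """Draw n - 1 random points, add x_initial, sort, and return (xs, index of x_initial)."""
    rng = random.Random(seed)
    xs = sorted([rng.uniform(lo, hi) for _ in range(n - 1)] + [x_initial])
    k = xs.index(x_initial)
    if k in (0, len(xs) - 1):
        raise ValueError("x_initial should be an internal point of the sample")
    return xs, k


def slope_at(xs, ys, k):
    """First-derivative estimate at interior index k: average of left and right slopes."""
    left = (ys[k] - ys[k - 1]) / (xs[k] - xs[k - 1])
    right = (ys[k + 1] - ys[k]) / (xs[k + 1] - xs[k])
    return 0.5 * (left + right)


xs, k = sample_with_initial_point(0.5)
ys = [x * x for x in xs]  # stand-in candidate y = x^2, whose exact slope at 0.5 is 1
print(k, slope_at(xs, ys, k))
```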