
Problem Solving and Optimization

Both evolution and mind are concerned with a certain kind of business, and that is the business of finding patterns that satisfy a complex set of constraints. In one case the pattern is a set of features that best fit a dynamic ecological niche, and in the other case it is a pattern of beliefs, attitudes, and behaviors that minimize conflict with personal, social, and physical constraints. Some propensities to acquire particular mental qualities seem to be inherited, though the topic of individual differences in the various propensities (traits such as intelligence, introversion, creativity, etc.) is still the subject of lively controversy. While a tendency to learn, or to learn in a particular way, may be inherited, learning can only take place within a lifetime; there is no genetic transfer of learned knowledge from one generation to another. As we have seen, the two great stochastic systems have different ways of going about their business, though both rely heavily, if not exclusively, on some version of trial and error. In this section we discuss some aspects of problem solving where the goal is to fit a number of constraints as well as possible.

It may seem that we use the word “problem” in an unusual way. Indeed, the word has specialized meanings to specific groups who use it. We use it to describe situations, not necessarily mathematical ones (they might be psychological, ecological, economic, or states of any kind of phenomena), where some facts exist and there is a need to find other facts that are consistent with them. A changing environment might be a problem to an evolving species. An ethical dilemma is a problem to a moral person. Making money is a problem for most people. An equation with some unknowns presents a problem to a mathematician. In some cases, the facts that are sought might be higher-level facts about how to arrange or connect the facts we already have.

Optimization is another term that has different connotations to different people. Generally this term refers to a process of adjusting a system to get the best possible outcome. Sometimes a “good” outcome is good enough, and the search for the very best outcome is hopeless or unnecessary. As we have defined a problem as a situation where some facts are sought, we will define optimization as a process of searching for the missing facts by adjusting the system. In the swarm intelligence view of the present volume, we argue that social interactions among individuals enable the optimization of complex patterns of attitudes, behaviors, and cognitions.

A Super-Simple Optimization Problem

A problem has some characteristics that allow the goodness of a solution to be estimated. This measurement often starts contrarily with an estimate of the error of a solution. For example, a simple arithmetic problem such as 4 + x = 10 can be restated as a search: find a value (a pattern consisting of only one element in this trivial example) for x that results in the best performance or fitness. The error of a proposed solution in this case is the difference between the actual and desired results; where we want 4 + x to equal 10, it might be that we take a guess at what x should be and find that 4 + x actually equals 20, for instance, if we tried x = 16. So we could say that the error, when x = 16, is 10.

Error is a factor that decreases when goodness increases. An optimization problem may be given in terms of minimizing error or maximizing goodness, with the same result. Sometimes, though, it is preferable to speak in terms of the goodness or fitness of a problem solution; note the tie-in to evolution. Fitness is the measure of the goodness of a genetic or phenotypic pattern. Converting error to goodness is not always straightforward; there are a couple of standard tricks, but neither is universally ideal. One estimate of goodness is the reciprocal of error (e.g., 1/e). This particular measure of goodness approaches infinity as error approaches zero and is undefined if the denominator equals zero, but that’s not necessarily a problem, as we know that if the denominator is zero we have actually solved the problem. Besides, in floating-point precision we will probably never get so close to an absolutely perfect answer that the computer will think it’s a zero and crash. Another obvious way to measure the goodness of a potential solution is to take the negative of the error (multiply e by −1); then the highest values are the best.
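The two conversions from error to goodness can be made concrete with a minimal sketch for the 4 + x = 10 problem. The function names are our own, and returning infinity when the error is exactly zero is one possible convention for the undefined reciprocal, not the only one.

```python
def error(x, target=10):
    """Absolute difference between the actual result 4 + x and the desired result."""
    return abs(target - (4 + x))


def goodness_reciprocal(x):
    """Goodness as 1/e; treated as infinite when the error is exactly zero."""
    e = error(x)
    return float("inf") if e == 0 else 1.0 / e


def goodness_negative(x):
    """Goodness as the negative of the error; the best possible value is 0."""
    return -error(x)


# Compare a poor guess, a closer guess, and the exact answer.
for guess in (16, 8, 6):
    print(guess, error(guess), goodness_reciprocal(guess), goodness_negative(guess))
```

With either measure the ranking of the three guesses is the same; only the shape of the scale differs.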

We can use the super-simple arithmetic problem given above to demonstrate some basic trial-and-error optimization concepts. Our general approach is to try some solutions, that is, values for x, and choose the one that fits best. As we try answers, we will see that the search process itself provides us with some clues about what to try next. First, if 4 + x does not equal 10, then we can look at how far it is from 10 and use that knowledge (the error) to guide us in looking for a better number. If the error is very big, then maybe we should take a big jump to try the next potential solution; if the error is very small, then we are probably close to an answer and should take little steps.

There is another kind of useful information provided by the search process as well. If we tried a number selected at random, say, 20, for x, we would see that the answer was not very good (error is |10 − 24| = 14). If we tried another number, we could find out if performance improved or got worse. Trying 12, we see that 4 + 12 is still wrong, but the result is nearer to 10 than 4 + 20 was, with error = 6. If we went past x = 6 and tried, say, x = 1, we would discover that, though error has improved to 5, the sign of the difference has flipped, and we have to change direction and go up again. Thus various kinds of facts are available to help solve a problem by trial and error: the goodness or error of a potential solution gives us a clue about how far we might be from the optimum (which may be a minimum or a maximum). Comparing the fitness of two or more points and looking at the sign of their difference gives us gradient information about the likely direction toward an optimum so we can improve our guess about which way to go. A gradient is a kind of multidimensional slope. A computer program that can detect a gradient might be able to move in the direction that leads toward the peak. This knowledge can be helpful if the gradient indicates the slope of a peak that is good enough; sometimes, though, it is only the slope of a low-lying hill. Thus an algorithm that relies only on gradient information can get stuck on a mediocre solution.
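As a rough sketch of how these two clues might be combined, the following loop uses the sign of the error to pick a direction and its size to pick a step length. The `step_fraction`, `tolerance`, and `max_trials` parameters are illustrative assumptions of ours, not part of the discussion above.

```python
def signed_error(x, target=10):
    """Positive when 4 + x overshoots the target, negative when it undershoots."""
    return (4 + x) - target


def trial_and_error(start, step_fraction=0.5, tolerance=1e-6, max_trials=100):
    """Move against the sign of the error, with steps proportional to its size."""
    x = start
    history = [x]
    for _ in range(max_trials):
        e = signed_error(x)
        if abs(e) <= tolerance:
            break
        x = x - step_fraction * e  # big error: big jump; small error: small step
        history.append(x)
    return x, history


best, tried = trial_and_error(start=20)
print(f"stopped near x = {best:.6f} after {len(tried) - 1} moves")
```

Starting from x = 20 the first move lands on 13, and each later move halves the remaining error; starting below the answer (for example at x = 1) the sign flips and the same rule moves upward instead.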

Three Spaces of Optimization

Optimization can be thought to occur in three interrelated number spaces. The parameter space contains the legal values of all the elements, called parameters, that can be entered into the function to be tested. In the exceedingly simple arithmetic problem above, x is the only parameter; thus the parameter space is one-dimensional and can be represented as a number line extending from negative to positive infinity: the legal values of x. Most interesting optimization problems have higher parameter dimensionality, and the challenge might be to juggle the values of a lot of numbers. Sometimes there are infeasible regions in the parameter space, patterns of input values that are paradoxical, inconsistent, or meaningless.

A function is a set of operations on the parameters, and the function space contains the results of those operations. The usual one-dimensional function space is a special case, as multidimensional outputs can be considered, for instance, in cases of multiobjective optimization; this might be thought of as the evaluation of a number of functions at once, for example, if you were to rate a new car simultaneously in terms of its price, its appearance, its power, and its safety. Each of these measures is the result of the combination of some parameters; for instance, the assessment of appearance might combine color, aerodynamic styling, the amount of chrome, and so on.

The fitness space is one-dimensional; it contains the degrees of success with which patterns of parameters optimize the values in the function space, measured as goodness or error. To continue the analogy, the fitness is the value that determines whether you will decide to buy the car. Having estimated price, appearance, power, and so on, you will need to combine these various functions into one decision-supporting quantity; if the quantity is big, you are more likely to buy the car. Each point in the parameter space maps to a point in the function space, which in turn maps to a point in the fitness space. In many cases it is possible to map directly from the parameter space to the fitness space, that is, to directly calculate the degree of fitness associated with each pattern of parameters. When the goal is to maximize a single function result, the fitness space and the function space might be the same; in function minimization they may be the inverse of one another. In many common cases the fitness and function spaces are treated as if they were the same, though it is often helpful to keep the distinction in mind. The point of optimization is to find the parameters that maximize fitness.
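The chain from parameter space to function space to fitness space can be sketched for the car analogy. Everything numeric here is invented for illustration: the parameter names, the scaling constants, the two example cars, and the weights are all hypothetical, and a weighted sum is only one of many ways to collapse several function values into one fitness value.

```python
def car_functions(params):
    """Function space: map raw parameters to several scores (all scalings illustrative)."""
    price_score = 1.0 - params["price"] / 50_000  # cheaper is better
    appearance = 0.6 * params["styling"] + 0.4 * params["chrome"]
    power = params["horsepower"] / 400
    safety = params["safety_rating"] / 5
    return {"price": price_score, "appearance": appearance,
            "power": power, "safety": safety}


def fitness(function_values, weights):
    """Fitness space: collapse the function values into a single number."""
    return sum(weights[name] * value for name, value in function_values.items())


# Hypothetical buyer preferences and two hypothetical points in parameter space.
weights = {"price": 0.4, "appearance": 0.1, "power": 0.2, "safety": 0.3}
cars = {
    "sedan": {"price": 25_000, "styling": 0.5, "chrome": 0.2,
              "horsepower": 180, "safety_rating": 5},
    "coupe": {"price": 40_000, "styling": 0.9, "chrome": 0.6,
              "horsepower": 320, "safety_rating": 3},
}

scores = {name: fitness(car_functions(p), weights) for name, p in cars.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

A buyer with different weights would map the same function values to different fitness values, which is one reason to keep the function space and the fitness space conceptually separate.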

Of course, for a simple arithmetic problem such as 4 + x = 10 we don’t have to optimize: someone a long time ago did the work for us, and we only have to memorize their answers. But other problems, especially those that human minds and evolving species have to solve, definitely require some work, even if they are not obviously numeric. It may seem unfamiliar or odd to think of people engaging in day-to-day optimization, until the link is made between mathematics and the structures of real human situations. While we are talking about the rather academic topic of optimizing mathematical functions, what is being said also applies to the dynamics of evolving species and thinking minds.

Fitness Landscapes

It is common to talk about an optimization problem in terms of a fitness landscape. In the simple arithmetic example above, as we adjusted our parameter up and down, the fitness of the solution changed by degree. As we move nearer to and farther from the optimum x = 6, the goodness of the solution rises and falls. Conceptually, this equation with one unknown is depicted as a fitness landscape plotted in two dimensions, with one dimension for the parameter being adjusted and the second dimension plotting fitness (see Figure 2.6). When goodness is the negative of error, the fitness landscape is linear on either side of the peak, but when goodness = 1/abs(10 − (4 + x)), it stretches nonlinearly to infinity at the peak. The goal is to find the highest point on a hill plotted on the y-axis, with the peak indicating the point in the parameter space where the value of x results in maximum fitness. We find, as is typical in most nonrandom situations, that solutions in the region of a global optimum are pretty good, relative to points in other regions of the landscape. Where the function is not random (a random function would be impossible to optimize), a good optimization algorithm should be able to capitalize on regularities of the fitness landscape.

Multimodal functions have more than one optimum. A simple one-dimensional example is the equation x² = 100. The fitness landscape has a peak at x = 10 and another at x = −10. Note that there is a kind of bridge or “saddle” between the two optima, where fitness drops until it gets to x = 0, then whichever way we go it increases again. The fitness of x = 0 is not nearly so bad as, say, the fitness of x = 10,000. No matter where you started a trial-and-error search, you would end up finding one of the optima if you simply followed the gradient.
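A small sketch can illustrate the two-peaked landscape. We assume fitness is the negative of the error, and we stand in for "following the gradient" with a simple comparison of the two neighbouring points; the step-halving rule and the starting points are our own illustrative choices.

```python
def fitness(x):
    """Negative error for x**2 = 100; peaks of height 0 at x = 10 and x = -10."""
    return -abs(x * x - 100)


def follow_gradient(x, step=1.0, min_step=1e-6):
    """Compare neighbours on both sides and move uphill, halving the step when stuck."""
    while step > min_step:
        left, right = fitness(x - step), fitness(x + step)
        if max(left, right) > fitness(x):
            x = x - step if left > right else x + step
        else:
            step /= 2  # neither neighbour is better: look more closely
    return x


# Which peak is reached depends only on which side of zero the search starts.
for start in (-250.0, -3.0, 0.5, 40.0):
    print(start, "->", round(follow_gradient(start), 3))
```

Starts on the negative side climb to the peak at −10 and starts on the positive side climb to the peak at 10, so here the two optima are equally good and the gradient alone is enough.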

John Holland has used a term that well describes the strategic goal in finding a solution to a hard problem in a short time; the issue, he said, is “the optimal allocation of trials.” We can’t look everywhere for the answer to a problem; we need to limit our search somehow. An ideal algorithm finds the optimum relatively efficiently.

Figure 2.6 Fitness landscapes for the one-dimensional function x + 4 = 10, with goodness defined two different ways. (Horizontal axis: value of x; vertical axis: fitness; the fitness peak is marked.)

Two basic approaches can be taken in searching for optima on a fitness landscape. Exploration is a term that describes the broad search for a relatively good region on a landscape. A problem may have more than one optimum. Normally we prefer to find the global optimum, or at least the highest peak we can find in some reasonable amount of time. Exploration then is a strategic approach that samples widely around the landscape, so we don’t miss an Everest while searching on a hillside. The more focused way to search is known as exploitation. Having found a good region on the landscape, we wish to ascend to the very best point in it, to the tip-top of the peak. Generally, exploitation requires smaller steps across the landscape; in fact, they should often decrease as the top of a peak is neared. The most common exploitational method is hill climbing, in which search proceeds from a position that is updated when a better position is found. Then the search can continue around that new point, and so on. There are very many variations on the hill-climbing scheme. All are guaranteed to find hilltops, but none can guarantee that the hill is a high one. The trade-off between exploration and exploitation is central to the topic of finding a good algorithm for optimization: the optimal allocation of trials.
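The difference between the two approaches can be sketched on a made-up landscape with a low hill and a higher peak. The `landscape` function, the evenly spaced starting points, and the step-halving rule are all assumptions chosen for illustration; real exploration schemes often sample at random instead.

```python
def landscape(x):
    """Hypothetical two-peak fitness: a low hill at x = 2 and a higher peak at x = 12."""
    return max(3 - abs(x - 2), 8 - 2 * abs(x - 12), 0)


def hill_climb(x, step=1.0, min_step=1e-4):
    """Exploitation: keep the current position unless a neighbour is better."""
    while step > min_step:
        best = max((x - step, x + step), key=landscape)
        if landscape(best) > landscape(x):
            x = best      # update the position and search around it
        else:
            step /= 2     # shrink the steps as the top is neared
    return x


def explore_then_exploit(low=0.0, high=20.0, n_starts=9):
    """Exploration: sample starting points widely, then climb from each one."""
    starts = [low + i * (high - low) / (n_starts - 1) for i in range(n_starts)]
    tops = [hill_climb(s) for s in starts]
    return max(tops, key=landscape)


local_top = hill_climb(1.0)          # a single climb that starts on the low hill
global_top = explore_then_exploit()  # many climbs spread across the landscape
print(local_top, landscape(local_top))
print(global_top, landscape(global_top))
```

The single climb from x = 1 stops on the low hill at x = 2, while the widely spread starts include some that reach the higher peak at x = 12. The price is nine climbs instead of one, which is the allocation-of-trials trade-off in miniature.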

The examples given above are one-dimensional, as the independent variable can be represented on a single number line, using the y-axis to plot fitness. A more complex landscape exists when two parameters affect the fitness of the system; this is usually plotted using a three-dimensional coordinate system, with the z-axis representing fitness; that is, the parameters or independent variables are plotted on a plane with the fitness function depicted as a surface of hills and valleys above the plane. Systems of more than two dimensions present a perceptual difficulty to us. Though they are not problematic mathematically, there is no way to graph them using the Cartesian method. We can only imagine them, and we cannot do that very well.

The concepts that apply to one-dimensional problems also hold in the multidimensional case, though things can quickly get much more complicated. In the superficial case where the parameters are independent of one another, for example, where the goal is to minimize the sphere function $f(x) = \sum_i x_i^2$, the problem is really just multiple (and simultaneous) instances of a one-dimensional problem. The solution is found by moving the values of all the $x_i$’s toward zero, and reducing the magnitude of any of them will move the solution equally well toward the global optimum. In this kind of case the fitness landscape looks like a volcano sloping gradually upward from all directions toward a single global optimum (see Figure 2.7). On the other hand, where independent variables interact with one another, for example, when searching for a set of neural network weights, it is very often the case that what seems a good position on one dimension or subset of dimensions deteriorates the optimality of values on other dimensions. Decreasing one of the $x_i$’s might improve performance if and only if you simultaneously increase another one, and might degrade goodness otherwise. In this more common case the fitness landscape looks like a real landscape, with hills and valleys and sometimes cliffs.
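The contrast between independent and interacting parameters can be checked numerically. The sphere function below follows the usual sum-of-squares definition; the `coupled` function is a hypothetical example of ours, chosen only because its two parameters interact strongly. Both are errors to be minimized, so lower values are better.

```python
def sphere(x):
    """Sphere function: the sum of squared parameters, minimized at all zeros."""
    return sum(xi * xi for xi in x)


def coupled(x):
    """Hypothetical interacting function, minimized at (1, 3)."""
    return 10 * (x[0] + x[1] - 4) ** 2 + (x[0] - 1) ** 2


start = (3.0, 1.0)

# Independent parameters: shrinking either coordinate helps on its own.
print("sphere: ", sphere(start), sphere((2.0, 1.0)), sphere((3.0, 0.0)))

# Interacting parameters: decreasing x[0] alone makes things worse, but
# decreasing x[0] while increasing x[1] makes things better.
print("coupled:", coupled(start), coupled((2.0, 1.0)), coupled((2.0, 2.0)))
```

On the sphere function each coordinate can be tuned as its own one-dimensional problem, whereas on the coupled function a move that looks bad on one dimension becomes good only when paired with a move on the other.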