
4.3 Genetic Algorithm

4.3.4 Genetic Algorithm for Multi-Objective Optimization

The main idea in MO problems is to find the global Pareto-optimal front. Of course, this cannot be guaranteed, but the algorithms designed for this problem must have two properties: generating solutions along other Pareto-optimal fronts and finding new fronts. In the reproduction phase, it is common to generate solutions that are dominated by other individuals in the population. These have to be discarded since they are of no interest. The algorithm used for multi-objective optimization in the fixed scaling problem is the Elitist Non-Dominated Sorting Genetic Algorithm (NSGA-II) proposed in [34].
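
The notion of dominance used above can be sketched in a few lines. This is a minimal illustration, not the implementation from [34]: it assumes every objective is minimized, and the names `dominates`, `non_dominated` and the toy `population` are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a dominates b, assuming all objectives are minimized."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only the points that no other point in the list dominates."""
    return [
        p for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Toy population with two objectives per individual (illustrative values).
population = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (4.0, 4.0)]
front = non_dominated(population)
```

Here `(3.0, 4.0)` and `(4.0, 4.0)` are both dominated by `(2.0, 3.0)`, so only the remaining three points form the current front.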

The NSGA-II algorithm works in two basic steps: sorting of solutions based on dominance, and elitist selection to keep the best fronts encountered. Since the population size does not change, the last front for inclusion has to be divided in two parts. The division is done to keep the most diverse solutions for the next generations. This diversity in NSGA-II is calculated using the crowding distance, which measures the distance between solutions in the objective function space. With this approach, the algorithm excludes the sharing parameter, which is responsible for calculating the proximity between population members and has to be defined by the user. Sorting of solutions is done by careful book-keeping in order to speed up the execution time. For details refer to [34]. The overall complexity of the algorithm is $O(mp^2)$, where $m$ is the number of objectives (in our case $m = 2$) and $p$ is the size of the population. One thing worth mentioning is the constant factor in the mentioned asymptotic running time. The algorithm works by sorting on the set of both the current population and the offspring, doubling the size of the set. With this taken into account, the more precise complexity is $O(m(2p)^2)$.
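
The two steps described above (sorting into fronts, then splitting the last front by crowding distance) can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure from [34]: all objectives are minimized, the fronts are found by repeatedly peeling off non-dominated points rather than by the book-keeping scheme that gives the quoted complexity, and all function names are hypothetical.

```python
from typing import Dict, List, Sequence

Point = Sequence[float]


def dominates(a: Point, b: Point) -> bool:
    """True if a dominates b, assuming all objectives are minimized."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def sort_into_fronts(objs: List[Point]) -> List[List[int]]:
    """Naive peeling: front 0 is non-dominated, front 1 is non-dominated
    once front 0 is removed, and so on. Returns lists of indices."""
    remaining = set(range(len(objs)))
    fronts = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(objs[j], objs[i])
                       for j in remaining if j != i))
        fronts.append(front)
        remaining -= set(front)
    return fronts


def crowding_distance(objs: List[Point], front: List[int]) -> Dict[int, float]:
    """Sum, over objectives, of the normalized gap between each point's
    two neighbors; boundary points get infinite distance."""
    dist = {i: 0.0 for i in front}
    for k in range(len(objs[front[0]])):
        ordered = sorted(front, key=lambda i: objs[i][k])
        lo, hi = objs[ordered[0]][k], objs[ordered[-1]][k]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if hi == lo:
            continue  # all values equal: no spread along this objective
        for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
            dist[cur] += (objs[nxt][k] - objs[prev][k]) / (hi - lo)
    return dist


def select_survivors(objs: List[Point], size: int) -> List[int]:
    """Fill the next population front by front; split the last front
    by keeping its most widely spaced members."""
    survivors: List[int] = []
    for front in sort_into_fronts(objs):
        if len(survivors) + len(front) <= size:
            survivors.extend(front)
            continue
        dist = crowding_distance(objs, front)
        most_spread = sorted(front, key=lambda i: dist[i], reverse=True)
        survivors.extend(most_spread[:size - len(survivors)])
        break
    return survivors


# Parents and offspring merged into one set of objective vectors (toy data).
combined = [(1.0, 5.0), (2.0, 3.0), (2.2, 2.8), (4.0, 1.0),
            (3.0, 4.0), (4.0, 4.0)]
fronts = sort_into_fronts(combined)
survivors = select_survivors(combined, size=3)
```

In this toy case the first front holds four points but only three slots are available, so the two boundary points are kept together with the interior point that has the larger crowding distance.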

Experiments: Search Algorithms

The experiments were carried out on a variety of computer architectures and different setups, but MATLAB was used as the main environment to run the experiments. In this chapter we compare the performance of the search algorithms explained in Chapter 4 on several regression data sets.

A number of data sets with varying numbers of samples and dimensionality were used to test the quality of the composition of Delta Test with the three mentioned search algorithms. The following data sets were used for comparing the performance of FBS, TS and GA, with Table 5.1 summarizing the sizes of all data sets.

1. Housing data set [67]: The housing data set is related to the estimation of housing values in suburbs of Boston. The value to predict is the median value of owner-occupied homes in $1000's. The data set contains 506 instances, with 13 input variables and one output.

2. Tecator data set [68]: The Tecator data set aims at performing the task of predicting the fat content of a meat sample on the basis of its near infrared absorbance spectrum. The data set contains 215 useful instances for interpolation problems, with 100 input channels, 22 principal components (which remain unused) and 3 outputs, although only one is going to be used (fat content).

3. Anthrokids data set [69]: This data set represents the results of a three-year study on 3900 infants and children representative of the U.S. population of year 1977, ranging in age from newborn to 12 years of age. The data set comprises 121 variables, with the weight of a child being the output to predict. A prior sample and variable discrimination had to be performed to build a robust and reliable data set. The final set without missing values contains 1019 instances, 53 input variables and one output (weight). More information on this data set reduction methodology can be found in [63].

4. The Santa Fe time series competition data set [70]: The Santa Fe data set is a time series recorded from laboratory measurements of a Far-Infrared-Laser in a chaotic state, and proposed for a time series competition in 1994. The set contains 1000 samples, and it was reshaped for its application to time series prediction using regressors of 12 samples. Thus, the set used in this work contains 987 instances, 12 inputs and one output.

5. ESTSP 2007 competition data set [71]: This time series was proposed for the European Symposium on Time Series Prediction 2007. It is a univariate set containing 875 samples, while the regressor size for this series varied for different sets of experiments, as explained in this chapter and the next one.
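
Reshaping a univariate series into regressors, as done for the two time series above, can be sketched as a sliding window. This is an illustrative sketch only: the function name `make_regressors` is hypothetical, and it assumes one-step-ahead prediction, where each input row holds the `size` values preceding the target. This convention yields `len(series) - size` rows; the instance counts reported above may follow a slightly different indexing convention.

```python
from typing import List, Tuple


def make_regressors(series: List[float],
                    size: int) -> Tuple[List[List[float]], List[float]]:
    """Turn a series into (inputs, outputs) for one-step-ahead prediction:
    each input row is a window of `size` consecutive values and the
    output is the value that immediately follows the window."""
    inputs, outputs = [], []
    for t in range(size, len(series)):
        inputs.append(series[t - size:t])
        outputs.append(series[t])
    return inputs, outputs


# Toy series of 10 samples reshaped with a regressor of 3 samples.
series = [float(v) for v in range(10)]
X, y = make_regressors(series, size=3)
```

Each row of `X` then plays the role of one instance, and the columns of `X` are the input variables among which the search algorithms select.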

Data set          Samples   Input variables
Boston Housing    506       13
Anthrokids        1019      53
Tecator           215       100
Santa Fe          987       12
ESTSP 2007        819       55

Table 5.1: Data sets used for testing the performance of search algorithms.

5.1 Approximate Nearest Neighbor Influence

First we show the importance of using faster nearest neighbor search when optimizing the DT. Table 5.2 shows the average running times for the Genetic Algorithm for the Santa Fe, ESTSP 2007 and Anthrokids data sets. As can be seen, the computational savings from using the underlying data structure in ANN are substantial, with an improvement of 80% for Santa Fe and roughly 90% for ESTSP 2007 and Anthrokids.

Data set      Naive search   Approximate k-NN
Santa Fe      620            124
ESTSP 2007    2573           283
Anthrokids    2938           314

Table 5.2: Average running time in seconds for DT optimization using the naive NN approach and approximate k-NN search.
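
To make the cost of the naive approach concrete, the sketch below computes the Delta Test with a brute-force nearest neighbor search. It is a hedged illustration rather than the code used in the experiments: it assumes the usual Delta Test estimate, half the mean squared difference between each output and the output of its nearest neighbor in input space, uses Euclidean distance, and the function names are hypothetical. Every evaluation scans all pairs of samples, which is the work an approximate k-NN structure avoids.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two input vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def delta_test(X: List[Sequence[float]], y: List[float]) -> float:
    """Delta Test with naive search: for each sample, scan all other
    samples to find its nearest neighbor in input space, then average
    the squared output differences and halve the result."""
    n = len(X)
    total = 0.0
    for i in range(n):
        nearest = min((j for j in range(n) if j != i),
                      key=lambda j: squared_distance(X[i], X[j]))
        total += (y[i] - y[nearest]) ** 2
    return total / (2 * n)


# Toy data: one input variable, four samples (illustrative values).
X = [[0.0], [1.0], [3.0], [3.5]]
y = [0.0, 1.0, 3.0, 4.0]
estimate = delta_test(X, y)
```

In a variable selection setting this function would be called once per candidate subset of input columns, so the quadratic scan is repeated many times during a single GA run.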
