
Normalizing CSV Files

If the data to be normalized is already stored in CSV files, Encog Analyst should be used to normalize the data. Encog Analyst can be used both through the Encog Workbench and directly from Java and C#. This section explains how to use it through C# to normalize the Iris data set.

To normalize a file, look at the file normalization example found at the following location:

Encog.Examples.Normalize.NormalizeFile

To execute this example, use the following command:

ConsoleExamples normalize-file [raw] [normalized]

This example takes an input and output file. The input file is the iris data set. The first lines of this file are shown here:

"sepal_l","sepal_w","petal_l","petal_w","species"
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa

The normalized output file begins as follows:

"sepal_l","sepal_w","petal_l","petal_w","species(p0)","species(p1)"
-0.55,0.24,-0.86,-0.91,-0.86,-0.5
-0.66,-0.16,-0.86,-0.91,-0.86,-0.5
-0.77,0,-0.89,-0.91,-0.86,-0.5
-0.83,-0.08,-0.83,-0.91,-0.86,-0.5
-0.61,0.33,-0.86,-0.91,-0.86,-0.5
-0.38,0.58,-0.76,-0.75,-0.86,-0.5
-0.83,0.16,-0.86,-0.83,-0.86,-0.5
-0.61,0.16,-0.83,-0.91,-0.86,-0.5

The above data shows that the numeric values have all been normalized to between -1 and 1. Additionally, the species field is broken out into two parts. This is because equilateral normalization was used on the species column.
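To make the numeric columns concrete, the following is a minimal sketch of linear range normalization, the kind of mapping that produces values between -1 and 1. It is not Encog's own implementation. The class name RangeNormalizer is hypothetical, and the sepal-length range of 4.3 to 7.9 is an assumption taken from the commonly published Iris data set rather than from this section.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Encog.
public static class RangeNormalizer
{
    // Linearly maps x from [actualLow, actualHigh] onto [normLow, normHigh].
    public static double Normalize(double x, double actualLow, double actualHigh,
                                   double normLow = -1.0, double normHigh = 1.0)
    {
        if (actualHigh <= actualLow)
        {
            throw new ArgumentException("actualHigh must be greater than actualLow.");
        }

        // Fraction of the way through the observed range, then rescaled
        // to the requested output range.
        double fraction = (x - actualLow) / (actualHigh - actualLow);
        return fraction * (normHigh - normLow) + normLow;
    }
}
```

Under the assumed range, RangeNormalizer.Normalize(5.1, 4.3, 7.9) gives roughly -0.556, which lines up with the -0.55 shown for sepal_l in the first normalized row if the output is truncated to two decimal places.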

2.4.1 Implementing Basic File Normalization

In the last section, you saw how Encog Analyst normalizes a file. In this section, you will learn the programming code necessary to accomplish this. Begin by accessing the source and target files:

var sourceFile = new FileInfo(app.Args[0]);
var targetFile = new FileInfo(app.Args[1]);

Now create instances of EncogAnalyst and AnalystWizard. The wizard will analyze the source file and build all of the normalization stats needed to perform the normalization.

var analyst = new EncogAnalyst();
var wizard = new AnalystWizard(analyst);

The wizard can now be started.

wizard.Wizard(sourceFile, true, AnalystFileFormat.DecpntComma);

Now that the input file has been analyzed, it is time to create a normalization object. This object will perform the actual normalization.

var norm = new AnalystNormalizeCSV();

It is necessary to specify the output format for the CSV file. In this case, use ENGLISH, which specifies a decimal point. It is also important to produce output headers so that all attributes can be easily identified.

norm.ProduceOutputHeaders = true;

Finally, we normalize the file.

norm.Normalize(targetFile);
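The steps in this section can be gathered into a single method, sketched below. Treat it as a hedged outline rather than the example's actual source: the namespaces in the using directives, the norm.Analyze call, and CSVFormat.English are recalled from the Encog 3.x C# API and do not appear in the listings above, so verify them against the Encog version in use. The class name NormalizeFileSketch and its Run method are hypothetical.

```csharp
using System.IO;
using Encog.App.Analyst;               // assumed namespace for EncogAnalyst, AnalystFileFormat
using Encog.App.Analyst.CSV.Normalize; // assumed namespace for AnalystNormalizeCSV
using Encog.App.Analyst.Wizard;        // assumed namespace for AnalystWizard
using Encog.Util.CSV;                  // assumed namespace for CSVFormat

// Hypothetical wrapper for illustration only.
public static class NormalizeFileSketch
{
    public static void Run(string rawPath, string normalizedPath)
    {
        var sourceFile = new FileInfo(rawPath);
        var targetFile = new FileInfo(normalizedPath);

        // Let the wizard scan the source file and build the normalization stats.
        var analyst = new EncogAnalyst();
        var wizard = new AnalystWizard(analyst);
        wizard.Wizard(sourceFile, true, AnalystFileFormat.DecpntComma);

        var norm = new AnalystNormalizeCSV();

        // ASSUMPTION: this call is not shown in the text above. It is believed
        // to bind the normalizer to the source file and the analyst's stats,
        // reading the input as a decimal-point CSV with headers.
        norm.Analyze(sourceFile, true, CSVFormat.English, analyst);

        // Write column headers, then produce the normalized file.
        norm.ProduceOutputHeaders = true;
        norm.Normalize(targetFile);
    }
}
```

Keeping the analyst as a separate object from the normalizer means the same statistics can later be saved or applied to other files.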

Now that the data is normalized, the normalization stats may be saved for later use. This is covered in the next section.

2.4.2 Saving the Normalization Script

Encog keeps statistics on normalized data. These statistics, called the normalization stats, tell Encog the numeric ranges for each attribute that was normalized. They can be saved so that the source data does not need to be analyzed again each time it is normalized.

To save a stats file, use the following command:

analyst.Save(new FileInfo("stats.ega"));

The file can be later reloaded with the following command:

analyst.Load(new FileInfo("stats.ega"));

The extension EGA is common and stands for “Encog Analyst.”

2.4.3 Customizing File Normalization

The Encog Analyst contains a collection of AnalystField objects. These objects hold the type of normalization and the ranges of each attribute. This collection can be directly accessed to change how the attributes are normalized. Also, AnalystField objects can be removed and excluded from the final output.

The following code shows how to access each of the fields determined by the wizard.

Console.WriteLine(@"Fields found in file:");
foreach (AnalystField field in analyst.Script.Normalize.NormalizedFields)
{
    var line = new StringBuilder();
    line.Append(field.Name);
    line.Append(",action=");
    line.Append(field.Action);
    line.Append(",min=");
    line.Append(field.ActualLow);
    line.Append(",max=");
    line.Append(field.ActualHigh);
    Console.WriteLine(line.ToString());
}

There are several important attributes on each of the AnalystField objects. For example, to change the normalization range to 0 to 1, execute the following commands:

field.NormalizedHigh = 1;
field.NormalizedLow = 0;

The mode of normalization can also be changed. To use one-of-n normalization instead of equilateral, just use the following command:

field.Action = NormalizationAction.OneOf;
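To illustrate what one-of-n normalization produces, here is a minimal sketch that encodes a class label as one output per class. It is an illustration of the technique, not Encog's code. The class name OneOfNEncoder is hypothetical, and the column order and the -1 and 1 output values are assumptions.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Encog.
public static class OneOfNEncoder
{
    // Returns one output per class: `high` for the matching class, `low` elsewhere.
    public static double[] Encode(string value, string[] classes,
                                  double low = -1.0, double high = 1.0)
    {
        int index = Array.IndexOf(classes, value);
        if (index < 0)
        {
            throw new ArgumentException("Unknown class: " + value);
        }

        var encoded = new double[classes.Length];
        for (int i = 0; i < classes.Length; i++)
        {
            encoded[i] = (i == index) ? high : low;
        }
        return encoded;
    }
}
```

For the three Iris species (Iris-setosa plus Iris-versicolor and Iris-virginica, which are not shown in the sample rows above), this encoding yields three columns per row, where the equilateral output shown earlier uses two.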

Encog Analyst can do much more than just normalize data. It can also perform the entire normalization, training and evaluation of a neural network. This will be covered in greater detail in Chapters 3 and 4. Chapter 3 will show how to do this from the workbench, while Chapter 4 will show how to do this from code.