

4.5 Classifier Generation Service (CGS)

4.5.5 Perceptron generation using the NeuralNetwork DLL

The NeuralNetwork DLL, as discussed in Section 4.5.3, is a demonstration implementation that uses the Lib DLL. The library’s only requirement is that the PerceptronInducer interface is implemented, because every implementation has its own set of requirements. Within this DLL the implementation is called OnlinePerceptronInducer. It is intended to load training data from files, generate an ANN and then store this ANN as an update exposed through a web service that is accessible to client consumers.

The OnlinePerceptronInducer is instantiated by the program and initialised with an instance of the NetworkTrainingConfiguration class. The intended implementations of the library’s interfaces are accessed through this class. Once it has been initialised, the generateNetwork method is called, which follows this high-level algorithm:

1. Extract the BOW and lexical features of URLs contained in the data files and save them as a list of input vectors for the classifier.

2. Normalise this list of inputs to a range of zero to one for each continuous variable within the lexical features portion of each input vector.

3. Train the classifier using these normalised lists, together with a second set of normalised lists used for validation. The validation lists are generated by splitting the first list.

4. Persist the classifier. This is done by sending it to the update service described in Section 4.6.

Data extraction is performed through the NetworkDataFetcher interface, which is available through the ntd member variable of the configuration instance. This interface provides a single method, fetchDataArray, which loads the data and returns it as an array with one element per data item. Within this implementation, each data set is loaded through its own instance of the interface implementation, which reads the data from files whose paths are specified in the configuration file. Once the training and validation data have been loaded and balanced using random under-sampling (as discussed in Section 2.4.3), they are used to create instances of URLExtractor, which serve as the inputs to the classifier during training and validation. Each BOW is then generated from the collection of these extractors and is used to insert a map of existing words into each extractor as part of the input data.
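The balancing and BOW steps can be illustrated with a minimal Python sketch. It is not the library's code: the function names (under_sample, tokenise, build_bow, bow_vector), the tokenisation rule and the example URLs are all assumptions made for illustration, and a single BOW is built here although the implementation may generate several.

```python
import random
import re

def under_sample(malicious, benign, seed=0):
    """Randomly discard items from the larger class until both classes are equal in size."""
    rng = random.Random(seed)
    n = min(len(malicious), len(benign))
    return rng.sample(malicious, n), rng.sample(benign, n)

def tokenise(url):
    """Split a URL into lower-case alphanumeric tokens (an assumed tokenisation rule)."""
    return [t for t in re.split(r"[^A-Za-z0-9]+", url.lower()) if t]

def build_bow(urls):
    """Map every distinct token to a fixed index in the BOW part of the input vector."""
    vocab = {}
    for url in urls:
        for token in tokenise(url):
            vocab.setdefault(token, len(vocab))
    return vocab

def bow_vector(url, vocab):
    """Binary presence vector: 1.0 at the index of every known word found in the URL."""
    vec = [0.0] * len(vocab)
    for token in tokenise(url):
        if token in vocab:
            vec[vocab[token]] = 1.0
    return vec

# Hypothetical example data, not taken from the thesis data files.
malicious = ["http://evil.example/login.php", "http://bad.example/update.exe"]
benign = ["http://example.org/index.html", "http://example.org/about",
          "http://example.net/news"]

mal_bal, ben_bal = under_sample(malicious, benign)
vocab = build_bow(mal_bal + ben_bal)
vector = bow_vector("http://evil.example/x", vocab)
```

Because the word-to-index map fixes the layout of the input vector, a client must receive the same map to build compatible vectors for new URLs.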

After extraction is completed, normalisation of the training data begins. Normalisation is handled by the DescriptorNormaliser object, which is implemented in the library. Like all other objects within this application, a reference to this object is found in the configuration instance, and it is initialised with all the descriptors (extractors) already loaded. Once the normaliser is initialised, it is used to normalise both the malicious and benign extractor lists to a range of 0 to 1 for each continuous field.
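The text does not state which scaling formula DescriptorNormaliser uses. A common way to map a continuous field onto the range 0 to 1 is min-max scaling, sketched below under that assumption. The function names, the column layout and the sample values are hypothetical.

```python
def fit_min_max(rows, continuous_cols):
    """Record the (minimum, maximum) of every continuous column across all rows."""
    return {c: (min(r[c] for r in rows), max(r[c] for r in rows))
            for c in continuous_cols}

def normalise(rows, bounds):
    """Scale each continuous column to [0, 1]; all other columns are left untouched."""
    out = []
    for row in rows:
        row = list(row)  # copy so the caller's data is not modified
        for c, (lo, hi) in bounds.items():
            row[c] = 0.0 if hi == lo else (row[c] - lo) / (hi - lo)
        out.append(row)
    return out

# Assumed column layout: [url_length, dot_count, bow_flag].
# Only the first two columns are continuous lexical features.
malicious = [[120.0, 5.0, 1.0], [60.0, 2.0, 0.0]]
benign = [[20.0, 1.0, 0.0], [40.0, 3.0, 1.0]]

# The bounds are fitted over all loaded descriptors, then applied to both lists.
bounds = fit_min_max(malicious + benign, continuous_cols=[0, 1])
mal_norm = normalise(malicious, bounds)
ben_norm = normalise(benign, bounds)
```

Fitting the bounds once over every loaded descriptor keeps the malicious and benign lists on the same scale. The same bounds are what a client would need in order to scale a new URL consistently.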

At this point the inducer, an object whose specific purpose is to generate (induce) an ANN, has all the data it requires in the correct format. As already mentioned, this implementation uses the Online Perceptron model to create an ANN capable of classifying malicious URLs. This training method is discussed in Chapter 3. The only addition to this algorithm is that the Logger interface is used to produce user interface output indicating progress and accuracy during training. After each iteration through the training data set, the NetworkValidator instance is used to validate the ANN’s accuracy using independent data known as the validation data set. After validation, a copy of the classifier and its performance are stored. Once the inducer exhausts the number of allowed epochs or reaches the ANN’s goal accuracy, training ceases and the best-performing ANN recorded by the validator is chosen as the final trained ANN.
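The control flow of this loop (train for one epoch, validate, keep a snapshot of the best network, stop on the epoch limit or the goal accuracy) can be sketched as follows. This is an illustrative single-layer perceptron with the classic error-driven update rule, which may differ from the algorithm in Chapter 3. The function names, learning rate and data are assumptions, and the log callback only stands in for the Logger interface.

```python
def predict(weights, bias, x):
    """Return 1 (malicious) if the weighted sum is positive, otherwise 0 (benign)."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def accuracy(weights, bias, data):
    """Fraction of (input, label) pairs that are classified correctly."""
    return sum(predict(weights, bias, x) == y for x, y in data) / len(data)

def train(train_set, validation_set, max_epochs=50, goal_accuracy=0.99,
          rate=0.1, log=print):
    """Online training: weights are updated after every misclassified example."""
    weights, bias = [0.0] * len(train_set[0][0]), 0.0
    best = (-1.0, None, None)  # (validation accuracy, weights, bias)
    for epoch in range(1, max_epochs + 1):
        for x, y in train_set:
            error = y - predict(weights, bias, x)
            if error:
                weights = [w + rate * error * xi for w, xi in zip(weights, x)]
                bias += rate * error
        acc = accuracy(weights, bias, validation_set)
        log(f"epoch {epoch}: validation accuracy {acc:.3f}")
        if acc > best[0]:
            best = (acc, list(weights), bias)  # snapshot of the best network so far
        if acc >= goal_accuracy:
            break
    return best

# Hypothetical, already normalised input vectors with labels (1 = malicious).
train_set = [([1.0, 1.0], 1), ([0.9, 0.8], 1), ([0.0, 0.1], 0), ([0.2, 0.0], 0)]
validation_set = [([0.8, 0.9], 1), ([0.1, 0.1], 0)]

messages = []
best_acc, best_weights, best_bias = train(train_set, validation_set,
                                          log=messages.append)
```

Returning the best snapshot rather than the final weights matters because validation accuracy can fall again in later epochs.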

The final step after training is to store the ANN and make it available to end-users for use within applications implementing this classification method. A NetworkDetails object is created to store several statistical metrics regarding the classifier, which are obtained from the ANN validator. An instance of NetworkPersistor is fetched from the configuration and is passed the configuration, the network details object, the classifier, the BOW, the weights and the normalisation data through the persist method. Like the other objects within this library, this instance is implementation-specific and can be implemented in any way. Within this research it is implemented as a series of HTTP POST (an HTTP verb used to add data to a service⁵) requests to the REST service which adds persistence to the framework. Client implementations can access this REST service to request updates. Each of these data sets is transmitted in this fashion because clients require them to rebuild the classifier exactly as it was trained, with the same weights and the same method for building input vectors from requested URLs.
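One way to picture this "series of HTTP POST requests" is the Python sketch below, which builds one request per data set without sending it. The endpoint URL, the resource names, the JSON encoding and the payload values are all assumptions made for illustration. The actual REST service is described in Section 4.6 and may use different routes and formats.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/updates"  # hypothetical endpoint

def build_persist_requests(details, weights, bow, bounds, base_url=BASE_URL):
    """Create one POST request per data set that a client needs to rebuild the classifier."""
    parts = {"details": details, "weights": weights,
             "bow": bow, "normalisation": bounds}
    requests = []
    for name, payload in parts.items():
        body = json.dumps(payload).encode("utf-8")
        requests.append(urllib.request.Request(
            url=f"{base_url}/{name}",
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        ))
    return requests

# Hypothetical values standing in for the trained network and its metadata.
reqs = build_persist_requests(
    details={"validation_accuracy": 1.0, "epochs": 1},
    weights=[0.08, 0.09],
    bow={"http": 0, "evil": 1},
    bounds={"url_length": [20.0, 120.0]},
)

# To transmit the update, each request would be sent in turn, for example:
# for r in reqs:
#     urllib.request.urlopen(r)
```

Splitting the update into separate requests follows the description above. A single combined payload would also work if the service accepted it.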
