CHAPTER FIVE
LR Models ROC
2. Batch mode
In the on-line mode, the error function is calculated after the presentation of each input pattern, and the error signal is propagated back through the network, modifying the weights before the presentation of the next pattern. The error function is generally the Mean Square Error (MSE) of the difference between the desired and the actual responses of the network. One presentation of all the patterns in the training set is usually called an epoch or one iteration. In batch mode, the weights are modified only after all the input patterns have been presented. The error function is then calculated as the sum of the individual MSEs for each of the input patterns, and the weights are modified accordingly before the next iteration.
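The difference between the two modes can be illustrated with a minimal sketch. It is not the network used in this study: it assumes a hypothetical single linear neuron y = w*x + b, a per-pattern error of 0.5*(y - target)^2, and an arbitrary learning rate. The function names are illustrative only.

```python
# Hypothetical single linear neuron y = w * x + b (an assumption for illustration).

def gradients(w, b, x, target):
    """Gradient of 0.5 * (y - target)**2 with respect to w and b."""
    err = (w * x + b) - target
    return err * x, err

def train_online(patterns, w=0.0, b=0.0, lr=0.1):
    """One epoch in on-line mode: weights change after every pattern."""
    for x, target in patterns:
        gw, gb = gradients(w, b, x, target)
        w -= lr * gw
        b -= lr * gb
    return w, b

def train_batch(patterns, w=0.0, b=0.0, lr=0.1):
    """One epoch in batch mode: gradients are summed, weights change once."""
    sum_gw = sum_gb = 0.0
    for x, target in patterns:
        gw, gb = gradients(w, b, x, target)
        sum_gw += gw
        sum_gb += gb
    return w - lr * sum_gw, b - lr * sum_gb

patterns = [(1.0, 2.0), (2.0, 4.0)]   # made-up (input, target) pairs
w_on, b_on = train_online(patterns)
w_ba, b_ba = train_batch(patterns)
```

Starting from the same weights, the two modes generally end an epoch at different values, because in on-line mode the second pattern is evaluated with weights already changed by the first.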
6.5.4 Error functions
If a pattern is submitted and its classification or association is determined to be erroneous, the synaptic weights as well as the thresholds are adjusted so that the current least mean square classification error is reduced. The input-output mapping, the comparison of target and actual values, and the adjustment, if needed, continue until all mapping examples from the training set are learned within an acceptable overall error. Usually, the mapping error is cumulative and is computed over the full training set. Error is the measure of the discrepancy between the neural network output and the target. The most popular error functions include the sum of squared errors (SSE) and cross entropy (CE).
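As a rough illustration of the two error functions named above, the following sketch computes both over a set of targets and network outputs. It assumes binary targets (0 or 1) and outputs interpreted as probabilities. The clipping constant eps is an assumption added to avoid taking log(0), and the sample values are made up.

```python
import math

def sum_of_squares(targets, outputs):
    """SSE: sum of squared differences between targets and outputs."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs))

def cross_entropy(targets, outputs, eps=1e-12):
    """Binary cross entropy, summed over all patterns."""
    total = 0.0
    for t, o in zip(targets, outputs):
        o = min(max(o, eps), 1.0 - eps)   # keep log() finite
        total -= t * math.log(o) + (1 - t) * math.log(1 - o)
    return total

targets = [1, 0]          # made-up desired outputs
outputs = [0.8, 0.2]      # made-up network outputs
sse = sum_of_squares(targets, outputs)
ce = cross_entropy(targets, outputs)
```

Both measures are zero for a perfect fit and grow as the outputs move away from the targets. Cross entropy penalizes confident wrong outputs more heavily than SSE does.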
6.5.5 Advantages of Multilayer Perceptrons
The general characteristics of multilayer perceptrons are generalization and fault tolerance.
Generalization: Neural networks are capable of classifying unknown patterns on the basis of known patterns that share some of their features. This means that incomplete inputs can be classified because of their similarity to complete inputs.
Fault Tolerance: Neural networks are highly fault tolerant. This characteristic is also termed “graceful degradation” (118): the network keeps working even if some interconnections between neurons fail.
6.5.6 Limitations of Multilayer Perceptrons
There are limitations to the feed-forward, back-propagation architecture. Back-propagation requires a lot of supervised training, with many input-output examples. Sometimes the learning can get stuck in a local minimum, which limits the quality of the solution. This occurs when the network finds an error that is lower than the surrounding possibilities but never reaches the smallest possible error. In typical feed-forward, back-propagation applications, the desired output may not be known precisely. In such cases, back-propagation learning cannot be used directly. Examples include speech synthesis from text, robot arms, evaluation of bank loans, image processing, etc.
6.5.7 ANN Modeling
Neural networks are powerful nonlinear function estimators. As mentioned earlier, there are several types of ANN architectures. They usually perform prediction tasks at least as well as other techniques, if not significantly better. Additionally, building an ANN requires less domain knowledge in mathematics and statistics than building a logistic regression model. The ANN type used in this study is called a multilayer perceptron (MLP) or multilayer feed-forward network, which propagates input signals forwards and error signals backwards. During the process, the weights are adjusted so that the output
grows more accurate. This process is prone to overfitting. To avoid overfitting, a common technique is to train the network on one portion of the data and then evaluate its performance by testing the trained network on the remaining data. In our ANN modeling we used 70% of the data for training and the remaining 30% for testing.
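A 70/30 split of this kind can be sketched as follows. This is an illustration only: the text does not state how the records were assigned to the two sets, so the random shuffle, the fixed seed and the function name train_test_split are assumptions, and the 100 integer records stand in for real data.

```python
import random

def train_test_split(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of the records and cut it into training and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # seeded for a repeatable split
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# 100 placeholder records: 70 go to training, 30 to testing.
train, test = train_test_split(range(100))
```

The network is fitted on train only. The accuracy measured on test then indicates how well the trained model carries over to data it has not seen.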
The four ANN models each consisted of an input layer, a hidden layer and an output layer. Table 6.3 summarizes the specificity, sensitivity and overall accuracy of the ANN models in training. Table 6.4 summarizes the same measures when the trained models were tested. Table 6.5 gives the architecture and ROC area of the four ANN models, and their ROC graphs are given in Figure 6.7. Since training is the key factor for an ANN model, we discuss the training results here. Here too, the overall accuracy jumps from 71.12% for model-3 to 82.80% for model-4, for the same reason as mentioned earlier. At a sensitivity to survival of 95%, ANN model-1 yielded a ROC area of 72.1% and a specificity of only 31%, and model-2 a ROC area of 73.1% and a specificity of 32%. For the remaining two models, the ROC areas are 73.8% and 87.4% and the specificities at 95% sensitivity are 39% and 66%, respectively. Compared with the logistic models at 95% sensitivity, the ANN has better specificity for all four models. The output of the ANN is a composite function of the form
$$ y_i = f\left\{ \sum \tanh\left( \sum (\cdot) \right) \right\}, \qquad i = 0, 1 $$
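The composite form above can be read as a forward pass through one hidden layer with tanh activations, followed by an output function f. The sketch below shows a single output unit under stated assumptions: the weight and bias names are hypothetical, and the logistic function is assumed for f because the text does not specify it.

```python
import math

def logistic(z):
    """Assumed output function f (not specified in the text)."""
    return 1.0 / (1.0 + math.exp(-z))

def mlp_output(x, hidden_weights, hidden_biases, out_weights, out_bias, f=logistic):
    """Compute f( sum_j v_j * tanh( sum_k w_jk * x_k + b_j ) + c )."""
    # Inner sum: weighted inputs to each hidden unit, passed through tanh.
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(hidden_weights, hidden_biases)
    ]
    # Outer sum: weighted hidden activations, passed through f.
    return f(sum(v * h for v, h in zip(out_weights, hidden)) + out_bias)

# Made-up example: two inputs, two hidden units, all weights zero.
y = mlp_output([1.0, 2.0], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [0.0, 0.0], 0.0)
```

For two outputs (i = 0, 1), the same hidden activations would be combined with a separate set of output weights for each i.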
Table 6.3 Sensitivity, specificity and overall results of ANN training
ANN