The Multi-Layer Perceptron As A Pattern Classifier

2.4 Pattern Recognition And Artificial Neural

2.4.3 The Multi-Layer Perceptron As A Pattern Classifier

This section explains why the Multi-Layer Perceptron (MLP) is suited to pattern recognition problems, and thus why it was chosen as the architecture for the experiment in this study; that is, it describes the properties of the MLP that make it suitable for such tasks. Also presented are some measures that can be taken to assist MLP training.

As discussed in section 2.4.2.2, the MLP first presents an input pattern to the first layer of hidden nodes. This layer of nodes produces output which is then passed to the next layer (either a second hidden layer or the output layer). Because each node performs the functionality of a perceptron unit, each first layer node output defines a single plane of separation between two classes (Lippmann, 1989; Beale and Jackson, 1990).

A node in the second layer (hidden or output) that receives input from two nodes in the first hidden layer performs a logical AND operation if its threshold is set so that it turns 'on' only when both previous layer outputs are 'on'. Since each of the previous layer nodes defines a linear classification in the pattern space, the receiving node produces a classification based on a combination of these lines.

There may be many nodes in the first hidden layer; thus at the second layer (hidden or output) the combination of all linear classifications (corresponding to each node output from the first hidden layer) partitions the pattern space to form a convex region (called a convex hull). A convex hull is defined as a region where any point can be connected to any other point by a straight line that does not cross the boundary of the region. Thus the convex hull is the region formed by the intersection of all the linear classifications.
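The construction described above can be illustrated with a minimal sketch. The weights, biases, and the triangular region are hypothetical values chosen for illustration only; they are not taken from the experiment. Each first layer node applies a hard-limiting threshold to a weighted sum, and the second layer node is configured so that it turns 'on' only when every first layer node is 'on'.

```python
# Illustrative sketch only: all weights and biases are hypothetical values.

def step(activation):
    """Hard-limiting threshold: 'on' (1) if the activation is positive, else 'off' (0)."""
    return 1 if activation > 0 else 0


def node(weights, bias, inputs):
    """One perceptron unit: weighted sum of the inputs plus a bias, then the threshold."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


# First hidden layer: three nodes, each defining one line in a 2-D pattern space.
# Together the lines bound the triangle x > 0, y > 0, x + y < 1.
FIRST_LAYER = [
    ((1.0, 0.0), 0.0),    # 'on' when x > 0
    ((0.0, 1.0), 0.0),    # 'on' when y > 0
    ((-1.0, -1.0), 1.0),  # 'on' when x + y < 1
]


def in_convex_region(point):
    """Second layer node: logical AND of all first layer outputs."""
    hidden = [node(weights, bias, point) for weights, bias in FIRST_LAYER]
    n = len(hidden)
    # Unit weights with bias -(n - 0.5) give a positive sum only if all n inputs are 'on'.
    return node([1.0] * n, -(n - 0.5), hidden)


print(in_convex_region((0.2, 0.3)))  # a point inside the triangle
print(in_convex_region((0.8, 0.8)))  # a point outside, since x + y > 1
```

Any point accepted by the second layer node lies on the 'on' side of every line at once, which is why the accepted region is always convex.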

If a third hidden layer were to be added, the nodes of this layer would receive convex hulls as input (from the second hidden layer nodes). Thus a third layer would provide the facility to define arbitrary shapes. Whether a third hidden layer is added or not, it is clear that the MLP is very well suited to classifying data of a complex nature (and determining patterns within that data).
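As a hedged illustration of how a further layer can combine convex regions into a non-convex shape, the sketch below joins two rectangular regions with a logical OR node to form an L shape. The rectangles, weights, and helper names are assumptions made for this example; a trained MLP would learn its own weights rather than have them set by hand.

```python
# Illustrative sketch only: the regions and weights are hypothetical.

def step(activation):
    """Hard-limiting threshold: 1 ('on') if the activation is positive, else 0."""
    return 1 if activation > 0 else 0


def node(weights, bias, inputs):
    """One perceptron unit: weighted sum plus bias, then the threshold."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def box_lines(x_min, x_max, y_min, y_max):
    """Four first layer nodes (weights, bias) whose lines bound a rectangle."""
    return [
        ((1.0, 0.0), -x_min),  # 'on' when x > x_min
        ((-1.0, 0.0), x_max),  # 'on' when x < x_max
        ((0.0, 1.0), -y_min),  # 'on' when y > y_min
        ((0.0, -1.0), y_max),  # 'on' when y < y_max
    ]


def and_node(inputs):
    """Second layer: 'on' only if every input is 'on' (one convex region)."""
    n = len(inputs)
    return node([1.0] * n, -(n - 0.5), inputs)


def or_node(inputs):
    """Third layer: 'on' if at least one convex region is 'on'."""
    return node([1.0] * len(inputs), -0.5, inputs)


# Two overlapping rectangles whose union is an L shape (not convex).
REGIONS = [box_lines(0, 2, 0, 1), box_lines(0, 1, 0, 2)]


def in_l_shape(point):
    convex_outputs = [
        and_node([node(weights, bias, point) for weights, bias in lines])
        for lines in REGIONS
    ]
    return or_node(convex_outputs)


print(in_l_shape((1.8, 0.5)))    # inside the horizontal arm
print(in_l_shape((0.5, 1.8)))    # inside the vertical arm
print(in_l_shape((1.15, 1.15)))  # midpoint of the two points above
```

The two accepted points have a midpoint that is rejected, so the combined region cannot be convex; this is the sense in which the additional layer allows arbitrary shapes to be built from convex pieces.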

In general, MLPs exhibit the following properties or characteristics which make them ideally suited for application to the task of pattern recognition (Lippmann, 1989):

• They can solve problems that are too complex for conventional technologies; that is, problems that do not have an algorithmic solution or where an algorithmic solution is too complex to be computationally viable.

• They can be effectively applied to intelligence applications where a real-time response, using complex real-world data, is necessary.

• They can accurately associate input patterns to their correlated output patterns. The storage and summing functionality of MLP nodes facilitates their 'learning' capabilities (thus simulating the biological neuron). When tested with unknown instances, they exhibit inference capabilities.

• They have the capability to generalise. That is, if provided with a subset of available samples, the MLP can be guided during the training phase to accurately detect patterns, and then to correctly classify them in unseen samples (of the same class pattern) during the testing phase.

• They are robust and fault tolerant systems, and frequently produce reduced error rates when compared to conventional approaches. This property allows them to recall full patterns when presented with incomplete or noisy patterns.

The following steps can be undertaken to assist MLP training, and thus improve the MLP capacity for discerning patterns in data (Wong et al., 2005):

1. The use of cross validation during the training process. This has the effect of 'smoothing' data to reduce the effect of noise and variability, and prevent overfitting. Cross validation involves selecting a small subset of the available training samples for validation during the training process. That is, as the MLP is training or learning, additional samples are used by the algorithm to reinforce correct learning.

2. Prior statistical analysis of the data. This further enhances the learning capability of the MLP during the training process. Analysis involves the identification of features responsible for data noise. For instance, higher frequencies of values at the extremities of a normal distribution indicate noisy data. Thus, by selecting features less affected by noise (thereby removing those responsible for noise), the MLP's ability to discern patterns in the data is improved.

3. The use of a large number of hidden layer neurons to reduce the effect of bias, and to prevent underfitting. Selection of the number of hidden layer neurons is typically a trial and error process, because there is no definite method or rule for this determination. Kasabov (1996) suggested starting with half the number of input layer neurons, and adjusting that quantity (up and/or down) until a satisfactory result is achieved.
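Two of the steps above lend themselves to a short sketch: setting aside a small validation subset of the training samples, and generating candidate hidden layer sizes that start at half the number of inputs and move up and down from there. The function names, the 20% validation fraction, and the search span are assumptions for illustration; they are not the settings used in the experiment.

```python
import random


def split_for_validation(samples, validation_fraction=0.2, seed=0):
    """Set aside a small subset of the training samples for validation.

    The fraction and seed are hypothetical defaults, not values from the study.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_validation = max(1, int(len(shuffled) * validation_fraction))
    # Returns (training subset, validation subset).
    return shuffled[n_validation:], shuffled[:n_validation]


def candidate_hidden_sizes(n_inputs, span=2):
    """Start at half the number of inputs, then try sizes above and below it."""
    start = max(1, n_inputs // 2)
    sizes = [start]
    for offset in range(1, span + 1):
        sizes.append(start + offset)
        if start - offset >= 1:
            sizes.append(start - offset)
    return sizes


training, validation = split_for_validation(range(10))
print(len(training), len(validation))  # 8 2
print(candidate_hidden_sizes(8))       # [4, 5, 3, 6, 2]
```

In a trial and error search, each candidate size would be trained on the training subset and compared using the error on the validation subset, stopping when a satisfactory result is achieved.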

As indicated by point 2 above, pre-processing data (to remove features responsible for noise) may greatly improve the performance of the MLP. The following attributes further highlight the benefit of pre-processing data for a pattern recognition system (Bishop, 1995):

• The pre-processing tasks, of identifying appropriate features to represent patterns and the subsequent extraction of those features, result in prior knowledge being obtained from the raw data. Obtaining prior knowledge for any task involving supervised learning could be seen as beneficial.

• Reducing the number of inputs (by such pre-processing) often leads to improved performance by mitigating (at least to some degree) the curse of dimensionality.

So as well as being beneficial to MLP training by providing prior knowledge, pre-processing also improves efficiency in the training process by reducing the input pattern space.
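One possible way to screen features for noise, in the spirit of the statistical analysis described earlier, is sketched below. It measures the fraction of each feature's values that lie more than two standard deviations from the mean and keeps only features whose fraction stays under a cut-off. The two standard deviation rule, the 5% cut-off, and the feature names are assumptions made for illustration; the source does not specify the exact procedure used.

```python
from statistics import mean, pstdev


def tail_fraction(values, k=2.0):
    """Fraction of values lying more than k standard deviations from the mean."""
    centre = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return 0.0  # a constant feature has no values at the extremities
    return sum(1 for v in values if abs(v - centre) > k * spread) / len(values)


def select_features(columns, max_tail=0.05):
    """Keep the names of features whose tail fraction is at most max_tail.

    max_tail is a hypothetical cut-off, not a value taken from the study.
    """
    return [name for name, values in columns.items()
            if tail_fraction(values) <= max_tail]


# Hypothetical feature columns: 'spiky' has a tight core with two extreme values.
columns = {
    "stable": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "spiky": [5] * 18 + [0, 10],
}
print(select_features(columns))  # ['stable']
```

Dropping the rejected features both removes a source of noise and reduces the number of inputs presented to the MLP.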

For all of the above reasons, the MLP with back-propagation was selected for use in the current experiment (refer Chapter 5). In the experiment, one MLP was trained to recognise the input pattern of each training group member. As there were three phases of the experiment, there were three types of input patterns (one type for each phase) presented to the MLP for each participant. The three types of patterns corresponded to keystroke dynamics data (refer Chapter 5 section 5.4), fingerprint feature data (refer Chapter 5 section 5.5), and data resulting from the fusion of the other two types (refer Chapter 5 section 5.6).

The next section concludes this chapter by summarising the relevance of this chapter to the overall discussion.

2.5 Conclusion

This chapter provided a discussion on three areas of study associated with the experiment conducted for this dissertation. The material presented provided background to help understand these associated areas, and why certain choices may have been made during the experimental stage of the study.

The first area of study associated with this dissertation was that of biometrics (section 2.2). An overview of biometrics was presented in section 2.2.1, which included its definition as the personal characteristics that make individuals unique. Also discussed were reasons why biometrics provide an alternative to traditional authentication procedures, the difference between physical and behavioural biometric characteristics (their advantages and disadvantages), and the biometric technologies that have evolved in this area.

The components of a biometric authentication system were discussed in section 2.2.2. These include a capture module, a feature extraction module, a matching module, and a decision module. The requirements of a biometric characteristic, for use in a biometric authentication system, were presented as: universality, uniqueness, permanence, and collectability. The biometric authentication system also needs to consider performance, acceptability, and circumvention. Also, the two phases of a biometric authentication system (enrolment and validation) were described.

As with any system, possible errors require identification and performance requires measurement. Two system error rates (the failure to capture rate and the failure to enroll rate) were described in section 2.2.3; this section also discussed the performance variables used to present experimental results (the false acceptance rate, the false rejection rate, and the equal error rate).
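For readers who want a concrete picture of the three performance variables, the sketch below uses their standard definitions: the false acceptance rate is the proportion of impostor attempts accepted, the false rejection rate is the proportion of genuine attempts rejected, and the equal error rate is the rate at the decision threshold where the two coincide. The scores, the threshold grid, and the convention that a score at or above the threshold is accepted are assumptions for illustration, not data or settings from the experiment.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR), accepting any score at or above the threshold."""
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr


def approximate_eer(genuine_scores, impostor_scores, thresholds):
    """Find the threshold where FAR and FRR are closest.

    Returns (threshold, rate), where rate is the mean of FAR and FRR there.
    """
    def gap(threshold):
        far, frr = error_rates(genuine_scores, impostor_scores, threshold)
        return abs(far - frr)

    best = min(thresholds, key=gap)
    far, frr = error_rates(genuine_scores, impostor_scores, best)
    return best, (far + frr) / 2


# Hypothetical matching scores, for illustration only.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.1, 0.3, 0.5, 0.6]
thresholds = [i / 10 for i in range(1, 10)]

print(error_rates(genuine, impostor, 0.5))
print(approximate_eer(genuine, impostor, thresholds))
```

Raising the threshold lowers the false acceptance rate at the cost of a higher false rejection rate, which is why the equal error rate is often quoted as a single summary figure.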

A description of a number of well known biometric characteristics, for possible use in a biometric authentication system, was provided in section 2.2.4. These included facial recognition, iris and retinal pattern recognition, speaker recognition, fingerprint and palmprint recognition, hand geometry, keystroke dynamics, signature recognition, gait recognition, and body odor recognition.

The material presented in section 2.2 was researched in an effort to understand the requirements of a biometric authentication system, and to identify two biometric characteristics that could be utilised for the current investigation. The choice of which two biometric characteristics to utilise was based on the ability to achieve accurate verification, and operational considerations in terms of cost effectiveness and ease of use in a biometric authentication system. A detailed discussion of the two biometric characteristics chosen for the current experiment, and reasons for their selection, is provided in Chapters 3 and 4.

Section 2.3 discussed the areas of data fusion and multi-modal biometrics. An overview of data fusion was presented in section 2.3.1, where the data fusion paradigms (refer section 2.3.1.1), the data fusion levels (refer section 2.3.1.2), and data alignment (refer section 2.3.1.3) issues were described. Section 2.3 also provided an overview of multi-modal biometrics in section 2.3.2. Data fusion levels (as applied to this area of research) were discussed in section 2.3.2.1, and a review of related literature was presented in section 2.3.2.2.

An overview of pattern recognition and Artificial Neural Networks was presented in section 2.4. Section 2.4.1 provided an overview of pattern recognition, and some classification schemes were described in section 2.4.1.1. Artificial Neural Networks were discussed in section 2.4.2. Section 2.4.2.1 explained how ANNs attempt to model the functionality of the biological neurons in the human brain, and section 2.4.2.2 described a number of different ANN architectures and their operations. Section 2.4.3 explained why the Multi-Layer Perceptron is particularly suited to pattern recognition problems, and the reason why this ANN architecture (in preference to other ANN architectures) was used for the experiment in this study.

In the next chapter, an in-depth discussion of the biometric characteristic keystroke dynamics is presented.

Chapter 3 Keystroke Dynamics

3.1 Introduction

This chapter provides a discussion of the biometric characteristic known as keystroke dynamics. The overview of the subject area provides the conceptual basis of keystroke dynamics (refer section 3.2).

Section 3.3 describes the possible metrics that may be used for experimentation in this area of research, including those used in the current study. The calculation of metrics for the current study is described in Chapter 5 section 5.4.3.

Keystroke dynamics metrics exhibit higher degrees of variability than those of some other biometric characteristics; thus keystroke dynamics is not recognised as being as accurate as some of the other characteristics. Section 3.4 provides a review of research efforts in this area to demonstrate that keystroke dynamics can confidently be used, provided the appropriate issues are carefully considered.

Finally, section 3.5 summarises a number of the issues associated with the appropriate use of keystroke dynamics, and section 3.6 concludes the chapter.

In document User Authentication Incorporating Feature Level Data Fusion of Multiple Biometric Characteristics (Page 139-145)