2.3 ANN Applications on Data Interpretation

2.3.2 Specific Applications of the MLP ANN

Sundgren et al. (1991) extended the use of pattern recognition techniques to the analysis of gas sensor signals by implementing ANN models to quantify the individual components of a gas mixture. They compared the ANN technique with conventional multivariate analysis based on a partial least squares (PLS) approach.

The ANN approach was selected for its adaptability to almost any mathematical function, and a three-layer MLP BP network was chosen because of its record of suitability for sensor signal processing. The gas mixtures contained four component gases (hydrogen, ammonia, ethanol and ethylene) at low concentrations. At lower gas concentrations the probability of chemical reactions in the gas phase is reduced; Sundgren et al. also asserted that low concentrations avoid saturation of the sensors, even though the sensor signals remain nonlinear over the concentration range selected.

The results indicated that both hydrogen and ammonia concentrations were predicted even better by the ANN models. However, predictions of ethanol and ethylene concentrations were poor for both the ANN and PLS models. For a two-component mixture, hydrogen and acetone concentrations were best predicted by an ANN model.

The ANN approach was suitable for improving the sensitivity and selectivity of the gas sensors in quantifying the gas components. Three main conditions were identified for the ANN analysis to outperform conventional PLS multivariate analysis: (i) the sensor array must be sensitive to the gases of interest; (ii) the distribution of the calibration data set is important; and (iii) the network design and the values of the learning parameters need to be thoroughly investigated.

Similarly, Moore et al. (1993) investigated the ability of ANNs to quantify the concentrations of individual gases and gas mixtures in air from patterns generated by an array of chemically modified sintered tin oxide sensors. Four gases were selected whose detection and control are important in many manufacturing industries: hydrogen, methane, carbon monoxide and carbon dioxide. They designed a system that could predict the concentrations of gases within a mixture. A fully connected MLP network performed very poorly, mainly because of data overfitting. The final network employed was a partially connected network with six input units connected to nine hidden units, which reduced overfitting.

There were only three output units (for hydrogen, methane and carbon monoxide; carbon dioxide was excluded). Each output element was connected to only three elements in the hidden layer. This arrangement was intended to compensate for the relatively smaller carbon monoxide signals compared with those for hydrogen and methane. It was also thought to separate the learning characteristics, preventing poor data for one gas from affecting another gas that had better data.
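The partial connectivity described above can be pictured as a 0/1 mask over the hidden-to-output weights. The following is only a minimal sketch: the grouping of hidden units into consecutive blocks of three, and the names `hidden_to_output_mask` and `masked_output`, are assumptions made for illustration and are not taken from Moore et al.

```python
# Hypothetical layout: 9 hidden units feeding 3 outputs, where each output
# sees its own block of three hidden units (the exact grouping is assumed).
N_HIDDEN, N_OUTPUT = 9, 3
GASES = ["hydrogen", "methane", "carbon monoxide"]


def hidden_to_output_mask(n_hidden=N_HIDDEN, n_output=N_OUTPUT):
    """Return mask[j][k] = 1 if hidden unit j feeds output k, else 0."""
    per_output = n_hidden // n_output
    return [[1 if j // per_output == k else 0 for k in range(n_output)]
            for j in range(n_hidden)]


def masked_output(hidden, weights, mask):
    """Weighted sum for each output, using only the permitted connections."""
    n_output = len(mask[0])
    return [sum(h * w[k] * m[k] for h, w, m in zip(hidden, weights, mask))
            for k in range(n_output)]


mask = hidden_to_output_mask()
for k, gas in enumerate(GASES):
    feeders = [j for j in range(N_HIDDEN) if mask[j][k]]
    print(gas, "<- hidden units", feeders)
```

Because the masked weights are never used, an error on one output cannot alter the hidden units that serve another output, which is one way to read the separation of learning characteristics noted above.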

Overall, the hydrogen data gave the best predictions at all concentrations. The methane data were predicted reasonably well; however, carbon monoxide concentrations were not predicted well. It was therefore surmised that the ratio of the gases present within a mixture is very important for quantifying their proportions accurately.

Henson et al. (1992) addressed the feasibility of applying ANN technology to complex data-driven systems. They studied the convergence of the BP learning algorithm for the MLP network by applying an optimisation scheme called simulated annealing. Simulated annealing [Levine 1991] is based on an analogy with the annealing of solids in statistical mechanics. It is a stochastic optimisation technique that uses a descent algorithm modified by random ascent moves to escape local minima. Henson et al.'s simulated-annealing-enhanced BP algorithm has been implemented in a neural network software package called ANNIE (Artificial Neural Network Integrated Environment).

The performance of this new algorithm was evaluated empirically on a multiuser signal detection problem in a spread communication system. The results were compared with extensive practical studies of this pattern classification problem (i.e. multiuser signal detection) in order to validate the ANNIE implementation and evaluate the enhanced learning algorithm. They demonstrated that it was possible to obtain a better sub-optimal solution than that obtained using standard BP.
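The idea of a descent algorithm modified by random ascent moves can be illustrated with a generic simulated annealing loop. This is a minimal sketch on a one-dimensional cost function, not Henson et al.'s BP variant or the ANNIE implementation; the function name, the geometric cooling schedule and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def simulated_annealing(cost, x0, step=0.5, t0=1.0, cooling=0.995,
                        n_iter=5000, seed=0):
    """Minimise cost(x) for scalar x; return the best point and its cost."""
    rng = random.Random(seed)
    x, fx = x0, cost(x0)
    best_x, best_fx = x, fx
    t = t0
    for _ in range(n_iter):
        cand = x + rng.uniform(-step, step)   # random neighbouring point
        fc = cost(cand)
        delta = fc - fx
        # Descent moves are always accepted; ascent moves are accepted with
        # probability exp(-delta / t), which shrinks as the temperature falls.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x, fx = cand, fc
            if fx < best_fx:
                best_x, best_fx = x, fx
        t *= cooling                          # assumed geometric cooling
    return best_x, best_fx


def bumpy(x):
    """Toy cost with several local minima (illustrative only)."""
    return x * x + 3.0 * math.sin(5.0 * x)


best_x, best_fx = simulated_annealing(bumpy, x0=3.0)
print(best_x, best_fx)
```

At high temperature the loop behaves almost like a random walk and can climb out of shallow minima; as the temperature decays it approaches plain descent.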

One problem to which chemists have applied an MLP network model is the prediction of ionization potentials for naturally occurring and man-made elements. The ionization potential (IP) is a property that cannot be measured for short-lived elements. Multiconfiguration Dirac-Fock (MCDF) calculations of the first few ionization potentials of heavy elements are so computationally intensive as to be impractical for some elements. Sigman et al. (1994) therefore showed that a simple three-layer BP ANN can learn the complex relationship between the electronic structure and the first three ionization potentials of 222 atoms and ions for which spectroscopic data have been determined. The network predictions were in very good agreement with experimental values not included in the training data set and with values previously calculated by much more sophisticated quantum mechanical methods (such as MCDF).

The advantages of ANNs outlined in their study include the rapid prediction of a large number of previously unmeasured ionization potentials, better estimation of the error associated with predictions owing to the larger training set, and a much lower computational cost than other techniques. These advantages can outweigh the fact that ANNs do not offer the physical insight that a quantum mechanical approach does.

To produce a classification model that can classify trained patterns to a specified degree of accuracy, Yeung (1993) employed the MLP network architecture. The objective was to produce constructive ANNs as estimators for Bayesian discriminant functions [Bishop 1995]. The generalisation capability of the trained classification model was measured by its classification performance on a separate testing set, and was considered as an inductive inference process. The MLP network used a hyperbolic tangent (tanh) transfer function with BP learning.
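For readers unfamiliar with the tanh transfer function, a forward pass through a single-hidden-layer MLP might look as follows. This is a sketch under stated assumptions: the function name `mlp_forward`, the weight layout, the example values and the use of tanh on the output layer as well as the hidden layer are illustrative choices, not details reported by Yeung.

```python
import math


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a single-hidden-layer MLP with tanh units.

    w_hidden[j] holds the input weights of hidden unit j;
    w_out[k] holds the hidden-layer weights of output unit k.
    """
    hidden = [math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)
              for w, b in zip(w_hidden, b_hidden)]
    return [math.tanh(sum(wi * hi for wi, hi in zip(w, hidden)) + b)
            for w, b in zip(w_out, b_out)]


# Two inputs, three hidden units, one output (all values are made up).
w_h = [[0.4, -0.7], [0.1, 0.9], [-0.5, 0.3]]
b_h = [0.0, 0.1, -0.2]
w_o = [[0.8, -0.6, 0.5]]
b_o = [0.05]
print(mlp_forward([0.5, -0.2], w_h, b_h, w_o, b_o))
```

Since tanh maps any real input into the open interval (-1, 1), every unit's output is bounded, and BP learning adjusts the weights and biases to reduce the output error.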

Three issues were addressed, namely (i) slow learning in deep networks; (ii) network size determination; and (iii) learnability. Allowing as few as one layer of adjustable weights at each learning stage is a simple yet effective technique for speeding up learning in the network. For the class of single-hidden-layer networks, an error-minimization learning algorithm yields network output values that approximate the Bayesian discriminant functions in the minimum mean-square-error sense. A Bayesian discriminant function is defined as the ‘a posteriori probability of the event that a pattern in a particular class occurs given the input feature vector’.
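The a posteriori probability in the definition above follows from Bayes' rule. The sketch below computes it directly from class priors and class-conditional likelihoods, and then picks the class with the largest posterior. The function names and the numerical values are hypothetical; a trained network would only approximate these quantities rather than compute them this way.

```python
def posteriors(priors, likelihoods):
    """Return P(class | x) for each class.

    priors[i] is P(class i); likelihoods[i] is p(x | class i) evaluated
    at one fixed input feature vector x.
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)                 # p(x), the normalising constant
    return [j / evidence for j in joint]


def bayes_decision(priors, likelihoods):
    """Index of the class with the largest a posteriori probability."""
    post = posteriors(priors, likelihoods)
    return max(range(len(post)), key=post.__getitem__)


# Two equally likely classes; x is three times more likely under class 1.
print(posteriors([0.5, 0.5], [0.2, 0.6]))
print(bayes_decision([0.5, 0.5], [0.2, 0.6]))
```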

The usefulness of the constructive ANNs for supervised learning was demonstrated on the four example domains used in the classification experiments: (i) mushroom classification; (ii) thyroid disease diagnosis; (iii) waveform recognition; and (iv) mirror symmetry detection. The ANN models used are inspired by the cascade-correlation algorithm, which uses sigmoidal units for approximating Bayesian classifiers with Bayesian a posteriori probabilities.

For pattern classification, determining the appropriate network size is of utmost importance, and learning becomes a dynamic construction process involving the adjustment of both the network weights and the topology. The addition of new hidden units corresponds to extracting higher-level features from the original input features in order to reduce the residual classification errors. It was noted that each network approximates a Bayesian classifier that implements the Bayesian decision rule for classification.

Bishop et al. (1990) applied MLP ANNs to the task of repetitive nonlinear curve fitting, for which they provide a fast solution. The ANNs are used to determine the optimal parameter values of the function directly from raw data. The MLP network was used to determine spectral line parameters from measurements of boron (IV) impurity radiation in the COMPASS-C tokamak. A tokamak is the favoured magnetic confinement system for research into producing controlled nuclear fusion.

[...] regression for predicting amino acid levels in six feed ingredients (namely corn; wheat; soybean meal; meat and bone meal; fish meal). Since amino acid determination incurs high costs due to the chemical analysis and laboratory turnover required for the analysis, they sought to reduce this expense in time and money by [...]

The complex relationship between ingredient composition (the inputs) and nutrient level (the outputs) could be more effectively described with the use of ANNs. The two ANNs used were a three-layer BP network and a general regression network (GRNN).

The GRNN outperformed the BP ANN and linear regression in predicting amino acid levels. Roush et al.'s methods highlighted that data preprocessing in the form of sorting, scaling and normalising raw data would improve ANN predictability, particularly for the BP network. It was suggested that customising a network for each individual amino acid in each feed ingredient would maximise the predictive abilities of the neural network. This suggestion is supported in this thesis by the implementation of chemical species ANN models that accurately predicted the spectral line sizes for individual species. Hence, each trained ANN model is customised to an individual species to maximise the predictive capability of the network model.
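For orientation, a GRNN prediction can be written as a kernel-weighted average of the training targets, and scaling can be as simple as a min-max transform. The sketch below assumes a Gaussian kernel with a single smoothing parameter `sigma`; the function names, the toy data and the choice of min-max scaling are illustrative assumptions and do not reproduce Roush et al.'s procedure.

```python
import math


def min_max_scale(values):
    """Scale a list of numbers linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]      # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


def grnn_predict(x, train_x, train_y, sigma=0.5):
    """Kernel-weighted average of the training targets for query x."""
    weights = []
    for xi in train_x:
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))   # squared distance
        weights.append(math.exp(-d2 / (2.0 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query is too far from all training points")
    return sum(w * y for w, y in zip(weights, train_y)) / total


# Toy data: one input feature, already on a 0-1 scale.
train_x = [[0.0], [1.0]]
train_y = [0.0, 1.0]
print(grnn_predict([0.5], train_x, train_y))
print(min_max_scale([2.0, 4.0, 6.0]))
```

The smoothing parameter controls how local the average is: a small `sigma` reproduces the nearest training target, while a large one tends towards the mean of all targets.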

Most researchers have employed the MLP network for classification problems that use discrete data. Those who have exploited the MLP's predictive capabilities still retain a discrete response for the network's predicted output. The work presented in this thesis emphasises that the trained MLP network not only has excellent predictive capabilities but also handles real, continuously-valued multi-input and multi-output data. This makes the extraction procedure presented in this thesis particularly useful for extracting rules from trained ANN models with real, continuously-valued multi-output data units.