5.3 Predictive data mining problem framing
5.3.1 Data mining process
Implementation of any data mining project requires structured procedures and rules, commonly known as the Cross Industry Standard Process for Data Mining (CRISP–DM). This logic guides the analysis in this research, but in simplified steps that suit the research scope. The sequence followed is shown in Fig. 5.5 below.
APPLICATION OF PREDICTIVE DATA MINING 128
Fig. 5.5, The proposed predictive data mining functionality for real options analysis
5.3.1.1 Understanding data from real case mining operations
This research will utilise actual industry data from an operating mine in Western Australia. The actual grade control assay data will be used for training and testing the model and the mine plan data will be used to apply the model for real options analysis.
5.3.1.2 Data preparation and processing
Data processing involves checking for outliers and missing values as well as visually exploring the data (Fig. 5.6). In the Orange (Demsar et al., 2013) software program, variables were assigned the roles that each will play in machine learning. The special variable in this analysis was the processing risk, which was set as the ‘Label’ because it is the target.
Fig. 5.6, Visualising outliers using Orange linear projection (Demsar et al., 2013).
This visual inspection indicated two things. First, alumina and lump (%) will play an important role when predicting processing risk, as they are the best predictors of clay occurrence. Second, a few blocks had extremely high alumina percentages, and those blocks had already been ranked as high risk during block logging and grade control.
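The checks described above can be illustrated with a minimal sketch. It is not the Orange workflow used in this research: the block records, column names and values are invented, and a numerical interquartile-range rule stands in for the visual inspection of Fig. 5.6.

```python
import statistics

# Hypothetical grade control records. Column names and values are invented
# for illustration and are not taken from the mine's data.
BLOCKS = [
    {"block": "B1", "alumina_pct": 2.1, "lump_pct": 48.0, "risk": "low"},
    {"block": "B2", "alumina_pct": 2.4, "lump_pct": 45.5, "risk": "low"},
    {"block": "B3", "alumina_pct": None, "lump_pct": 44.0, "risk": "medium"},
    {"block": "B4", "alumina_pct": 2.9, "lump_pct": 41.0, "risk": "medium"},
    {"block": "B5", "alumina_pct": 3.1, "lump_pct": 39.5, "risk": "medium"},
    {"block": "B6", "alumina_pct": 9.8, "lump_pct": 22.0, "risk": "high"},
]

FEATURES = ["alumina_pct", "lump_pct"]
LABEL = "risk"  # plays the role of the 'Label' (target) variable


def missing_values(rows, column):
    """Return the ids of blocks with no value recorded for `column`."""
    return [r["block"] for r in rows if r[column] is None]


def iqr_outliers(rows, column, k=1.5):
    """Return the ids of blocks whose value lies outside the Tukey fences."""
    values = [r[column] for r in rows if r[column] is not None]
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [
        r["block"]
        for r in rows
        if r[column] is not None and not low <= r[column] <= high
    ]


def split_roles(rows):
    """Separate complete records into feature vectors and target labels."""
    complete = [r for r in rows if all(r[c] is not None for c in FEATURES)]
    X = [[r[c] for c in FEATURES] for r in complete]
    y = [r[LABEL] for r in complete]
    return X, y


print("Missing alumina:", missing_values(BLOCKS, "alumina_pct"))
print("Alumina outliers:", iqr_outliers(BLOCKS, "alumina_pct"))
```

In this invented sample the flagged block is also the one labelled high risk, which mirrors the observation that extreme alumina values coincided with blocks already ranked as high risk. A flagged value of this kind is therefore information to keep, not an error to delete.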
5.3.1.3 Data mining model selection
Most of the applicable algorithms were tested in the Orange (Demsar et al., 2013) software program, as shown in Fig. 5.7. The test results showed that the decision tree and random forest classifiers were more precise and more accurate than the other algorithms. To choose between the two, further evaluation was performed in RapidMiner, where the decision tree produced the better results. The decision tree was therefore chosen for implementation.
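A comparison of this kind can be sketched in Python with scikit-learn, which is not the software used in this research. The synthetic dataset, the list of candidate algorithms and all parameter values below are assumptions made for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the grade control assays: six numeric attributes
# and a three-class target (for example low, medium and high risk).
X, y = make_classification(
    n_samples=300,
    n_features=6,
    n_informative=4,
    n_classes=3,
    random_state=0,
)

# Candidate algorithms, chosen here only as examples.
candidates = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "naive Bayes": GaussianNB(),
    "k-nearest neighbours": KNeighborsClassifier(),
}

# Mean classification accuracy over 10 cross-validation folds per model.
scores = {
    name: cross_val_score(model, X, y, cv=10, scoring="accuracy").mean()
    for name, model in candidates.items()
}

for name, accuracy in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name:22s} {accuracy:.3f}")

best = max(scores, key=scores.get)
print("Highest mean accuracy:", best)
```

The ranking on synthetic data says nothing about the ranking on the mine data. The sketch only shows the mechanics of scoring several classifiers on the same folds before selecting one.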
Fig. 5.7, Model selection based on classification accuracy.
5.3.1.4 Implementing the data mining model
The decision tree classification model was implemented in the RapidMiner (2017) software program. As described previously, the training data was loaded and the processing risk was set as the target variable. Cross-validation, a nested operation in which the learning algorithm can be swapped, was used to train and test the model at the same time. Finally, the data mining model was applied to real mine plan data, and the output, which contains the predictions required for real options analysis, was exported to a spreadsheet. Additionally, RapidMiner produced a visual tree classification of the data. The resultant tree contained 256 nodes and, because of its size, could only be shown in a circular format (Fig. 5.8). The algorithm confirmed the expectation that lump and fine alumina percentages were the priority attributes. The dataset was first split on lump and fine alumina percentages and then cascaded down to the other elements. For clarity, a truncated version of the tree is shown in Fig. 5.9.
Fig. 5.8, Circular view of decision tree classification of clay material.
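The implementation steps (train, cross-validate, apply to plan data, export, inspect the tree) can be sketched with scikit-learn instead of RapidMiner. The attribute names, the synthetic data and the split into training and plan blocks are all hypothetical, so the resulting tree will not match the 256-node tree reported above.

```python
import csv
import io

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical attribute names; the real assay columns may differ.
FEATURES = ["lump_al2o3_pct", "fines_al2o3_pct", "fe_pct", "sio2_pct"]
RISK_NAMES = {0: "low", 1: "medium", 2: "high"}

# Synthetic data standing in for grade control (training) and mine plan blocks.
X, y = make_classification(
    n_samples=400,
    n_features=4,
    n_informative=3,
    n_redundant=1,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=1,
)
X_train, y_train = X[:300], y[:300]
X_plan = X[300:]  # plan blocks have no known risk label

tree = DecisionTreeClassifier(random_state=1)

# Cross-validation estimates accuracy on the training data only.
cv_accuracy = cross_val_score(tree, X_train, y_train, cv=10).mean()

# Fit on all training data, then predict the risk class of each plan block.
tree.fit(X_train, y_train)
plan_risk = tree.predict(X_plan)

# Export the predictions as CSV text that a spreadsheet can open.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["block", "predicted_risk"])
for block_id, risk in enumerate(plan_risk):
    writer.writerow([block_id, RISK_NAMES[int(risk)]])

print(f"Cross-validated accuracy: {cv_accuracy:.3f}")
print("Nodes in fitted tree:", tree.tree_.node_count)
# Print only the top levels, analogous to showing a truncated tree.
print(export_text(tree, feature_names=FEATURES, max_depth=2))
```

Limiting `max_depth` in `export_text` affects only what is printed, not the fitted model. This serves the same purpose as presenting a truncated tree alongside the full one.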
5.3.1.5 Evaluating the data mining model
As stated in the preceding section, the model was cross-validated to measure how well the chosen algorithm performed. During the cross-validation exercise, a neural network algorithm was also evaluated but was found to be less accurate (71.43%; Fig. 5.10) than the decision tree classification, which had an accuracy of 78.6% (Fig. 5.11).
Accuracy 71.43%; Correlation 0.803; Squared correlation 0.644
Fig. 5.10, Neural network analysis of the problematic ore.
Fig. 5.11, Probability density for problematic ore occurrence.
The accuracy of the classification tree model utilised in this research is 78.6%. This is acceptable because the aim of this research was not to predict the future exactly but to indicate what could happen to crusher feed if planned blocks contain clay material, contrary to the resource model prediction. These results therefore meet the objective of this research, as the essence of the data mining was to help provide a real option that creates flexibility and gives managers the ability to make future decisions. Consequently, the risk class that the model assigns to a planned block can be expected to be correct about 78.6% of the time, which gives an indication of whether the planned crusher feed or tonnes contains problematic ore or clay material that could result in processing plant downtime. The model performance is summarised by the confusion matrix (Table 5.6). This matrix indicates that 78.6% of the blocks were classified correctly. Moreover, the model classified blocks with a precision of 65.38%. This implies that, of the blocks the model predicted as, for instance, medium risk, 65.38% were actually medium risk. The model has a root mean square error of 0.454 and a strong correlation of 87%, which is good performance for the purpose of this research.
Table 5.6, Confusion matrix for problematic ore prediction.
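To make clear how accuracy and precision are read from a confusion matrix, the following sketch computes both from a three-class matrix. The counts are invented for illustration and are not those of Table 5.6. The layout (rows as actual classes, columns as predicted classes) is also an assumption.

```python
# Hypothetical confusion matrix: rows are actual classes, columns are
# predicted classes. The counts are illustrative, not the values of Table 5.6.
LABELS = ["low", "medium", "high"]
MATRIX = [
    [50, 8, 2],   # actual low
    [10, 30, 5],  # actual medium
    [3, 7, 25],   # actual high
]


def accuracy(matrix):
    """Share of all blocks whose predicted class equals the actual class."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def precision(matrix, k):
    """Of the blocks predicted as class k, the share that truly are class k."""
    predicted_k = sum(row[k] for row in matrix)
    return matrix[k][k] / predicted_k


def recall(matrix, k):
    """Of the blocks that truly are class k, the share predicted as class k."""
    return matrix[k][k] / sum(matrix[k])


print(f"accuracy: {accuracy(MATRIX):.3f}")
for k, label in enumerate(LABELS):
    print(
        f"{label:6s} precision {precision(MATRIX, k):.3f} "
        f"recall {recall(MATRIX, k):.3f}"
    )
```

Accuracy is computed from the diagonal of the whole matrix, whereas precision is computed per predicted class, down a column. This is why a single model can report a precision figure that differs from its accuracy.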