Future Work & Research

The future work proposed here is based on limitations identified during the experiment implementation and on techniques that could overcome them. An opportunity to apply the experiment to another research problem is also considered.

Changes to current experiment design

As previously discussed, rerunning the experiment with three target values instead of two may help to reduce the false positive issue identified in this experiment. This would involve restarting the experiment from the initial data selection stage and redefining the target values as fatal, serious and slight instead of fatal and non-fatal as in the current research. In addition, a review of the data groups by a subject matter expert may provide insights which could help extract more meaningful contributory factors.
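The change of target definition can be sketched as follows. This is a minimal illustration only: it assumes the accident records carry a severity code of 1 (fatal), 2 (serious) or 3 (slight) in a field called `Accident_Severity`, and the sample records and function names are hypothetical rather than taken from the experiment.

```python
from collections import Counter

# Assumed STATS19 severity coding: 1 = fatal, 2 = serious, 3 = slight.
SEVERITY_LABELS = {1: "fatal", 2: "serious", 3: "slight"}


def binary_target(severity_code):
    """Two-class target, as in the current experiment design."""
    return "fatal" if severity_code == 1 else "non-fatal"


def three_class_target(severity_code):
    """Proposed three-class target; rejects codes outside the assumed coding."""
    try:
        return SEVERITY_LABELS[severity_code]
    except KeyError:
        raise ValueError(f"unexpected severity code: {severity_code!r}") from None


# Hypothetical records standing in for rows of the accident dataset.
records = [
    {"Accident_Index": "A001", "Accident_Severity": 1},
    {"Accident_Index": "A002", "Accident_Severity": 2},
    {"Accident_Index": "A003", "Accident_Severity": 3},
    {"Accident_Index": "A004", "Accident_Severity": 3},
]

for row in records:
    row["target_2"] = binary_target(row["Accident_Severity"])
    row["target_3"] = three_class_target(row["Accident_Severity"])

# Class counts under each target definition.
binary_counts = Counter(row["target_2"] for row in records)
three_class_counts = Counter(row["target_3"] for row in records)
print(binary_counts)
print(three_class_counts)
```

Comparing the two sets of class counts before modelling shows how the former non-fatal group splits into serious and slight, which is the distinction the three-class design is intended to expose.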

The experiment could also be expanded to consider the results obtained for serious and slight accidents and to identify the related key predictors and contributory factors. This experiment extracted data from the STATS19 accident dataset. Two other STATS19 datasets are maintained, vehicle and casualty, and integrating the three datasets may provide additional insights. Unfortunately, due to time constraints and insufficient data knowledge, this integration was not considered as part of this research experiment.
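One possible way to integrate the accident, vehicle and casualty datasets is a one-to-many join on a shared accident identifier. The sketch below assumes that identifier is a field named `Accident_Index` present in all three datasets; the other field names and the sample rows are hypothetical and serve only to show the shape of the join.

```python
from collections import defaultdict


def group_by_key(rows, key):
    """Index a list of records by the value of `key` (one-to-many)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row)
    return grouped


def integrate(accidents, vehicles, casualties, key="Accident_Index"):
    """Attach the vehicle and casualty records to their parent accident."""
    vehicles_by_accident = group_by_key(vehicles, key)
    casualties_by_accident = group_by_key(casualties, key)
    combined = []
    for accident in accidents:
        row = dict(accident)  # copy so the input records are not modified
        row["vehicles"] = vehicles_by_accident.get(accident[key], [])
        row["casualties"] = casualties_by_accident.get(accident[key], [])
        # Simple derived predictors that only exist after integration.
        row["n_vehicles"] = len(row["vehicles"])
        row["n_casualties"] = len(row["casualties"])
        combined.append(row)
    return combined


# Hypothetical sample rows for the three datasets.
accidents = [
    {"Accident_Index": "A001", "Accident_Severity": 1},
    {"Accident_Index": "A002", "Accident_Severity": 3},
]
vehicles = [
    {"Accident_Index": "A001", "Vehicle_Reference": 1, "Vehicle_Type": 9},
    {"Accident_Index": "A001", "Vehicle_Reference": 2, "Vehicle_Type": 1},
    {"Accident_Index": "A002", "Vehicle_Reference": 1, "Vehicle_Type": 9},
]
casualties = [
    {"Accident_Index": "A001", "Vehicle_Reference": 2, "Casualty_Class": 1},
]

combined = integrate(accidents, vehicles, casualties)
for row in combined:
    print(row["Accident_Index"], row["n_vehicles"], row["n_casualties"])
```

Keeping one row per accident, with vehicle and casualty details summarised as derived fields, preserves the unit of analysis used for the classification target.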

Support vector machine (SVM)

Research has demonstrated that SVM has been successful in improving the accuracy of cancer classification where clustering was applied before classification (Wahed et al., 2012). As accuracy was the key limitation identified in the evaluation of this research experiment, this technique is of interest. Fatal traffic accidents, like cancer, are rare events, so applying clustering to traffic accident data before a classification technique such as SVM may similarly improve prediction accuracy. SVM classification is available in SPSS Modeler.
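The idea of clustering before SVM classification can be illustrated with a short sketch. It uses scikit-learn rather than SPSS Modeler, runs on synthetic data, and makes one particular design assumption: the distance of each record to every k-means cluster centre is appended to the original features before the SVM is trained. This is not claimed to be the procedure of Wahed et al. (2012), and the class name `ClusterThenSVM` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


class ClusterThenSVM:
    """K-means first, then an SVM trained on the scaled features plus the
    distance to each cluster centre (an assumed design, for illustration)."""

    def __init__(self, n_clusters=3, random_state=0):
        self.scaler = StandardScaler()
        self.kmeans = KMeans(n_clusters=n_clusters, n_init=10,
                             random_state=random_state)
        # class_weight="balanced" gives the rare class a larger penalty weight.
        self.svm = SVC(kernel="rbf", class_weight="balanced")

    def _augment(self, X_scaled):
        # KMeans.transform returns the distance to every cluster centre.
        return np.hstack([X_scaled, self.kmeans.transform(X_scaled)])

    def fit(self, X, y):
        X_scaled = self.scaler.fit_transform(X)
        self.kmeans.fit(X_scaled)
        self.svm.fit(self._augment(X_scaled), y)
        return self

    def predict(self, X):
        return self.svm.predict(self._augment(self.scaler.transform(X)))


# Synthetic stand-in for an imbalanced dataset: 300 common and 15 rare cases.
rng = np.random.default_rng(0)
X_common = rng.normal(loc=0.0, scale=1.0, size=(300, 2))
X_rare = rng.normal(loc=4.0, scale=0.5, size=(15, 2))
X = np.vstack([X_common, X_rare])
y = np.array([0] * 300 + [1] * 15)

model = ClusterThenSVM(n_clusters=3).fit(X, y)
pred = model.predict(X)

# Recall on the rare class, measured on the training data as a smoke check
# only; a real evaluation would use held-out data.
recall_rare = (pred[y == 1] == 1).mean()
print(f"recall on the rare class: {recall_rare:.2f}")
```

For a rare target such as fatal accidents, recall and precision on the rare class are more informative than overall accuracy, which is why the sketch reports rare-class recall.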

Consider applying the experiment to Irish road accidents

The experiment was completed using the UK STATS19 data due to the availability, quality and wide use of the data. However, the experiment could also be applied to Irish road accidents, although the scope may need to be widened as fatal accident volumes may not be sufficient. Road safety trends in Ireland are in line with trends in the UK, as outlined in Fig. 6.1, with a similar proportion of deaths by road user group.

Figure 6.1 Trends in Ireland road traffic accident deaths

To assess the readiness in Ireland to meet the experiment requirements, a brief questionnaire was prepared and forwarded to a road safety professional in Ireland. The results of the questionnaire are presented in Appendix 1. From the reply, it appears that road safety data is consistently recorded and reported, and that some consideration has already been given to the application of predictive analytics to road safety in Ireland.

6.6 Conclusion

This final chapter considers the experiment completed as part of the research and results achieved. The initial objectives achieved are outlined, together with contributions to the body of knowledge identified during the course of the research. The experiment achievements and limitations are discussed. Future work which could help overcome limitations in this experiment or add to the research learning is considered.

The experiment met many of the initial objectives and, although accuracy performance was poorer than expected, the prediction of fatal traffic accidents was successful. Consideration has been given to further work which could improve the experiment results.

BIBLIOGRAPHY

Abugessaisa, I., 2008. Analytical tools and information-sharing methods supporting road safety organizations, Linköping: LiU-Tryck.

Aounallah, M., Quirion, S. & Mineau, G. W., 2004. Distributed Data Mining vs. Sampling Techniques: A Comparison. Lecture Notes in Computer Science, Volume 3060, pp. 454-460.

Baguley, C., 2001. 'The importance of a road accident data system and its utilization', International Symposium on Traffic Safety Strengthening and Accident Prevention, 28-30 November, Nanjing: pp. 1-20.

Berry, M. J. & Linoff, G. S., 2004. Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management. 2 ed. s.l.:Wiley.

Beshah, T. et al., 2013. Mining pattern from road accident data: Role of road user's behaviour and implications for improving road safety. International journal of tomography and simulation, 22(1), pp. 73-86.

Chapman, P. et al., 2000. CRISP-DM 1.0: Step by step data mining guide, s.l.: SPSS.

Chong, M., Abraham, A. & Paprzycki, M., 2005. Traffic Accident Analysis Using Machine Learning Paradigms. The International Journal of Computing and Informatics, Volume 29, pp. 89-98.

David, S. & Branche, S., 2004. Road safety is no accident. Journal of Safety Research, pp. 173-174.

Department of Transport UK, 2013. Reported Road Casualties in Great Britain: Main Results 2013, s.l.: Department of Transport.

Department of Transport UK, 2013. Reported Road Casualties in Great Britain: Summary Results 2013, s.l.: Department of Transport.

Eckerson, W., 2007. Predictive Analytics: Extending the Value of Your Data Warehousing Investment, s.l.: The Data Warehousing Institute.

Frawley, W., Piatetsky-Shapiro, G. & Matheus, C., 1992. Knowledge Discovery in Databases: An Overview. AI Magazine, Volume 13, pp. 213-228.

Grove, J., 2014. Vehicle Licensing Statistics: 2013, s.l.: Department of Transport UK.

Guillet, F. & Hamilton, H., 2007. Quality Measures in Data Mining. Computational Intelligence and Complexity, Volume 43, p. 120.

Han, J., Kamber, M. & Pei, J., 2011. Data Mining: Concepts and Techniques. Third ed. s.l.:Morgan Kaufmann.

He, H. & Garcia, E., 2009. Learning from Imbalanced Data. IEEE Transactions on

Hermans, E., Brijs, T. & Geert, W., 2009. Elaborating an Index Methodology for Creating an Overall Road Safety Performance Score for a Set of Countries. Seoul, s.n.

Japkowicz, N., 2000. Learning from Imbalanced Data Sets: A Comparison of Various Strategies, s.l.: AAAI Press.

Kashani, A., Mohaymany, A. & Ranjbari, A., 2012. Analysis of factors associated with traffic injury severity on rural roads in Iran. Journal of Injury and Violence Research, 4(1), pp. 36-41.

Konstantinos, T. & Chorianopoulos, A., 2009. Data Mining Techniques in CRM: Inside Customer Segmentation. s.l.:Wiley.

Ling, C. & Li, C., 1998. Data Mining for Direct Marketing: Problems and Solutions. s.l., AAAI Press, pp. 73-79.

Lord, D. & Mannering, F., 2010. The statistical analysis of crash-frequency data: A review and assessment of methodological alternatives. Transportation Research Part A: Policy and Practice, 44(5), pp. 291-305.

McCue, C., 2007. Data Mining and Predictive Analysis: Intelligence Gathering and Crime Analysis. 1 ed. s.l.:Butterworth-Heinemann.

Miner, G., Nisbet, R. & Elder, J., 2009. Handbook of Statistical Analysis and Data Mining Applications. 1 ed. s.l.:Academic Press.

Mitchell, K., 2002. Collaboration and information sharing: an ROI perspective? The Public Manager, pp. 59-61.

Monfared, A. B. et al., 2013. Prediction of Fatal Road Traffic Crashes in Iran Using The Box-Jenkins Time Series Model. Journal of Asian Scientific Research, pp. 425-430.

Nyce, C., 2007. Predictive Analytics White Paper, s.l.: American Institute for Chartered Property Casualty Underwriters/Insurance Institute of America.

Shearer, C., 2000. The CRISP-DM Model: The New Blueprint for Data Mining. Journal of data warehousing, 5(4), pp. 13-22.

Simoncic, M., 2004. A Bayesian Network Model of Two-Car Accidents. Journal of Transportation and Statistics, 7(23), pp. 13-25.

Souza, J., Matwin, S. & Japkowicz, N., 2002. Evaluating data mining models: a pattern language. s.l., s.n.

Tesema, T., Abraham, A. & Grosan, C., 2005. Rule Mining and Classification of Road Traffic Accidents using Adaptive Regression Trees. International Journal of Simulation: Systems Science & Technology, Volume 6, pp. 80-94.

The International Transport Forum, 2013. Road Safety Annual Report 2013, s.l.: OECD.

The International Transport Forum, 2014. Road Safety Annual Report 2014, s.l.: OECD.

The Irish Road Safety Authority, 2013. The Irish Road Safety Strategy 2013-2020. [Online] Available at: http://www.rsa.ie/Documents/About%20Us/RSA_STRATEGY_2013-2020%20.pdf [Accessed 22 09 2014].

The Police Chief, 2005. Sobriety Checkpoints: An Effective Tool to Reduce DWI Fatalities. The Police Chief Magazine, 72(2).

The SAS Institute, 1998. Data mining and the case for sampling, s.l.: The SAS Institute.

The World Health Organisation, 2013. Global status report on road safety, s.l.: The World Health Organisation.

The World Health Organisation, 2013. Global Status Report on road safety 2013: supporting a decade of action, s.l.: The World Health Organization.

The World Health Organization, 2004. World report on road traffic injury prevention, Geneva: The World Health Organization.

Wahed, E., Emam, I. & Badr, A., 2012. Feature Selection for Cancer Classification: An SVM based Approach. International Journal of Computer Applications, 46(8), pp. 20-26.

Wah, Y., Nasaruddin, N., Voon, W. & Lazim, M., 2012. Decision Tree Model for Count Data. Proceedings of the World Congress on Engineering, 4 7.

Weiss, G. & Hirsh, H., 2000. 'Learning to Predict Extremely Rare Events', AAAI Technical Report WS-00-05, Papers from the AAAI Workshop, Menlo Park: AAAI Press.

Witten, I. H., Frank, E. & Hall, M. A., 2011. Data Mining: Practical Machine Learning Tools and Techniques. 3 ed. s.l.:Morgan Kaufmann.

Zhang, S., Zhang, C. & Yang, Q., 2003. Data preparation for data mining. Applied

In document An Analysis of the Predictive Capability of C5.0 and Chaid Decision Trees and Bayes Net in the Classification of fatal Traffic Accidents in the UK (Page 106-112)