Algorithm 2 Classify Edge
5.1 Future Work
Live Streaming Capability
In order for the system to be more useful to security analysts, the ability for the system to read and classify live information would be a major improvement. In its current iteration, the system reads data files and performs the classification on the data line by line. Being able to stream live information into the system would en-hance the ability of a security analyst to determine the types of attacks present in the network in real time.
In order for live streaming to be a feasible addition, enhancements to the overall
ef-CHAPTER 5. CONCLUSIONS
ficiency of the system are also required. These enhancements are:
Calculation Speedup In the current system, calculations for the posterior probabil-ity are completed in every iteration. These are done to ensure accurate values for the index calculation as used for shuffling. Because of these calculations, as the number of edges becomes large, the speed of the system decreases drastically. When this hap-pens, it becomes unrealistic to expect the system would be able to handle live data as a buildup of data would occur while processing packets. One option to solve this problem is to only calculate the index when necessary. This could mean calculating the index only when a new edge is introduced to determine model generation or on occasion to ensure shuffling is still able to be triggered. While this would increase the overall speed of the system, it would result in a decrease of accuracy because shuffling could not occur immediately after the index falls below the threshold. It would also result in a slowdown of the system when new edges are introduced as the calculations that are currently done in every iteration would be forced to occur all at once whenever the index is recalculated.
Shuffling Speedup In the current system, shuffling is a slow process. For this rea-son, shuffling in a live system would also result in a back log of data while the shuffling occurs. For this reason, in order for live streaming to be possible, the shuffling process requires an increase in speed. One possibility is to determine when the shuffling is considered sufficient to avoid the case where individual edges change models for many iterations.
Removal of Old or Unnecessary Observables In the current system, all observ-ables are held in memory through the entire classification process. This causes the system to use large amounts of memory and prevents the ability for live streaming data from being realistic. One possibility to solve this problem is, once observables have been in the system for a sufficient time, remove these observables. Another option is to determine the point at which individual observables no longer has a large
effect on the features present on an edge. Once this point is reached, the edge can contain the features and the observables present on the edge, or any new observables on the edge, can be discounted. The drawback to this method would be the case in which a large number of observables enter the system on that edge containing a new behavior. If those observables are discounted, the features on the edge are no longer an accurate representation of the edge’s behavior. Another option is to remove the need for observables altogether. If the features were contained in the edge itself instead of in observables, the system would not require the storage of the observables at all.
Split Models in Shuffling
While the ability to shuffle models is an improvement on many streaming data clas-sifiers, the possibility still exists for models to exhibit more than one distinct attack behavior. In order to solve this problem, a method is needed to determine that a single attack model contains multiple attack behaviors. Once this determination has been made, individual models would be created to exhibit these unique attack behaviors and the edges moved into these new models.
Bibliography
[1] Fran¸cois-Xavier Aguessy, Olivier Bettan, Gr´egory Blanc, Vania Conan, and Herv´e Debar. Bayesian Attack Model for Dynamic Risk Assessment. ArXiv e-prints, June 2016.
[2] H. Al-Mohannadi, Q. Mirza, A. Namanya, I. Awan, A. Cullen, and J. Disso.
Cyber-attack modeling analysis techniques: An overview. In 2016 IEEE 4th International Conference on Future Internet of Things and Cloud Workshops (FiCloudW), pages 69–76, Aug 2016.
[3] Damiano Bolzoni, Sandro Etalle, and Pieter H Hartel. Panacea: Automating attack classification for anomaly-based network intrusion detection systems. In International Workshop on Recent Advances in Intrusion Detection. Springer, Sept. 2009.
[4] CPTC Organizing Committee. Packet captures from 2016 collegiate penetration testing competition at rochester institute of technology. (Data provided by the organization team).
[5] Bernard Desgraupes. Clustering indices. Technical report, University of Paris Ouest-Lab ModalX, 2013.
[6] Haitao Du and Shanchieh Jay Yang. Discovering collaborative cyber attack pat-terns using social network analysis. In Proceedings 4th International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction, pages 129–
136, College Park, MD, USA, March 2011. Springer Berlin Heidelberg.
[7] F. Haddadi, S. Khanchi, M. Shetabi, and V. Derhami. Intrusion detection and attack classification using feed-forward neural network. In 2010 Second
Interna-tional Conference on Computer and Network Technology, pages 262–266, April 2010.
[8] Simon Hansman and Ray Hunt. A taxonomy of network and computer attacks.
Computers & Security, 24(1):31–43, 2005.
[9] Prajakta S Kalekar. Time series forecasting using holt-winters exponential smoothing. Technical report, Kanwal Rekhi School of Information Technology, 2004.
[10] Ferenc Kov´acs, Csaba Leg´any, and Attila Babos. Cluster validity measurement techniques. In 6th International symposium of hungarian researchers on compu-tational intelligence, Nov 2005.
[11] Ludmila I Kuncheva. Classifier ensembles for detecting concept change in stream-ing data: Overview and perspectives. In 2nd Workshop SUEMA, pages 5–10, 2008.
[12] Gang Luo, Ya Wen, and Xiang Lingyun. Network attack classification and recog-nition using hmm and improved evidence theory. International Journal of Ad-vanced Computer Science and Applications, 7(4), 2016.
[13] Netresec. PCAPs from 2012 National CyberWatch Mid-Atlantic Collegiate Cyber Defense Competition. Available at http://www.netresec.com/?page=MACCDC. Accessed on 2017-04-18.
[14] L. O’Callaghan, N. Mishra, A. Meyerson, S. Guha, and R. Motwani. Streaming-data algorithms for high-quality clustering. In Proceedings 18th International Conference on Data Engineering, pages 685–694, Feb 2002.
[15] J. J. Salerno, S. J. Yang, I. Kadar, M. Sudit, G. P. Tadda, and J. Holsopple.
Issues and challenges in higher level fusion: Threat/impact assessment and
in-BIBLIOGRAPHY
tent modeling (a panel summary). In 2010 13th International Conference on Information Fusion, July 2010.
[16] Steven E Strapp. Segmentation and Model Generation for Large-Scale Cyber Attacks. Master’s thesis, Rochester Institute of Technology, Rochester, NY, 2013.
[17] W. Nick Street and YongSeog Kim. A streaming ensemble algorithm (sea) for large-scale classification. In Proceedings of the Seventh ACM SIGKDD Interna-tional Conference on Knowledge Discovery and Data Mining, KDD ’01, pages 377–382, New York, NY, USA, 2001. ACM.
[18] B. Subba, S. Biswas, and S. Karmakar. A neural network based system for intrusion detection and attack classification. In 2016 Twenty Second National Conference on Communication (NCC), March 2016.
[19] Symantec. Internet security threat report. Available at https://www.symantec.com/security-center/threat-report, 2017. Accessed on 2017-04-25.
[20] Kaijun Wang, Baijie Wang, and Liuqing Peng. CVAP: validation for cluster analyses. Data Science Journal, 8:88–93, 2009.
[21] Shanchieh Jay Yang, Haitao Du, Jared Holsopple, and Moises Sudit. Attack projection. In Alexander Kott, Cliff Wang, and Robert F. Erbacher, editors, Cyber Defense and Situational Awareness, chapter Attack Projection, pages 239–
261. Springer International Publishing, New York City, 2014.