Motivation Analysing Network Traffic with GLCM Parameters Summary and Results
Applying Image Analysis Methods to Network
Traffic Classification
Thorsten Kisner, Alex Essoh and Firoz Kaderali
Department of Communication Systems Faculty of Mathematics and Computer Science
FernUniversität in Hagen, Germany
SPRING 2007: SIDAR Graduierten-Workshop über Reaktive Sicherheit (SIDAR graduate workshop on reactive security)
Motivation Analysing Network Traffic with GLCM Parameters Summary and Results
Outline
1 Motivation
  Texture Analysis Methods
  Network Traffic
2 Analysing Network Traffic with GLCM Parameters
  Determining the GLCM matrix size
  Evaluation of feature vectors
3 Summary and Results
  Accuracy of classification
  Conclusion and Future Work
Grey Level Co-occurrence Matrix
Definition
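Grey Level Co-occurrence Matrix (GLCM) $C(\delta,T) = [s(i,j,\delta,T)]$ for texture analysis [1] [2].
$s(i,j,\delta,T)$ is a second-order probability of going from one grey level $i$ to another grey level $j$ given the displacement vector $\delta = (\Delta x, \Delta y)$:

$$
s(i,j,\delta,T) = \frac{\Theta\{\vec{x} \mid \vec{x},\ \vec{x}+\delta \in T,\ g(\vec{x}) = i,\ g(\vec{x}+\delta) = j\}}{\Theta\{\vec{x} \mid \vec{x},\ \vec{x}+\delta \in T\}} \tag{1}
$$

where $\Theta\{\cdot\}$ counts the elements of a set, $g(\vec{x})$ is the grey level at position $\vec{x}$ and $T$ is the analysis window.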
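To make eq. (1) concrete for a time series, here is a minimal NumPy sketch. It assumes that a window T is a 1-D sequence of already quantised grey levels and that the displacement δ is a pure time lag `dx`; the slides do not state which δ was used. The function name `glcm_1d` is hypothetical. It counts how often level i is followed by level j after `dx` steps and divides by the number of valid pairs, i.e. the ratio of the two Θ counts. Unlike some texture-analysis implementations, it does not symmetrise the matrix.

```python
import numpy as np


def glcm_1d(levels, n_levels: int, dx: int = 1) -> np.ndarray:
    """Normalised co-occurrence matrix s(i, j, delta, T) for one 1-D window T.

    levels   -- integer grey levels g(x) in [0, n_levels) for every x in T
    n_levels -- number of grey levels (matrix size)
    dx       -- hypothetical time-lag displacement, i.e. delta = (dx, 0)
    """
    levels = np.asarray(levels, dtype=int)
    if dx < 1 or dx >= len(levels):
        raise ValueError("displacement must satisfy 1 <= dx < len(levels)")
    if levels.min() < 0 or levels.max() >= n_levels:
        raise ValueError("grey levels must lie in [0, n_levels)")
    src = levels[:-dx]  # g(x) for every x with x and x + delta inside T
    dst = levels[dx:]   # g(x + delta)
    counts = np.zeros((n_levels, n_levels), dtype=float)
    np.add.at(counts, (src, dst), 1.0)  # numerator of eq. (1) for each (i, j)
    return counts / len(src)            # denominator: number of valid pairs


s = glcm_1d([0, 1, 1, 2, 1, 0], n_levels=3)
print(s)
```

For the toy window above there are five valid pairs, so each observed transition gets probability 0.2 and the matrix sums to 1.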
Grey Level Co-occurrence Matrix
Parameters describing a texture
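$$\text{Angular Second Moment} = \sum_i \sum_j \big(s(i,j)\big)^2 \tag{2}$$
$$\text{Entropy} = -\sum_i \sum_j s(i,j) \cdot \log\big(s(i,j)\big) \tag{3}$$
$$\text{Inverse Difference Moment} = \sum_i \sum_j \frac{s(i,j)}{1 + (i-j)^2} \tag{4}$$
$$\text{Inertia} = \sum_i \sum_j (i-j)^2 \cdot s(i,j) \tag{5}$$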
(2) describes the energy of the matrix and (3) its information content. (5) can be interpreted as the contrast, and (4) as an inversely weighted measure of contrast, i.e. of local homogeneity.
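The four parameters can be computed from a normalised co-occurrence matrix roughly as follows. This is a sketch: the function name `glcm_features` is hypothetical, the natural logarithm is assumed in (3) because the slides do not give the base, and cells with $s(i,j) = 0$ are skipped in the entropy sum (the usual convention $0 \cdot \log 0 = 0$).

```python
import numpy as np


def glcm_features(s) -> dict:
    """Texture parameters (2)-(5) of a normalised co-occurrence matrix s(i, j)."""
    s = np.asarray(s, dtype=float)
    i, j = np.indices(s.shape)  # row index i and column index j of every cell
    nz = s > 0                  # skip empty cells: 0 * log(0) is taken as 0
    return {
        "asm": float(np.sum(s ** 2)),                           # (2) energy
        "entropy": float(-np.sum(s[nz] * np.log(s[nz]))),        # (3) information content (natural log assumed)
        "idm": float(np.sum(s / (1.0 + (i - j) ** 2))),          # (4) inversely weighted contrast
        "inertia": float(np.sum((i - j) ** 2 * s)),              # (5) contrast
    }


# A matrix concentrated on the diagonal has zero contrast and maximal IDM.
print(glcm_features(np.diag([0.5, 0.5])))
```

A uniform matrix behaves the other way round: lower energy and IDM, higher entropy and inertia.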
Network Traffic
Incoming and outgoing traffic of two types: SMTP and HTTP.
Measured at the gateway to the external network with the built-in packet and byte counters of iptables (time resolution of 1 second).
70 independent traces of 9 hours each (weekdays between 7:30 am and 4:30 pm) for each type of traffic: 60 traces for training, 10 traces for verification.
Analogous to the windowing mechanism in texture analysis (see T in eq. (1)), we divide each 9-hour time series into 6 segments of 90 minutes.
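The windowing step can be sketched as below, assuming each trace is already available as one per-interval byte count per second. iptables itself reports cumulative counters, so in practice these would first have to be differenced (or zeroed after each reading); the slides do not show that step. `segment_trace` and the constants are hypothetical names.

```python
import numpy as np

SAMPLES_PER_SECOND = 1          # one counter reading per second
SEGMENT_SECONDS = 90 * 60       # 90-minute window T
TRACE_SECONDS = 9 * 60 * 60     # 9-hour trace (7:30 am to 4:30 pm)


def segment_trace(byte_counts) -> np.ndarray:
    """Split one 9-hour trace of per-second byte counts into 90-minute windows."""
    byte_counts = np.asarray(byte_counts)
    if len(byte_counts) != TRACE_SECONDS * SAMPLES_PER_SECOND:
        raise ValueError("expected one sample per second over 9 hours")
    seg_len = SEGMENT_SECONDS * SAMPLES_PER_SECOND
    return byte_counts.reshape(-1, seg_len)  # shape (6, 5400): one row per window


segments = segment_trace(np.zeros(TRACE_SECONDS))
print(segments.shape)
```

With 10 verification traces per traffic type, this yields 10 × 6 × 2 = 120 segments to classify.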
Determining the GLCM matrix size
In texture analysis, the size of the co-occurrence matrix is given explicitly by the range of the greyscale values.
In our scenario, the source for the co-occurrence matrix is a time series with no explicitly given limit for its values.
A matrix of the order of $10^7 \times 10^7$ is not practical, so quantisation is required. We analysed a linear quantisation to a matrix size of $2^i$ with
Determining the GLCM matrix size
Figure: Parameters as a function of matrix size (x axis: size of the GLCM, log2). Left panel, "Linearly Dependent" (y axis log2): Inertia, Cluster Shade, Cluster Prominence. Right panel, "Not Dependent": Inverse Difference Moment, Correlation, Angular Second Moment, Entropy.
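The linear quantisation to $2^i$ grey levels can be sketched as follows. The function name `quantise_linear` is hypothetical, and mapping each window's own minimum and maximum onto the level range is an assumption; the slides do not say whether a per-window or a global value range was used.

```python
import numpy as np


def quantise_linear(values, i: int) -> np.ndarray:
    """Map raw byte counts linearly onto 2**i grey levels 0 .. 2**i - 1."""
    values = np.asarray(values, dtype=float)
    n_levels = 2 ** i
    lo, hi = values.min(), values.max()   # assumed: range of this window only
    if hi == lo:                          # constant window: everything on one level
        return np.zeros(values.shape, dtype=int)
    scaled = (values - lo) / (hi - lo) * n_levels      # now in [0, n_levels]
    return np.minimum(scaled.astype(int), n_levels - 1)  # put the maximum in the top bin


print(quantise_linear(np.array([0, 1, 2, 3, 4, 5, 6, 7]), 2))
```

Repeating the quantisation for each i and recomputing a parameter on the resulting matrix gives curves of the kind shown in the figure, which separates parameters that grow with the matrix size from those that do not.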
Network Traffic
Figure: Example traffic time series (x axis: time interval $T_i$, 0 to 550; y axis: bytes / $T_i$, scale $\times 10^6$).
Evaluation of feature vectors
Figure: Histograms of selected GLCM parameters for SMTP traffic and HTTP traffic (panels: ASM, ENT, CORR, IDM, INE, CP).
Inverse Difference Moment (IDM) and Correlation (CORR) plotted against each other.
The two classes overlap, but clustering can be observed.
Figure: IDM (y axis, 0 to 1) against CORR (x axis, scale $\times 10^{-3}$) for SMTP traffic and HTTP traffic.
Accuracy of classification
A k-Nearest-Neighbor (kNN) algorithm with k = 5 classifies the 120 segments of unknown traffic (2 traffic types × 10 verification traces × 6 segments) into the classes SMTP or HTTP.
Only the four most relevant parameters are used: Angular Second Moment (2), Entropy (3), Inverse Difference Moment (4) and Inertia (5).
Traffic   Correct (positive)   Incorrect (negative)   Classification rate
HTTP      55                   5                      91.67%
SMTP      52                   8                      86.67%
Total     107                  13                     89.17%
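A plain kNN classifier with k = 5 over the four-parameter feature vectors could look like this. It is a sketch, not the authors' code: the function names are hypothetical, Euclidean distance and simple majority voting are assumptions, and no feature scaling is applied. The classification rate is the number of correct (positive) decisions divided by all decisions, which reproduces the table's percentages, e.g. 107 / 120 = 89.17%.

```python
import numpy as np


def knn_predict(train_x, train_y, test_x, k: int = 5) -> np.ndarray:
    """Label each test vector by majority vote of its k nearest training vectors."""
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y)
    preds = []
    for x in np.asarray(test_x, dtype=float):
        dist = np.linalg.norm(train_x - x, axis=1)  # Euclidean distance to each training segment
        nearest = train_y[np.argsort(dist)[:k]]     # labels of the k closest segments
        labels, votes = np.unique(nearest, return_counts=True)
        preds.append(labels[np.argmax(votes)])      # majority vote; odd k avoids ties with 2 classes
    return np.array(preds)


def classification_rate(pred, truth) -> float:
    """Fraction of segments classified correctly (positive / total)."""
    return float(np.mean(np.asarray(pred) == np.asarray(truth)))


# Hypothetical feature vectors: (ASM, Entropy, IDM, Inertia) per segment.
train_x = [[0.9, 0.2, 0.9, 10.0]] * 6 + [[0.1, 1.8, 0.2, 900.0]] * 6
train_y = ["HTTP"] * 6 + ["SMTP"] * 6
print(knn_predict(train_x, train_y, [[0.8, 0.3, 0.8, 20.0]]))
```

Because Inertia can take values in the thousands while ASM and IDM lie between 0 and 1, unscaled Euclidean distances are dominated by Inertia; whether the original experiments normalised the features is not stated.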
Conclusion
Novel approach for identifying network traffic by mapping a given time series onto the co-occurrence matrix known from texture analysis.
Using texture analysis methods, we classified even inaccurate and aggregated data with an accuracy of about 90% (89.17%).
Future Work
Analysis of multi-dimensional time series.
Examination of network traffic with the proposed method at packet level, also including network flow information.
Implementing a visualisation framework based on Grey
For Further Reading
[1] R. M. Haralick, K. Shanmugam and I. Dinstein, "Textural features for image classification", IEEE Transactions on Systems, Man, and Cybernetics, 3(6), November 1973, pp. 610-621.
[2] R. W. Conners, M. M. Trivedi and C. A. Harlow, "Segmentation of a High-Resolution Urban Scene using Texture Operators", Computer Vision, Graphics and Image Processing, 25, 1984, pp. 273-310.