A Double-Filter Structure Based Scheme for Scalable Port Scan Detection

Shijin Kong¹∗, Tao He²†, Xiaoxin Shao³∗, Changqing An⁴† and Xing Li⁵†

∗Department of Electronic Engineering, Tsinghua University, Beijing, P.R.China 100084
Email: {ksj00¹, sxx03³}@mails.tsinghua.edu.cn

†China Education and Research Network, Beijing, P.R.China 100084
Email: {hetao², xing⁵}@cernet.edu.cn, acq⁴@tsinghua.edu.cn

Abstract— Port scan detection is very important for predicting network intrusions and preventing viruses from spreading. Many networks deploy Network Intrusion Detection Systems (NIDS) to detect port scans in real time. However, most NIDS are per-flow based. They do not scale to high speed links because it is infeasible to maintain the states of numerous flows. In this paper, we propose a scalable scheme for real-time port scan detection that keeps no per-flow state. We use a double-filter structure to find <SIP¹, SP> pairs which connect to more than N <DIP, DP> pairs in T time. The experimental results on real network traces show that our scheme finds those over-threshold <SIP, SP> pairs with high accuracy. The scheme scales easily to high speed environments because of its small memory consumption and fast processing pipeline.

I. INTRODUCTION

Port scan detection is very important for security management. Many attackers perform port scans as a first step to find vulnerable hosts to compromise. Detecting such port scans gives warning of incoming network intrusions. Besides, recent worms, such as Code Red-II and Nimda, scan for other vulnerable hosts in order to propagate [10], [11]. Network supervisors can prevent viruses from spreading by detecting those port scans and then blocking them.

A port scan is typically initiated by sending packets from the same source and the same port to various destinations and ports. If a destination has a service listening on the scanned port, the connection is established and a reply is sent back. From the reply, the attacker (or the worm) learns whether a service is available on the scanned port, and then tries to exploit security problems of the service for further intrusion. There are two access patterns of port scans: horizontal (multiple destinations, same port) and vertical (same destination, multiple ports). To detect port scans early and prevent further damage, many networks employ Network Intrusion Detection Systems (NIDS) at network entrances.

With the rapid development of the Internet, the scalability of NIDS has become a main problem of port scan detection. Most NIDS are per-flow based, which means they maintain the state of each flow during detection. Here, a flow is a communication process between two peers, e.g. a TCP connection or an HTTP session. Maintaining per-flow states is necessary for NIDS to detect port scans accurately. For example, to identify a horizontal port scan, each <SIP, SP> pair should maintain the number of all destinations to which it has connected. Therefore, packets with different <SIP, SP, DIP> values are classified into flows on arrival and the number of flows for each <SIP, SP> pair is counted. Although per-flow based NIDS worked well in the past, this is not the case nowadays. The links connected to most network entrances have been upgraded to gigabit or even higher capacity. The number of flows is quite large at such a high speed entrance. There is not enough time to distinguish numerous flows, nor enough space to store information for each of them. Since almost all current NIDS depend more or less on per-flow states, it is very hard to adapt them to high speed environments. A temporary solution is to split the whole traffic into several substreams, each of which is monitored by a NIDS. However, the installation is expensive and the management is complicated. Devising scalable port scan detection schemes has become necessary and urgent.

¹ Abbreviations are used for frequently referred terms. SIP=Source IP, DIP=Destination IP, SP=Source Port, DP=Destination Port.

In this paper, we propose a scalable scheme for real-time port scan detection. Both horizontal and vertical port scans can be detected. We use a double-filter structure to find <SIP, SP> pairs which connect to more than N <DIP, DP> pairs in T time. Those <SIP, SP> pairs are probably sources of port scans and they can be further inspected later. No per-flow state is maintained or updated in the process of looking for such over-threshold <SIP, SP> pairs. Only the double-filter structure is kept in a small amount of memory, and the per-packet processing pipeline is simple. The experimental results on real network traces show that our scheme, although it does not maintain any per-flow state, finds over-threshold <SIP, SP> pairs accurately. The majority of over-threshold <SIP, SP> pairs are detected and very few benign <SIP, SP> pairs are mistaken as over-threshold.

We believe our scheme is among the few scalable ones devised after PCF [2], which initiated the study of scalable attack detection schemes. Two main drawbacks of applying PCF for port scan detection are solved in our scheme. The rest of this paper is organized as follows. Section II reviews related work. Section III proposes our scheme and section IV gives a deep analysis of the double-filter structure. For practical implementation, a series of problems are raised and solved in section V. In section VI, we draw a comparison between our scheme and other NIDS. Experimental results on traces are presented in section VII. Finally, section VIII concludes the whole paper.

II. RELATED WORK

As noted in [7], little work has been done to detect port scans. Most port scan detection schemes are based on counting more than N events of interest during a given period of time T. The first approach applying this policy is NSM [6], which detects any source connecting to more than 15 destinations within a given time window. Snort [4], a well-known open source NIDS, keeps a 65,536-bit vector for each source to record all the ports to which the source has connected. Bro [5], a NIDS using failed connections as indicators of port scans, also maintains all the destinations for each source.

Other approaches depend on statistical models. In [7], the joint probability P(d, p) is kept for each combination of destination d and port p. Any connection whose P(d, p) is less than a given threshold is considered part of a port scan. Another probabilistic approach [8] needs to keep states for each connection (s, d, p), where s is the source, d is the destination and p is the port. A recent study uses threshold based random walks to detect fast port scans [9].

All the schemes mentioned above have to know more or less about per-flow states. To be scalable, the Partial Completion Filter (PCF) is devised in [2] to count the number of SYN flags for each source. Any source with numerous SYNs but no FIN is considered a scanner. The state of each source can be approximately told by checking the content of PCF, so no per-flow state is explicitly stored. Since PCF is similar to our scheme, we draw an analytical comparison between PCF and our scheme in section VI.

III. OVERVIEW OF DOUBLE-FILTER STRUCTURE BASED SCHEME

In this section, our scalable scheme for real-time port scan detection is introduced. Like the policy used in Snort [4] and Bro [5], it is based on detecting N events in T time. T is called a measurement interval. A double-filter structure is used to find all <SIP, SP> pairs which connect to more than N <DIP, DP> pairs during a measurement interval. At the end of every measurement interval, all those over-threshold <SIP, SP> pairs are reported to supervisors for further inspection.

Several terms should be defined before we continue this section. A flow is defined as a set of packets with the same flow key, which consists of certain fields in the packet header. In this paper, the flow key is always the four-tuple <SIP, SP, DIP, DP>. A flow is terminated if the time since the arrival of its latest packet exceeds a time-out threshold $T_0$. Flow length is defined as the number of packets in a flow.

Our scheme keeps a data structure in memory that contains two filters. Every incoming packet attempts to pass the two filters in series. The per-packet processing pipeline is as follows. The first filter is a Time-out Bloom Filter (TBF), which is derived from the Bloom Filter [3]. TBF is a hash table with m buckets, each of which contains a timestamp. The m buckets are denoted as $a[0], a[1], \ldots, a[m-1]$ and the corresponding timestamps are $t[0], t[1], \ldots, t[m-1]$ respectively. There are d independent hash functions, $h_1(x), h_2(x), \ldots, h_d(x)$, attached to TBF. Each hash function maps a given flow key into one of the m buckets with equal probability. Besides, each bucket has a time-out value $t_0$. That is where the name “Time-out” comes from.

When a new packet with key c arrives at time t, the d timestamps stored in $t[h_1(c)], t[h_2(c)], \ldots, t[h_d(c)]$ are compared with t. If any of the d timestamps, say the i-th ($1 \le i \le d$), satisfies $t - t[h_i(c)] \ge t_0$ (we say that $a[h_i(c)]$ has timed out), the packet passes TBF; otherwise it fails to pass. We call a passed packet a “survivor”. After the comparison, all d timestamps are updated to t, even if the packet fails to become a survivor.
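As an illustration only, the TBF update rule described above can be sketched in Python. The class and method names, the byte-string flow key, and the use of salted BLAKE2b digests in place of the d hash functions are assumptions made for this sketch; the paper does not prescribe them.

```python
import hashlib


class TimeoutBloomFilter:
    """Sketch of a TBF: m timestamp buckets probed by d hash functions."""

    def __init__(self, m, d, t0, start_time=0.0):
        self.m, self.d, self.t0 = m, d, t0
        # Every bucket starts out already timed out (start_time - t0).
        self.t = [start_time - t0] * m

    def _indices(self, key):
        # Assumption: d salted BLAKE2b digests stand in for h_1 ... h_d (d < 256).
        for i in range(self.d):
            digest = hashlib.blake2b(key, digest_size=8, salt=bytes([i])).digest()
            yield int.from_bytes(digest, "big") % self.m

    def offer(self, key, now):
        """Return True if the packet is a survivor; always refresh its d buckets."""
        idx = list(self._indices(key))
        survivor = any(now - self.t[j] >= self.t0 for j in idx)
        for j in idx:
            self.t[j] = now
        return survivor


tbf = TimeoutBloomFilter(m=1024, d=3, t0=30.0)
flow = b"10.0.0.1:1234>10.0.0.2:80"   # hypothetical flow key encoding
print(tbf.offer(flow, now=0.0))    # first packet of the flow
print(tbf.offer(flow, now=5.0))    # rest packet, 5 s after the first
print(tbf.offer(flow, now=40.0))   # packet arriving 35 s after the previous one
```

With a single flow in the filter, the first packet survives, the packet 5 s later does not, and the packet that arrives more than $t_0$ after its predecessor survives again.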

In fact, with an optimal set of $(m, d, t_0)$, TBF has two special properties (analyzed in section IV and section V). (1) No flow can own two or more survivors. (2) Any flow has a probability $p_s$ of owning one survivor. Here, $p_s$ is a value determined by $(m, d, t_0)$. Hence, a survivor can be viewed as a representative of the corresponding flow. If the <SIP, SP> value of a survivor equals e, we say this survivor belongs to <SIP, SP> pair e. Finding <SIP, SP> pairs which connect to more than N <DIP, DP> pairs is equivalent to finding <SIP, SP> pairs which create more than N flows. The latter can be achieved by recording <SIP, SP> pairs which have more than $M = p_s N$ survivors. That is exactly what the second filter does.

Only survivors arrive at the second filter, a Multistage Filter (MF) [1]. MF has s stages. Stage i ($1 \le i \le s$) has n buckets, denoted as $b_i[0], b_i[1], \ldots, b_i[n-1]$, and a hash function $g_i(x)$. Each bucket of MF is a counter. When a survivor attempts to pass MF, s buckets are selected from the s stages based on the <SIP, SP> value e of the survivor: $b_1[g_1(e)], b_2[g_2(e)], \ldots, b_s[g_s(e)]$. Each of them is then increased by one. If all s buckets are over M, the survivor passes MF and e is recorded as a suspicious scanner. To give a clearer view, the whole pipeline is illustrated in figure 1.
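The counting rule of the second filter can be sketched as follows. This is a minimal illustration rather than the implementation of [1]: the class name, the `reset` helper, the byte-string encoding of the <SIP, SP> pair, and the salted BLAKE2b digests used as the s stage hash functions are all assumptions.

```python
import hashlib


class MultistageFilter:
    """Sketch of an MF: s stages of n counters, one hash function per stage."""

    def __init__(self, n, s, threshold):
        self.n, self.s, self.threshold = n, s, threshold
        self.counters = [[0] * n for _ in range(s)]
        self.suspicious = set()

    def _index(self, stage, pair):
        # Assumption: a salted BLAKE2b digest stands in for g_stage (s < 256).
        digest = hashlib.blake2b(pair, digest_size=8, salt=bytes([stage])).digest()
        return int.from_bytes(digest, "big") % self.n

    def offer(self, pair):
        """Count one survivor for `pair`; True if all s counters are over M."""
        over = True
        for stage in range(self.s):
            j = self._index(stage, pair)
            self.counters[stage][j] += 1
            if self.counters[stage][j] <= self.threshold:
                over = False
        if over:
            self.suspicious.add(pair)
        return over

    def reset(self):
        """End of a measurement interval: return the report and clear MF."""
        report = self.suspicious
        self.counters = [[0] * self.n for _ in range(self.s)]
        self.suspicious = set()
        return report


mf = MultistageFilter(n=64, s=3, threshold=3)
pair = b"1.2.3.4:5000"   # hypothetical <SIP, SP> encoding
print([mf.offer(pair) for _ in range(5)])
print(mf.reset())
```

With M = 3 and a single pair, the pair is recorded once its fourth survivor arrives, since only then are all three counters over M.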

Both TBF and MF are empty at the beginning time $T_s$. All the timestamps of TBF are set to $T_s - t_0$ and the counters of MF are cleared to zero. TBF is never cleared again after $T_s$. It keeps producing survivors all the time. In contrast, MF is reset to zero at the end of every measurement interval, when a list of suspicious scanners is sent to supervisors. Typically, the behavior of suspicious <SIP, SP> pairs is further evaluated in the next several measurement intervals.

IV. ANALYSIS OF DOUBLE-FILTER STRUCTURE

As mentioned, in an optimal TBF each flow has exactly one survivor among its packets with probability $p_s$. How can TBF do this, and what is the value of $p_s$? To answer these questions, we first make a theoretical analysis of TBF. We then illustrate the function of MF later in this section.

A. Time-out Bloom Filter

All packets of a flow F are denoted as $P_1, P_2, \ldots, P_r$ in order of arrival. The inter-packet interval of $P_i$ ($2 \le i \le r$) is the interval between the arrival time of $P_{i-1}$ and that of $P_i$. Obviously, $P_1$ does not have an inter-packet interval. We call $P_1$ a “first packet”, and $P_i$ ($2 \le i \le r$) a “rest packet”.

[Figure 1: a packet with flow key c and <SIP, SP> value e is hashed by h1(c), h2(c), h3(c) into TBF. If any selected bucket has timed out with respect to t0, the packet is a survivor and is hashed by g1(e), g2(e), g3(e) into MF. A time axis divided into intervals shows TBF and MF being cleared at the start and MF being cleared again at the end of each interval.]

Fig. 1. Per-packet pipeline of double-filter structure based scheme

Lemma 1: At any time, the probability that any given bucket of TBF has timed out is $p_0 = (1 - 1/m)^{Ld}$, where L is the number of flows which have had packets update TBF during the previous $t_0$ time.

Proof: Only the buckets that were updated by flows during the previous $t_0$ time have not timed out. If the buckets that have not timed out are viewed as being set to “1” and the timed-out ones as being set to “0”, TBF degenerates to a standard Bloom Filter containing L elements. From [3], we know that the probability that a bucket is “0” is $p_0 = (1 - 1/m)^{Ld}$.

In the rest of this section, we temporarily assume that L does not vary much during a measurement interval. Therefore, $p_0$ is considered constant within a measurement interval.

Lemma 2: For any flow F, the probability that $P_1$ becomes a survivor is $p_s = 1 - (1 - p_0)^d$.

Proof: The probability that any one of the d buckets has timed out is $p_0$. So the probability that none of the d buckets has timed out is $(1 - p_0)^d$. That is the case in which $P_1$ fails to be a survivor. So the probability that $P_1$ becomes a survivor is $p_s = 1 - (1 - p_0)^d$.

Lemma 3: For any flow F, the probability that $P_i$ ($2 \le i \le r$) becomes a survivor is: (1) $p_s$, if the inter-packet interval of $P_i$ is greater than $t_0$; (2) 0, otherwise.

Proof: If the inter-packet interval of $P_i$ ($2 \le i \le r$) is smaller than $t_0$, the d buckets were updated by $P_{i-1}$ within the previous $t_0$ time when $P_i$ arrives. None of the buckets has timed out, so $P_i$ will not be a survivor. If the inter-packet interval of $P_i$ is greater than $t_0$, the analysis is the same as for $P_1$ in lemma 2.

Theorem 1: If $t_0 = T_0$, all the survivors are first packets. The expected number of survivors in a measurement interval T is $p_s K$, where K is the number of flows during T.

Proof: For any flow F, the inter-packet interval of any rest packet is smaller than the flow time-out value $T_0$. So if $t_0 = T_0$, no rest packet will become a survivor. As proved in lemma 2, the probability that a first packet becomes a survivor is $p_s$. Thus, $p_s K$ survivors in total are expected from K first packets. Each survivor represents a distinct flow.

B. Multistage Filter

The Multistage Filter was first devised in [1] to detect heavy hitters which have more than C% ($1 \le C \le 100$) of the total traffic. In our scheme, it is used to detect <SIP, SP> pairs which have more than M survivors. Any <SIP, SP> pair e which has more than M survivors will definitely be recorded. This is because, after the last survivor of e has attempted to pass MF, all the s counters $b_1[g_1(e)], b_2[g_2(e)], \ldots, b_s[g_s(e)]$ are over M. On the other hand, a <SIP, SP> pair $e'$ which has fewer than M survivors may also be recorded. This happens when all the s counters $b_1[g_1(e')], b_2[g_2(e')], \ldots, b_s[g_s(e')]$ go over M with the help of other <SIP, SP> pairs. It is called a false positive error. In [1], a detailed analysis shows how to reduce the occurrence of false positive errors to a very low level. We do not discuss how to tune parameters to reduce false positive errors further in this paper; we simply follow the guidance in [1] to set (n, s) for MF.

Theorem 2: If $t_0 = T_0$, a <SIP, SP> pair e which connects to more than N <DIP, DP> pairs will be recorded if $M = p_s N$.

Proof: e creates more than N flows. From theorem 1, we know that more than $p_s N$ survivors are generated by TBF in expectation. If $M = p_s N$, e is therefore recorded.
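To make the formulas of lemma 1, lemma 2 and theorem 2 concrete, the short sketch below evaluates them for the ENTRA-1 settings reported in section VII (m = 262,144, d = 3, L about 48,000, N = 60). The function and variable names are illustrative assumptions, not part of the scheme.

```python
def timeout_probability(m, d, L):
    """Lemma 1: p0 = (1 - 1/m)^(L*d)."""
    return (1.0 - 1.0 / m) ** (L * d)


def survivor_probability(p0, d):
    """Lemma 2: ps = 1 - (1 - p0)^d."""
    return 1.0 - (1.0 - p0) ** d


m, d = 262_144, 3      # TBF size and number of hash functions (section VII)
L = 48_000             # typical number of active flows for ENTRA-1 (Table I)
N = 60                 # detection threshold on <DIP, DP> pairs (section VII)

p0 = timeout_probability(m, d, L)
ps = survivor_probability(p0, d)
M = ps * N             # threshold used by MF (theorem 2)
print(f"p0 = {p0:.3f}, ps = {ps:.3f}, M = {M:.1f}")
```

Under these inputs p0 comes out at roughly 0.58, consistent with the typical value quoted in section VII, and M at roughly 55 survivors.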

V. PRACTICAL PROBLEMS, EXPLANATION AND SOLUTIONS

In this section, we continue to analyze TBF and tune its parameters for practical considerations. Several realistic problems make some of the theoretical assumptions infeasible. We develop corresponding solutions to resolve those inconsistencies between theory and practice.

Problem 1: Why should we use TBF rather than a standard Bloom Filter (BF)? BF stores m “0”/“1” bits instead of timestamps in m buckets, which consumes even less memory. It can also be used as the first filter to select first packets of flows. The pipeline is summarized as follows.

(i) At the beginning of every measurement interval, all m bits of BF are set to “0”.

(ii) When a new packet with key c arrives, d bits, $a[h_1(c)], a[h_2(c)], \ldots, a[h_d(c)]$, are checked. If any of them is “0”, the packet must be a first packet and it passes BF. Otherwise it fails to pass. After checking, all d bits are set to “1”.

Explanation: In practice, there are several drawbacks to using BF to select first packets.

(1) All the m bits must be reset to zero at the beginning of every measurement interval. m is usually on the order of 100,000. The time consumed for resetting such a great number of bits is not negligible. It places an extra burden on detection.

(2) BF is gradually filled with “1” by first packets, so $p_0$ is not constant during the measurement interval. Accordingly, the probability that a first packet becomes a survivor is not the same at different times. Therefore, each <SIP, SP> pair has a different proportion of survivors among its first packets. Setting a single M for MF will result in many false positive errors and missed scanners.

(3) A flow which spans measurement intervals will be detected as two flows in two measurement intervals. A server may create N long lived flows within only one measurement interval. It should not be detected in subsequent measurement intervals since it does not create any more flows. However, it does have multiple survivors in subsequent measurement intervals, which is quite unreasonable.

All those drawbacks are overcome by using TBF.

For drawback (1): TBF is only cleared once at $T_s$. All buckets automatically change from “1” (not timed out) to “0” (timed out) as time elapses. Explicit resetting is not necessary.

For drawback (2): $p_0$ is constant and the probability that a first packet becomes a survivor is almost the same at any time (around $p_s$), as discussed in section IV.

For drawback (3): When $t_0 = T_0$, a flow has exactly one survivor. Even if a flow spans measurement intervals, it has one survivor in the current measurement interval and does not have any in subsequent measurement intervals. Within a measurement interval, only <SIP, SP> pairs with more than N newly created flows are detected.

Problem 2: In practice, a <SIP, SP> pair connecting to fewer than N <DIP, DP> pairs may have more than $p_s N$ survivors. It will then definitely pass MF and cause a false positive error. Similarly, a <SIP, SP> pair connecting to more than N <DIP, DP> pairs occasionally has fewer than $p_s N$ survivors. It may fail to pass MF and become a missed scanner.

Solution: In our experience (from the results in section VII), those false positive errors and missed scanners account for only a very small percentage of the total recorded scanners. To further reduce false positive errors, we can detect <SIP, SP> pairs which are over-threshold in several consecutive measurement intervals. Even if a benign <SIP, SP> pair is mistaken as a scanner within one measurement interval, it is less likely that this pair is recorded in several consecutive measurement intervals.
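One possible way to apply the consecutive-interval check is sketched below. The function name, the parameter k (the number of consecutive intervals required) and the representation of each interval's report as a Python set are assumptions for illustration; the paper does not fix these details.

```python
def confirm_scanners(reports, k):
    """For each interval, return the pairs flagged in the last k consecutive intervals.

    `reports` is a sequence of sets, one per measurement interval, each holding
    the <SIP, SP> pairs recorded by MF in that interval.
    """
    streak = {}                      # pair -> length of its current run of reports
    confirmed_per_interval = []
    for flagged in reports:
        # Pairs missing from this interval's report lose their streak.
        streak = {pair: streak.get(pair, 0) + 1 for pair in flagged}
        confirmed_per_interval.append(
            {pair for pair, count in streak.items() if count >= k}
        )
    return confirmed_per_interval


reports = [{"a", "b"}, {"a"}, {"a", "b"}]
print(confirm_scanners(reports, k=2))
```

In this toy run pair "a" is confirmed from the second interval onwards, while "b" never is, because its two reports are not consecutive. Only pairs that MF has already reported are tracked, so no per-flow state is introduced.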

Problem 3: In practice, L is not the same in all measurement intervals. L directly determines $p_s$, so $M = p_s N$ should be recalculated and reset for MF in every measurement interval. Here, L is still assumed constant within a measurement interval. A simple solution is to count the flows in the first $t_0$ time of every measurement interval and take that count as L. However, this requires an additional flow identification algorithm, which may not be scalable and consumes a lot of time.

Solution: Although L is not constant across measurement intervals, the difference in L between two consecutive measurement intervals is small. We can measure L in the previous measurement interval and use it to calculate M for the next measurement interval. Moreover, we measure $p_0$ instead of L to avoid using an extra flow identification algorithm.

An alternative solution, based on lemma 1, is described as follows. We periodically choose several random buckets and check whether they have timed out. Suppose such random tests are performed R times within a measurement interval, and U consecutive buckets are randomly selected each time. At the end of a measurement interval, if V of the $U \cdot R$ test results are “time-out”, $V/(UR)$ is an unbiased estimate of $p_0$². We use $V/(UR)$ as $p_0$ for the next measurement interval. Taking ENTRA-1 (a trace tested in section VII) as an example, U = 100 consecutive buckets are randomly chosen and tested at the beginning of every second.
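The sampling procedure above can be sketched as follows. The class name, the wrap-around at the end of the bucket array, and the use of Python's `random.Random` to pick the starting bucket are assumptions of this sketch; the timestamps are taken to be the TBF bucket array described in section III.

```python
import random


class P0Estimator:
    """Estimate p0 as V / (U * R) from R tests of U consecutive buckets each."""

    def __init__(self, U, seed=None):
        self.U = U
        self.rng = random.Random(seed)
        self.V = 0        # number of inspected buckets found timed out
        self.R = 0        # number of tests performed in this interval

    def sample(self, timestamps, now, t0):
        """One test: inspect U consecutive buckets starting at a random index."""
        m = len(timestamps)
        start = self.rng.randrange(m)
        # Assumption: the run of U buckets wraps around the end of the array.
        self.V += sum(
            now - timestamps[(start + k) % m] >= t0 for k in range(self.U)
        )
        self.R += 1

    def finish(self):
        """End of the measurement interval: return V / (U * R) and reset."""
        if self.R == 0:
            raise ValueError("no tests were performed in this interval")
        estimate = self.V / (self.U * self.R)
        self.V = self.R = 0
        return estimate


# Toy TBF state: half of the 100 buckets were refreshed just now, half long ago.
timestamps = [0.0] * 50 + [100.0] * 50
estimator = P0Estimator(U=100, seed=1)
for _ in range(10):
    estimator.sample(timestamps, now=100.0, t0=30.0)
print(estimator.finish())
```

The estimate returned at the end of one interval would then be plugged into $p_s = 1 - (1 - p_0)^d$ to set M for the next interval.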

Problem 4: Using the solution to problem 3, L of the next measurement interval can be estimated from $(1 - 1/m)^{Ld} = V/(UR)$. This L is actually the average value within a measurement interval. Until now, we have assumed that L is constant within a measurement interval. However, in practice there are times when L changes greatly within a measurement interval. The fluctuation of L probably causes either too many false positive errors or too many missed scanners.

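Solution: We look for proper m and d to make $p_s$ insensitive to L. An insensitive $p_s$ is approximately constant even if L varies much. The sensitivity is evaluated by $|\partial p_s / \partial L|$. The smaller $|\partial p_s / \partial L|$ is, the less sensitive $p_s$ is to L.

$$\left|\frac{\partial p_s}{\partial L}\right| = -\frac{d}{L}\left(1 - \left(1 - \frac{1}{m}\right)^{Ld}\right)^{d-1}\left(1 - \frac{1}{m}\right)^{Ld}\ln\left(1 - \frac{1}{m}\right)^{Ld} = \frac{1}{L}\cdot d(1 - p_0)^{d-1}\cdot(-p_0 \ln p_0) \qquad (1)$$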

The former part, $d(1 - p_0)^{d-1}$, decreases monotonically as $p_0$ increases (for d > 1). The derivative of the former part with respect to d is

$$\frac{\partial\left(d(1 - p_0)^{d-1}\right)}{\partial d} = (1 - p_0)^{d-1}\left(1 + d\ln(1 - p_0)\right) \qquad (2)$$

According to equation (2), if d is much greater than $-1/\ln(1 - p_0)$, the derivative is far less than zero, which results in a smaller $d(1 - p_0)^{d-1}$.

The derivative of the latter part, $-p_0 \ln p_0$, with respect to $p_0$ is

$$\frac{\partial(-p_0 \ln p_0)}{\partial p_0} = -\ln p_0 - 1 \qquad (3)$$

Similarly, according to equation (3), when $p_0$ is far greater than 0.37 ($-\ln 0.37 \approx 1$), the derivative is negative and $-p_0 \ln p_0$ is much smaller.

² We suppose a random variable x is measured as y. If E[x] = E[y], y is an unbiased estimation of x.

In practice, setting d to 3 or 4 is enough. When d is fixed, m should be set as large as possible to obtain a large $p_0$.
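The effect of m on the sensitivity in equation (1) can be checked numerically. The sketch below is only an illustration: the function names are assumptions, and L = 48,000 with d = 3 is taken from the ENTRA-1 settings in section VII, while the three values of m are arbitrary powers of two chosen for comparison.

```python
import math


def timeout_probability(m, d, L):
    """Lemma 1: p0 = (1 - 1/m)^(L*d)."""
    return (1.0 - 1.0 / m) ** (L * d)


def survivor_probability(m, d, L):
    """Lemma 2: ps = 1 - (1 - p0)^d."""
    return 1.0 - (1.0 - timeout_probability(m, d, L)) ** d


def sensitivity(m, d, L):
    """Equation (1): |d ps / d L| = (1/L) * d * (1 - p0)^(d-1) * (-p0 * ln p0)."""
    p0 = timeout_probability(m, d, L)
    return d * (1.0 - p0) ** (d - 1) * (-p0 * math.log(p0)) / L


L, d = 48_000, 3
for m in (2 ** 16, 2 ** 18, 2 ** 20):
    p0 = timeout_probability(m, d, L)
    print(f"m = {m:>8}  p0 = {p0:.3f}  |dps/dL| = {sensitivity(m, d, L):.2e}")
```

For this L and d, each increase of m raises $p_0$ and lowers the sensitivity, which agrees with the recommendation to make m as large as memory allows.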

VI. COMPARISON WITH OTHER NIDS

A. With Per-flow Based NIDS

We compare our scheme with per-flow based NIDS in the following three aspects.

1) Memory Consumption: As noted in [12], current high speed implementations of other network tasks tend to use small memory footprints that fit into fast on-chip SRAM, which is usually no greater than 1MB. For per-flow based NIDS, it is impossible to store the states of millions of connections in such limited SRAM. They usually keep per-flow states in slow DRAM.

In contrast, our double-filter structure fits within less than 1MB of SRAM. For example, to monitor the link from which ENTRA-1 is recorded, m = 262,144 buckets are allocated for TBF. Each bucket only needs one byte to store the last eight bits of the timestamp in seconds for time-out judging (for details, see appendix). For MF, n = 16,384, s = 3, and we set two bytes for each bucket. The total memory usage is only 262,144×1 + 16,384×3×2 = 360,448 bytes = 352KB. The rest of SRAM can be used to store information about suspicious scanners.
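The one-byte timestamp and the memory total above can be illustrated with a few lines of Python. This is a sketch under stated assumptions: time is an integer number of seconds, $t_0$ is below 256, and the helper names `stamp` and `is_timed_out` are hypothetical. The modulo-256 subtraction covers both cases discussed in the appendix (with and without wrap-around of the low eight bits).

```python
def stamp(now_seconds: int) -> int:
    """Keep only the last eight bits of the time in seconds."""
    return now_seconds & 0xFF


def is_timed_out(stored: int, now_seconds: int, t0: int) -> bool:
    """Compare one-byte timestamps modulo 256.

    Correct only if the real elapsed time is below 256 s; a longer gap
    aliases to a shorter one, as the appendix points out.
    """
    elapsed = (stamp(now_seconds) - stored) % 256
    return elapsed >= t0


stored = stamp(250)
print(is_timed_out(stored, 260, t0=30))   # 10 s elapsed, low bits wrapped around
print(is_timed_out(stored, 285, t0=30))   # 35 s elapsed
print(is_timed_out(stored, 511, t0=30))   # 261 s elapsed, aliases to 5 s

# Memory footprint for the ENTRA-1 configuration quoted above.
tbf_bytes = 262_144 * 1          # m buckets, one byte each
mf_bytes = 16_384 * 3 * 2        # n counters per stage, s stages, two bytes each
total_kb = (tbf_bytes + mf_bytes) / 1024
print(total_kb)
```

The third call shows the limitation: a bucket left untouched for 261 s looks as if it had been refreshed 5 s ago, which is why the scheme relies on buckets being updated within 256 s.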

2) Processing Speed: Per-flow based NIDS have to execute time-consuming flow identification algorithms to find the flow to update on every packet arrival. Then, the flow state in DRAM is updated, which is also quite slow.

By comparison, our scheme only calculates d hash values in fast SRAM for every packet, and an additional s hash values for the very small portion of packets that are survivors. All the hash functions used in the two filters are based on combinations of several AND or OR operations. Those operations consume little processing time and give out perfect random values ??.

3) Detection Accuracy: Detection accuracy is represented by the number of real scanners detected and the number of false positive errors. NIDS using per-flow states to detect N events in T time can capture all over-threshold <SIP, SP> pairs exactly, without exception. Although our scheme misses some suspicious scanners and creates some false positive errors in detection, the accuracy still remains high (see results in section VII). It is hard to compare our scheme with probabilistic approaches such as [8] and [9], since there is no uniform criterion.

B. With Scalable Scheme: PCF

We have introduced PCF in section II. It uses the same type of hash functions as ours and can also be placed in a small amount of SRAM. However, since PCF is based on counting SYN/FIN flags, it has two major drawbacks.

(1) PCF cannot detect UDP scans. UDP connections do not have explicit flags, such as SYN/FIN, to indicate a connection.

(2) PCF depends on monitoring both directions of traffic to be correct. In the case where only one direction of traffic is available, the scanning behavior can be disguised and mistaken as benign. A scanner can send FIN packets before scanning any destination using SYN packets. The numbers of SYN and FIN packets are then almost the same, so the scan cannot be detected.

Both drawbacks are solved in our scheme.

For drawback (1): We depend on connection patterns for detection. Port scans using any protocol can be detected.

For drawback (2): Even if only one direction of traffic is available, a flow is definitely created as long as a destination is scanned. No spoofing behavior can eliminate the existence of flows.

VII. EXPERIMENTAL RESULTS ON REAL TRACES

A. Traces Description

We test the performance of our scheme in this section by evaluating experimental results on real network traces. The purpose is to detect <SIP, SP> pairs which connect to more than N = 60 <DIP, DP> pairs within every measurement interval T = 1 minute. First, we describe the traces. Both traces are unidirectional, captured from two entrances of THUNET (TsingHua University NETwork). ENTRA-1 is captured from an entrance connected to a gigabit link and ENTRA-2 is captured from another entrance with an OC48 link. Both traces have a great diversity of flows and <SIP, SP> pairs. The flow time-out value is $T_0$ = 30s. Table I gives detailed information about the two traces.

B. Evaluating TBF

The evaluation of TBF focuses on its accuracy, which is represented by the number of survivors generated for each <SIP, SP> pair. The estimated number of survivors of a <SIP, SP> pair with N <DIP, DP> pairs is calculated as the closest integer to $p_s N$. $t_0$ is set to 30s (equal to $T_0$). m and d are chosen according to the solution of problem 4 in section V. We set $m = 2^{18}$ = 262,144 and d = 3 for ENTRA-1. For ENTRA-2, we enlarge m to $2^{19}$ = 524,288. Other parameters are the same as for ENTRA-1. Thus, the typical $p_0$ is about 0.58 for both traces. Due to lack of space, Figure 2 only shows the actual number of survivors A against the estimated number of survivors B during the first measurement interval of ENTRA-1. Every point in the figure represents a <SIP, SP> pair. Most points are very close to the line with slope 1. We calculate the average relative error $E[|A - B| \times 100\% / A]$ over all <SIP, SP> pairs. The result is merely 2.5%.

C. Evaluating Our Scheme

Table II shows the detection results during every measurement interval for both traces. n = 16,384 and s = 3 are set for MF. M is recalculated at the beginning of every measurement interval according to the solution of problem 3 in section V. The row marked “Detected” is the number of <SIP, SP> pairs detected by our scheme, and the row marked “Actual” is the real number of over-threshold <SIP, SP> pairs. The numbers of false positive errors and missed scanners are placed in row “False Pos” and row “Missed” respectively. The results are satisfying. On average, more than 95% of over-threshold <SIP, SP> pairs are detected, and the number of false positive errors is no more than 2.6% of the total detected <SIP, SP> pairs.


TABLE I
DETAILED DESCRIPTION OF TRACES

Trace    Number of Packets  Duration  Number of Flows  Number of <SIP,SP> pairs  Typical Value of L (t0=30s)
ENTRA-1  44M                10min     712,315          332,226                   about 48,000
ENTRA-2  79M                10min     1,161,801        664,249                   about 96,000

TABLE II
DETECTION RESULTS OF DOUBLE-FILTER STRUCTURED SCHEME

Detection results in 10 measurement intervals (T = 60s, t0 = T0 = 30s)

Trace    Row        1T   2T   3T   4T   5T   6T   7T   8T   9T  10T  Total  Percentage
ENTRA-1  Detected   80   68   75   84   84   91   84   79   88   80    813  −
         False Pos   0    0    1    1    2    9    1    3    3    1     21  2.6%
         Missed      3    2    1   12    4    2    3    5    2    5     39  4.8%
         Actual     83   70   75   95   86   82   86   81   87   84    829  −
ENTRA-2  Detected  216  139  142  167  160  142  152  155  140  142   1555  −
         False Pos   1    0    0    5    5    6    4    7    1    2     31  2.0%
         Missed     17   12    7    5    2   10    6    5    3    2     69  4.4%
         Actual    232  151  149  167  157  146  154  153  142  140   1591  −

[Figure 2: scatter plot on logarithmic axes, both ranging from 1 to 1000. x-axis: Estimated Number of Survivors (p_s N); y-axis: Actual Number of Survivors.]

Fig. 2. Actual number of survivors vs. estimated number of survivors for ENTRA-1

VIII. CONCLUSION

In this paper, we devise a double-filter structure based scheme for scalable port scan detection in real time. It detects port scans without keeping any per-flow state. The detection accuracy is satisfying, with a very small percentage of false positive errors and missed scanners. The scheme consumes far less memory and processing time than per-flow based NIDS, which makes it much more scalable in high speed network environments.

APPENDIX

One byte timestamp for correct time-out judging: A bucket can represent up to $2^8 = 256$ seconds in one byte. Suppose a bucket was most recently updated at $t_1$ (in seconds). The last eight bits of $t_1$ are denoted as $t'_1$. Now at time $t_2$ ($t_2 > t_1$), we are going to judge whether the bucket has timed out or not. The last eight bits of $t_2$ are denoted as $t'_2$. If the bucket has not timed out ($t_2 - t_1 < t_0$), either $t'_2 - t'_1 < t_0$ (when $t'_2 \ge t'_1$) or $t'_2 - t'_1 + 256 < t_0$ (when $t'_2 < t'_1$) is valid. However, theoretically speaking, we cannot judge correctly from the values of $t'_2 - t'_1$ and $t'_2 - t'_1 + 256$ alone. As long as $0 \le t_2 - t_1 - 256n < t_0$ (n is an integer and $n \ge 0$), either $t'_2 - t'_1 < t_0$ or $t'_2 - t'_1 + 256 < t_0$ is valid. In the case n > 0, the bucket has actually timed out. Fortunately, in practice any bucket is updated within 256 seconds, so the case n > 0 does not occur. Therefore, any bucket which has $t'_2 - t'_1 < t_0$ or $t'_2 - t'_1 + 256 < t_0$ can be judged correctly as not timed out, and any other bucket as timed out.

REFERENCES

[1] C. Estan, G. Varghese. New Directions in Traffic Measurement and Accounting. ACM SIGCOMM, 2002.

[2] R. R. Kompella, S. Singh, and G. Varghese. On Scalable Attack Detection in the Network. ACM SIGCOMM IMC, 2004.

[3] B. H. Bloom. Space/time Tradeoffs in Hash Coding with Allowable Errors. Communications of the ACM, 13(7), 1970.

[4] Snort. http://www.snort.org.

[5] V. Paxson. Bro: A System for Detecting Network Intruders in Real-time. Computer Networks, 31(23-24):2435-2463, 1999.

[6] L. T. Heberlein, G. V. Dias, K. N. Levitt, et al. A Network Security Monitor. IEEE Symposium on Research in Security and Privacy, 1990.

[7] S. Staniford, J. A. Hoagland, and J. M. McAlerney. Practical Automated Detection of Stealthy Portscans. ACM CCS, 2000.

[8] C. Leckie, R. Kotagiri. A Probabilistic Approach to Detecting Network Scans. IEEE Network Operations and Management Symposium, 2002.

[9] J. Jung, V. Paxson, A. Berger, et al. Fast Portscan Detection Using Sequential Hypothesis Testing. IEEE Symposium on Security and Privacy, 2004.

[10] S. Staniford. Containment of Scanning Worms in Enterprise Networks. IEEE INFOCOM, 2002.

[11] N. Weaver, V. Paxson, S. Staniford, et al. A Taxonomy of Computer Worms. ACM Workshop on Rapid Malcode, 2003.

[12] K. Levchenko, R. Paturi, and G. Varghese. On the Difficulty of Scalably Detecting Network Attacks. ACM CCS, 2004.

[13] G. Cheng, J. Gong, W. Ding, et al. A Hash Algorithm for IP Flow Measurement. Journal of Software, 16(5):652-658, 2005.