Procedia Engineering 29 (2012) 2156 â€“ 2161 1877-7058 Â© 2011 Published by Elsevier Ltd. doi:10.1016/j.proeng.2012.01.279 Procedia Engineering 00 (2011) 000â€“000

**Procedia**

**Engineering**

www.elsevier.com/locate/procedia
### 2012 International Workshop on Information and Electronics Engineering (IWIEE 2012)

### A Predict Deterministic Finite Automaton for Practical Deep

### Packet Inspection

### Qiang Wei

a_{*, Yunzhao Li}

a_{, Yanjie Chu}

a,
*a _{Science and Technology On Blind Signal Processing Laboratory, Chengdu, China }*

**Abstract **

Deep packet inspection has become extremely important due to network security. In deep packet inspection, the packet payload is compared against a set of patterns specified as regular expressions. Regular expressions are often implemented as deterministic finite automaton (DFA) for matching in linear time at high network link rates.

We proposed a predict DFA which can accelerate the processing speed of DFA. A predict DFA uses additional information to predict several next transitions. We tested our proposal on Layer7 rule-set and validated it on real traffic traces, experiments show that our approach offers a significant performance improvement by accelerate rate factors from 1.6 to 2.8 over original DFA.

Â© 2011 Published by Elsevier Ltd.

*Keywords:*Deep packet inspection; Regular expression ;DFA

**1. Introduction **

Packet content scanning is crucial to network security and network monitoring applications. In these applications, the payload of packets in a traffic stream is matched against a given set of patterns to identify specific classes of applications, viruses, protocol definitions, etc.

Due to expressive power, simplicity, flexibility, regular expressions are replacing explicit string
patterns in deep packet inspection. Most popular software toolsâ€”_{including Snort[1] and Bro[2]}â€”_{and }

networking devicesâ€”_{including the Cisco family of Security Appliances[3] and the Citric Application }

Firewall[4]â€”_{ use regular expressions as their pattern language. }

* Corresponding author. Tel.: 086- 028-8659-5396; fax: 086-028-8659-5365.

*E-mail address*: weiqianglg@gmail.com.

Open access under CC BY-NC-ND license.

Finite automatons are typically employed to implement regular expressions. There are two types of finite automation: Non-deterministic Finite Automaton (NFA) and Deterministic Finite Automaton (DFA). Unlike NFA, DFA has a predictable memory bandwidth requirement, processing an input string involves one DFA state traversal per character. Consequently, DFA is a preferred method at high network link rates. But DFA corresponding to large sets of complicate regular expressions requires prohibitively amount memory.

When DFA cannot satisfy the matching speed, a stride-*k* DFA is considered [5]. A stride-*k* DFA
consumes *k* characters per state transition rather than just one, thus yielding a *k*-fold performance increase.
The basicproblem in building a DFA of stride *k*, or *k*-DFA, lies in the memory requirement. In fact, given
a DFA defined on an alphabet Î£, the non-compressed* k*-DFA will have |Î£|*k*_{ outgoing transitions per state. }

In this paper we propose a predict DFA, a solution accelerating the processing speed of DFA. When
constructing a predict DFA, some additional information is added to DFA. Comparing with *k*-DFA, a
predict DFA nearly equal DFA for size, and can process more than one character at the same time.

**2. Related work **

To cope with the prohibitively amount memory of DFA, Several algorithmic techniques have been proposed.

Yu et al [6] proposed segregating rules into multiple groups and evaluating the corresponding DFAs concurrently. Yuâ€™s solution decreases memory space requirements but increases memory bandwidth linearly with the number of active DFAs.

D2_{FA [7] exploit the redundancy present in the transitions between states and save 90% space. D}2_{FA }
leads to a trade-off between the size of DFA representation and the memory bandwidth required to
traverse it. Following D2_{FA, CD}2_{FA [8] was proposed by using recursive content labels to reduce the }
diameter bounds of D2_{FA to 2 with one 64-bit wide memory access. In [9], Becchi limited the bound of }
the number of default paths in D2_{FA, and improve the worst case of D}2_{FA. }

Hybrid FA [10] combines the benefits of DFA and NFA. Any nodes that would contribute to state explosion retain an NFA encoding, while the rest are transformed into DFA nodes. Hybrid FA gets size nearly as same as an NFA and small memory bandwidth of a DFA.

In [11], XFA was proposed to use finite automata plus sketch memory to solve the problem. The auxiliary memory (data domain in their terminology) was used to keep track of Kleene closure and length restrictions. However, XFA requires to manually generating EIDD (Efficient Implementable Data Domain) for each regex which could be very time-consuming and error prone if the operators are not experienced.

Besides these, Alphabet reduction is a basic technique for mapping the set of symbols found in an alphabet to a smaller set by grouping characters that label the same transitions everywhere in the automaton[5][9].

Majumder designed a high performance regular expression matching system that employs DFA state caching [12]. In [12], high frequency and densely connected DFA portions was extracted and stored in a static cache for accelerating the matching speed of DFA.

**3. Predict DFA **

Our approach is not designed for compress the space of DFA but for accelerating it. A predict DFA is
orthogonal to other methods such as D2_{FA, hybrid FA and states caching. }

*3.1.*Â·*Motivating example *

We introduce our approach using an example. Figure 1(a) shows part of a standard DFA defined on
alphabet *ASCII* that recognizes the rlogin and RTSP protocol from Linux layer 7 filter [13], *p1*

*=^[a-z][a-z0-9][a-z0-9]+/[1-9][0-9]?[0-9]?[0-9]?00*, *p2=rtsp/1.0 200 ok*. In this DFA, state 0 is the initial state.

Fig.1 (a) Part of DFA which recognize the expressions ^[a-z][a-z0-9][a-z0-9]+/[1-9][0-9]?[0-9]?[0-9]?00,
*rtsp/1.0 200 ok; (b) Predict DFA table for Fig.1(a) *

It shows that transitions from a state to its next states are not evenly distributed in figure 1(a). Assuming characters are evenly distributed in packets payload, when current state is 0, next state will be state 2 with much smaller possibility (1/256) contrasting to state 3 with possibility of 230/256. So we can predict that state 0â€™s next state will be state 3, and the next two state of 0 will be state 3 with possibility of 0.89(230/256*255/256).

Figure 1(b) shows the 2-step predict table of the above DFA. All the tables are all implemented on
hardware (FPGA or ASIC). When processing the *i*th character in *Str*, the* i+1*th and *i+2*th character are
read in parallel at the same time. Consider the operation on the input string *AB@#*. *Str[0]=A *makes the
initial state 0 go to state 3. At the same time, the 1-step predict table gives state 3 for predication on state
0, *Str[1]=B* makes state 3 go to itself. The 2-step predict table also gives state 3 on state 0, state 3 still
goes to itself on *Str[2]=@*. In this situation, both of the predict table give correct prediction, so the
predict DFA now can jump to process *Str[3]=#* on state 3, and the 4 characters only cost 2 clock circles.

Note that the predict DFA is a combination of DFA and predict table. It is obvious that the (*i+1*)-step
predict table relies on the *i*-step one, because if the prediction of the *i*-step is wrong, there is no need to do
the (*i+1*)-step. If the predict tables are computed properly, we could expect that most of the input packets
can be processed much quicker than the original DFA.

*3.2. Computing the predict tables *
*3.2.1. Convert DFA to Markov chain *

All the predict tables can be computed easily if characters are evenly distributed and irrelevant. We
first compute the transition probabilities of the original DFA. The transition probability between state* m *

and state* n *is equal to the proportion of transitions number between them to alphabet size (typically 256).
After that, the original DFA is converted to a homogeneous Markov chain. Figure 2 shows the Markov
chain corresponding to figure 1.

It is well known that the steady state distribution of a irreducible ergodic Markov chainÏ€ can
calculated by
[1,1...1]*T* 1
*P*
Ï€ Ï€
Ï€
=
âŽ§
âŽ¨
=
âŽ© *(1)*

Where *P* is the transition matrix. Generally, we can estimateÏ€ by

(0)_{P}l

Ï€ Ï€= * (2) *

Where (0) [1,0,...0]Ï€ = is the initial distribution, *l* should be big enough. By orderingÏ€ , we can pick

the most frequently visited states into set *S1* for prediction. For each picked states *m* in* S1*, the 1-step

predict state *n* is decided by

( , ) max ( , )
*k S*

*p m n* *p m k*

âˆˆ

= * (3) *

Where *S* is the states set, *p(m,k)* is the transition probability between state *m* and state *k*. If *p(m,n)<0.5, *

state* m* will be removed from *S1*. The *i*-step popular set *Si* is initialized by *Si-1* and the *i*-step predict state *n*

for state *m*is decided by

( , ) max ( , )
*i* *i*
*k S*
*p m n* *p m k*
âˆˆ
= * (4) *

Where *pi _{(m,k)}*

_{ is the }

_{i}_{ step transition probability which can be calculated by Chapman-Kolmogorov }

equation. If *pi _{(m,k)}*

_{ <0.5, state }

_{m}_{ will also be remove from }

_{S}*i*.

*3.2.2. Learning from real data streams *

In real network data streams, characters are relevant and not evenly distributed, the Markov chain model is not proper. So we introduce a training mechanism to compute the predict tables. The pseudo-code of the algorithm is provided below.

---

* input: dfa*: the original DFA,

*Ni*(1â‰¤ â‰¤

*i n*): size limit of the

*i*-step predict table,

*training_data*: the

training network data stream.

**output:***Ti* (1â‰¤ â‰¤*i n*): *i*-step predict table

use *dfa* to match *training_data*, pick the most *N1*frequently visited states into *S0*;

initialize round *r*=1;

**repeat**

** **clear *Count, tempT*;
*Sr* =*Sr*âˆ’1;

*cur_state*=initial state of *dfa*;

**for **character *c* in *training_data *{

*next_one_state = dfa (cur_state, c); *

**if** *cur state S*_ âˆˆ *r* **and** all *Ti* (1â‰¤ <*i r*) predict right{

*next_rth_state =* the next *r*th state from *cur_state*;
*Count[cur_state->next_rth_state]*++;

}

*cur_state = next_one_state*;
}

**for** *s S*âˆˆ *r*

pick next state *t*where *Count[s->t]* is maximum for state * s*, add *s->t * to *tempT*;
order *tempT* by the value in *Count*, pick the top *Nr* ones, add to *Tr*;

remove items from* Sr*which cannot be find in *Tr*;

* until r*=

*n*

---
In this algorithm, we first pick the most frequently visited states. Based these, we compute all the
predict tables one by one recursively. The sequence of the size limit* Ni* (1â‰¤ â‰¤*i n*) is an important factor

for performance.

**4. Experimental results **

To focus on regular expressions commonly used in networking applications, we select the Linux layer 7 filter [13] which contains 70 patterns for detecting different protocols. We segregated them into 4 groups using Yu algorithm [6] and then made the corresponding predict DFA for each group.

In our experiments, we used the real-world TCP and UDP network packet trace obtained from the MIT Lincoln Laboratory [14]. This data set contains payload captured during a particular week. We used Monday data for training, and the rest for testing. Because of small steps (â‰¤8) and predict table size (â‰¤5), both of the Markov chain and learning algorithm gave the same results in our experiments.

Becchiâ€™s C++ code was used to convert regular expressions into DFAs and match the network data stream for simulation.

Figure 3 shows the result for 1-step predict DFA. In figure 3(a), the original DFA which has 3719 states is from group 1. Most time the 1-step predict table gives right prediction, so the predict DFA can processing 2 characters in a single clock circle. In figure 3(b), we made 4 predict DFA for the 4 DFAs, and tested them on Tuesdayâ€™s data set. The results show that the 1-step predict DFA can offer a significant performance improvement by accelerate rate factor over 1.6.

Figure 4(a) shows the result for multi-step predict DFA. In this experiment, we made *Nn*=* Nn-1*=â€¦=* N1*

for convenience. The original DFA is from group 2 which has 3829 states and the test data set is
Tuesdayâ€™s. When the number of the predict tables increases, the accelerate rate factor increases too, but
there are two limitations. The first one is predicting steps. When the predicting steps increase, the right
probability for prediction decreases, so the growth speed of the accelerate rate slows down. Figure 4(b)
illuminates this situation. In figure 4(b),* Nn*=* Nn-1*=â€¦=* N1*=5; the second limitation is resource. Too many

predict tables are impossible to store for hardware.

1 1.5 2 2.5 3 3.5 4 4.5 5 1.89 1.9 1.91 1.92 1.93 1.94 1.95 1.96 1.97 1.98

size limit of 1-step predict table

acce le ra te r ate Tuesday Wednesday Thursday Friday 1 1.5 2 2.5 3 3.5 4 4.5 5 1.6 1.65 1.7 1.75 1.8 1.85 1.9 1.95 2 2.05 2.1

size limite of 1-step predit table

ac cel er at e r at e

group 1 DFA(3719 states) group 2 DFA(3829 states) group 3 DFA(3124 states) group 4 DFA(1530 states)

**Â **

(a) (b)

Fig.3 (a) accelerate rate for different days on DFA from group 1; (b) accelerate rate for different DFAs on Tuesdayâ€™s data set

1 1.5 2 2.5 3 3.5 4 4.5 5 1.6 1.8 2 2.2 2.4 2.6 2.8 3

size limite of multi-step predict table

ac

cel

era

te ra

te

1-step predict DFA 2-step predict DFA 3-step predict DFA 4-step predict DFA

1 2 3 4 5 6 7 8 1.5 2 2.5 3 3.5 steps of predicition ac ce le rat e r at e

Fig.4 (a) accelerate rate for multi-step DFA on Tuesdayâ€™s data set for DFA from group 2; (b) accelerate rate for different steps predict DFA on Tuesdayâ€™s data set for DFA from group 2

**5. Conclusions **

In this paper, we presented a predict DFA for accelerating the original DFA on hardware devices. In
order to computing the predict tables, we proposed two methods, the Markov chain for simply situation
and the learning algorithm for practical network. The predict DFA is orthogonal to other methods such as
D2_{FA and hybrid FA, they can be used together to improve performance and space explosion. Experiment }
results shows that our predict DFA implementation is 1.6 to 2.8 orders of magnitude faster than DFA.

**References **

[1] Snort network intrusion detection system. http://www.snort.org.

[2] Bro intrusion detection system. http://bro-ids.org.

[3] Cisco systems. Cisco ASA 5505 adaptive security appliance. http://www.cisco.com.

[4] Citrix systems. Citrix application firewall. http://www.citrix.com.

[5] Brodie BC, Cytron RK, Taylor DE. A scalable architecture for high-throughput regular-expression pattern matching.

ISCA 2006.

[6] Fang Yu, Zhifang Chen, Yanlen Diao, et al. Fast and memory-efficient regular expression matching for deep packet

inspection. Proceedings of the 2006 ACM/IEEE Symposium on Architectures for Networking and Communications Systems(ANCS). 2006. 93-102.

[7] Kumar S, Dharmapurikar S, Fang Yu ,et al. Algorithms to accelerate multiple regular expressions matching for deep

packet inspection. Proceedings of the 2006 conference on Applications, Technologies, Architectures, and Protocols for Computer Communications. 2006. 339-350.

[8] Kumar S, Turner J, Williams J. Advanced algorithms for fast and scalable deep packet inspection. Proceedings of the

2006 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS). 2006. 81-92.

[9] Becchi M, Crowley P. An improved algorithm to accelerate regular expression evaluation. ANCS .2007.145-154.

[10] Becchi M, Crowley P. A hybrid finite automaton for practical deep packet inspection. Proceedings of the International

Conference on emerging Networking EXperiments and Technologies (CoNEXT).2007.1-12.

[11] Smith R, Estan C, Jha S. XFA: Faster signature matching with extended automata. IEEE Symposium on Security and

Privacy. 2008. 187-201.

[12] Majumder A, Rastogi R, Vanama S. Scalable regular expression matching on data streams. SIGMOD. 2008.161-172.

[13] Levandoski J, Sommer E, Strait M. Application layer packet classifier for Linux. http://l7-filter.sourceforge.net/.