10.3 Enhancements of the Basic Scheme
10.3.2 A Priori Reduction
RuleBender and the existing reduction-based rule set modification techniques Firewall Rule Optimization (FIRO) [71], Complete Redundancy Removal (CRR) [84], and Firewall Compressor (FC) [88] work towards the same goal, namely the generation of rule sets that can be traversed faster. However, RuleBender pursues the opposite strategy: instead of reducing the number of rules, it enlarges the rule set through decision tree encodings. This motivates an a priori reduction step through either FIRO, CRR, or FC before the RuleBender transformation. Accordingly, the output rule set R′ is computed in two steps: in the first step, an intermediate compressed rule set $R^C$ is generated; in the second step, $R^C$ serves as the input rule set for RuleBender, which generates R′. This procedure is sketched in Figure 10.4.
A priori reduction works without restrictions if the input rule set R does not contain any complex rules. However, care must be taken if R contains at least one complex rule, because none of FIRO, CRR, and FC can deal with arbitrary complex rules. In this case, we apply the following strategy: assuming that the rules at the indices $i_1, \dots, i_k$ are complex, we extract the $k + 1$ sub rule sets $R_1, \dots, R_{k+1}$ that consist only of simple rules (marked by the braces), i. e.,

\[
\langle \underbrace{R_1, \dots, R_{i_1-1}}_{R_1},\; R_{i_1},\; \underbrace{R_{i_1+1}, \dots, R_{i_2-1}}_{R_2},\; R_{i_2},\; \underbrace{\dots, R_n}_{\dots} \rangle. \tag{10.3}
\]
Subsequently, we apply selective minimization to every sub rule set $R_j$ in order to compute the compressed sub rule sets $R_j^C$. The entire compressed rule set $R^C$, which is used as input for RuleBender, is then obtained by concatenating the compressed sub rule sets $R_1^C, \dots, R_{k+1}^C$ and the complex single-element rule sets $\langle R_{i_1} \rangle, \dots, \langle R_{i_k} \rangle$ in the correct order:

\[
R^C = R_1^C \, \langle R_{i_1} \rangle \dots \langle R_{i_k} \rangle \, R_{k+1}^C. \tag{10.4}
\]
As demonstrated in Section 10.5, the benefits of selective minimization, when applied prior to RuleBender, are twofold: first, the size of the output rule set R′ is significantly reduced, in some cases by an order of magnitude. Second, this size reduction often leads to more efficient packet classification.
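The splitting and concatenation described by Equations 10.3 and 10.4 can be illustrated with a minimal Python sketch. It is not the authors' C implementation: the `Rule` type, its `is_complex` flag, and the `drop_duplicates` compressor are hypothetical stand-ins, with `drop_duplicates` merely taking the place of FIRO, CRR, or FC.

```python
from typing import Callable, List, NamedTuple


class Rule(NamedTuple):
    name: str
    is_complex: bool = False  # hypothetical flag: rule cannot be handled by FIRO/CRR/FC


RuleSet = List[Rule]


def a_priori_reduce(rules: RuleSet, compress: Callable[[RuleSet], RuleSet]) -> RuleSet:
    """Compress every maximal run of simple rules; keep complex rules in place."""
    result: RuleSet = []
    segment: RuleSet = []  # current sub rule set R_j (simple rules only)
    for rule in rules:
        if rule.is_complex:
            result.extend(compress(segment))  # append R_j^C
            result.append(rule)               # append the single-element set <R_{i_j}>
            segment = []
        else:
            segment.append(rule)
    result.extend(compress(segment))          # append R_{k+1}^C
    return result


def drop_duplicates(segment: RuleSet) -> RuleSet:
    """Toy stand-in for a real compressor: drops repeated rules within one segment."""
    seen, kept = set(), []
    for rule in segment:
        if rule not in seen:
            seen.add(rule)
            kept.append(rule)
    return kept


rules = [Rule("a"), Rule("a"), Rule("X", True), Rule("b"),
         Rule("c"), Rule("b"), Rule("Y", True), Rule("d")]
compressed = a_priori_reduce(rules, drop_duplicates)
print([r.name for r in compressed])  # ['a', 'X', 'b', 'c', 'Y', 'd']
```

Because the compressor only ever sees runs of simple rules, the relative order of simple and complex rules is preserved, which is what keeps the first-match semantics of the rule set intact in this sketch.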
| Algorithm | Runtime | Memory requirements | Output rule set size | Max. output rule set path length | Supports complex checks |
|---|---|---|---|---|---|
| Related work | | | | | |
| FIRO [71] | $O(d \cdot n^2)$ | $O(d \cdot n)$ | $O(n)$ | $O(n)$ | no |
| CRR [84] | $O(n^d)$ | $O(n^d)$ | $O(n)$ | $O(n)$ | no |
| FC [88] | $O(n^{d+1})$ | $O(n^{d+1})$ | $O(n)$ | $O(n)$ | no |
| Proposed approach | | | | | |
| RuleBender | $O(\sum_{i=1}^{k} \lvert T_i \rvert)$ | $O(\sum_{i=1}^{k} \lvert T_i \rvert)$ | $O(\sum_{i=1}^{k} \lvert T_i \rvert)$ | $O(\max T_i^{\uparrow})$ | yes (match-based) |

$T_i$, $i \in \{1, \dots, k\}$: the HC/HS decision trees
$T_i^{\uparrow}$: height of $T_i$ (assumed in $O(d)$ (HC) / $O(d \cdot \log(n))$ (HS))
$\lvert T_i \rvert$: number of nodes in $T_i$ (assumed in $O(n^d)$)
$n$: number of rules; $d$: number of fields

Tab. 10.1: RuleBender performance characteristics, in comparison with related work.
10.4 Performance Characteristics
After having discussed the basic RuleBender scheme as well as its enhancements, we take a look at RuleBender’s performance characteristics. The time it takes to transform an input rule set R into an output rule set R′ is dominated by the time needed to create the decision trees. Therefore, assuming that k decision trees are created (e. g., for different protocols, as described in Section 10.2), the transformation time and the memory footprint are in $O(\sum_{i=1}^{k} \lvert T_i \rvert)$, where the $T_i$ are the generated decision trees and $\lvert T_i \rvert$ denotes the number of nodes in $T_i$.
The same holds for the size of the generated output rule set R′, which directly depends on the size of the generated decision trees. Of course, the transformation time increases when a priori reduction is applied, but since the asymptotic runtimes of FIRO, CRR, and FC are in $O(n^d)$, the asymptotic runtime of RuleBender does not increase in these cases. The same reasoning can also be applied to the other performance characteristics under consideration.
The maximum classification path length for an incoming packet is bounded by the height of the largest decision tree, plus a constant number of rules for the tree dispatch and the rules in the leaf nodes. As previously mentioned in Section 4.5, we assume the height of HiCuts/HyperSplit trees to be in $O(d)$/$O(d \cdot \log(n))$, and the number of nodes in the trees to be in $O(n^d)$, as suggested by the literature [58, 61, 107, 144, 147]. These performance characteristics are summarized in Table 10.1, in comparison to all existing static transformation approaches that were presented in Chapter 9.
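To get a feeling for how the assumed path length bounds grow, the following sketch evaluates the dominant terms for the rule set sizes used later in the evaluation. It ignores all constant factors (tree dispatch, leaf rules), so the numbers indicate growth only, not actual rule counts. The function name `path_length_terms` and the choice d = 5 are assumptions made for illustration.

```python
import math


def path_length_terms(n: int, d: int) -> dict:
    """Dominant terms of the worst-case path length bounds (constants ignored)."""
    return {
        "linear list, O(n)": n,
        "HiCuts, O(d)": d,
        "HyperSplit, O(d * log n)": d * math.log2(n),
    }


# d = 5 is an assumption (two addresses, protocol, two port fields).
for n in (64, 256, 1024, 4096):
    print(n, path_length_terms(n, d=5))
```

For n = 4,096 and d = 5, the HyperSplit term evaluates to 60, compared to 4,096 for a linearly traversed rule list.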
10.5 Evaluation
In this section, we evaluate RuleBender by comparing it with the existing FIRO [71], CRR [84], and FC [88] approaches on the basis of the four most important key performance indicators for rule set transformation schemes: the transformation time, the rule set size expansion factor, the mean classification path length, and the classification throughput achievable with the transformed rule sets. Furthermore, we examine each of the proposed RuleBender modifications in detail, and additionally investigate the influence of the decision tree binth parameters as well as the number of different actions on the quality of the generated output rule sets.
10.5.1 Experiment Setup
In order to carry out our experiments, we use our C implementations of RuleBender, FIRO, CRR, and FC. We also employ the widely used [43, 13, 69, 92, 107] ClassBench packet classification benchmark [132] to generate rule sets of sizes between 64 and 4,096 rules, i. e., of size $2^i$ with $i \in \{6, 8, 10, 12\}$. For each size, we generate ten different rule sets using ClassBench’s Access Control List (acl1) rule set template. Each ClassBench-generated rule set consists of rules that define subnet checks on source and destination IPv4 addresses, the transport protocol (which is either TCP, UDP, or unspecified), and port fields. ClassBench is also used to generate a uniformly distributed header trace of 20,000 headers for every rule set.
Our evaluation setup consists of three computers: a sender machine with an Intel Xeon E5-1660 3.3 GHz CPU with eight physical cores (Hyper-Threading disabled) and 128 GB of RAM running Ubuntu Linux 17.04 Server, as well as two firewall machines equipped with a quad-core Intel Celeron 1.6 GHz CPU and 8 GB of RAM. One firewall machine runs Ubuntu Linux 17.04 Server with iptables 1.6.0 and nftables 0.6, the other firewall machine runs FreeBSD 11.0 with ipfw. The sender is directly connected to each firewall machine via two 1 Gbit/s Ethernet links, where the first link is used to send traffic from the sender to the firewall machine, and the second link is used to relay all processed packets back to the sender. Accordingly, the firewall machines’ routing tables are configured to directly forward each incoming packet back to the sender machine on a different interface. That way, we can evaluate the classification throughput of the firewall receiver machines by counting the number of packets received back on the corresponding sender interface. This evaluation setup is illustrated in Figure 10.5.
Fig. 10.5: Sketch of the RuleBender evaluation setup, showing the sender and one firewall machine.
Our evaluation code was compiled on the sender machine using gcc 6.3.0 with the compile options -Wall -Wextra -pedantic-errors -Werror -std=gnu99 -march=native -O3 -DNDEBUG. The rule set transformations are executed on the sender machine, which also generates and sends the traffic of 64-byte TCP and UDP packets corresponding to the traces generated by ClassBench. We use the tcpreplay [176] tool to send the packets as fast as possible to either the Linux or the FreeBSD firewall machine over a time period of ten seconds by looping over the trace. The firewall machines’ packet filters are configured with the rule set under test, and after each test run, we extract the number of packets that were processed during the ten seconds using the netstat tool. The last rule in every rule set is a match-all rule with an ACCEPT action. We evaluate the rule sets with two, four, and eight different actions $\text{ACCEPT}_i$, where each action is a redirection to ACCEPT (implemented via jumps). As ClassBench does not generate these actions, we take the generated rule sets and randomly distribute the actions over the rules within the rule set. These redirections are used in order to study the effect of different numbers of actions without having to use real actions like DROP or REJECT, which would distort our measurement results. Hence, every rule in the source rule sets defines a randomly chosen action in $\{\text{ACCEPT}_i \mid i < i_{\max}\}$, with $i_{\max} \in \{2, 4, 8\}$. This is important for meaningful benchmarks, since the CRR and FC approaches would reduce every rule set in which each rule defines the same action to a one-rule output rule set, which is not a realistic use case.
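The random distribution of actions described above can be sketched as follows. This is an illustrative reconstruction, not the script used for the evaluation: the function `assign_actions`, the textual match strings, and the fixed seed are assumptions.

```python
import random
from typing import List, Tuple


def assign_actions(matches: List[str], i_max: int, seed: int = 0) -> List[Tuple[str, str]]:
    """Attach a random action ACCEPT_i (0 <= i < i_max) to every generated rule."""
    rng = random.Random(seed)  # fixed seed (assumption) for reproducible rule sets
    rules = [(match, f"ACCEPT{rng.randrange(i_max)}") for match in matches]
    rules.append(("any", "ACCEPT"))  # final match-all rule with the plain ACCEPT action
    return rules


# Hypothetical match strings standing in for ClassBench-generated rules.
matches = ["src 10.0.0.0/8", "dst 192.168.1.0/24 tcp dport 80", "udp dport 53"]
for i_max in (2, 4, 8):
    print(i_max, assign_actions(matches, i_max))
```

With i_max = 1, every rule would carry the same action, which is exactly the degenerate case that CRR and FC would collapse into a single rule.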
In the remainder of this section, the data points in the plots show the mean result of ten evaluation runs (each run with a different randomly generated rule set) with corresponding 95% confidence intervals, if not stated otherwise in the plot captions. The transformation algorithms under scrutiny are shown in Table 10.2. If not stated otherwise, the binth parameter β for the decision trees, as described in Section 4.5, is set to 16, because smaller values of β often lead to significantly longer preprocessing times. We deliberately choose not to cover the entire cross product of possible algorithm combinations, but instead focus on a meaningful subset of techniques that implement different algorithmic ideas and yield a measurable improvement. For example, we do not cover HiCuts with inlined right branches, as it yields worse results than HyperSplit with inlined right branches and utilizes the same algorithmic enhancement.
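As an illustration of how a data point with its 95% confidence interval can be derived from ten runs, consider the following sketch. The text does not state which interval estimator was used; the sketch assumes a Student's t interval with nine degrees of freedom, and the sample values are made up.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t distribution for 9 degrees of freedom.
T_975_DF9 = 2.262


def mean_ci95(samples):
    """Return (mean, half-width of the 95% confidence interval) for ten runs."""
    if len(samples) != 10:
        raise ValueError("the critical value above is only valid for ten samples")
    mean = statistics.fmean(samples)
    sem = statistics.stdev(samples) / math.sqrt(len(samples))  # standard error of the mean
    return mean, T_975_DF9 * sem


# Made-up measurements, e.g. mean classification path lengths of ten rule sets.
runs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
mean, half_width = mean_ci95(runs)
print(f"{mean:.2f} +/- {half_width:.2f}")
```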
| Transformation algorithm | Abbreviation |
|---|---|
| Firewall Rule Optimization | FIRO |
| Complete Redundancy Removal | CRR |
| Firewall Compressor | FC |
| RuleBender with HiCuts | HC |
| RuleBender with HyperSplit | HS |
| RuleBender with HyperSplit and inlined right branches | HS (inline) |
| RuleBender with HyperSplit, inlined right branches, and a priori CRR | CRR → HS (inline) |
| RuleBender with HyperSplit, inlined right branches, and a priori FC | FC → HS (inline) |

Tab. 10.2: Evaluated transformation algorithms.
In order to verify the correctness of the transformed rule sets, we use a self-written linear-search-based interpreter that matches a trace of packet headers against a specified rule set and logs the sequence of actions determined for the different packet headers. We consider a transformed rule set R′ to be correct if it yields the same sequence of actions as the original input rule set R. This sanity check is done for every combination of input and output rule set and for every evaluated algorithm, and it passes in every case. Furthermore, we use the interpreter to determine the average classification path length, i. e., the number of rules a packet header traverses until a final classification verdict can be issued.
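The idea behind such an interpreter can be sketched in a few lines of Python. This is a simplified illustration rather than the interpreter used in the evaluation: it handles flat rule lists only (no jumps between chains, which RuleBender's output rule sets would require), and the rule representation as predicate/action pairs as well as all names are assumptions.

```python
from typing import Callable, Dict, List, Tuple

Header = Dict[str, int]
Rule = Tuple[Callable[[Header], bool], str]  # (match predicate, action)


def classify(rules: List[Rule], header: Header) -> Tuple[str, int]:
    """First-match linear search; returns the action and the number of rules traversed."""
    for traversed, (matches, action) in enumerate(rules, start=1):
        if matches(header):
            return action, traversed
    raise ValueError("no rule matched; a final match-all rule is expected")


def run_trace(rules: List[Rule], trace: List[Header]) -> Tuple[List[str], float]:
    """Return the logged action sequence and the average classification path length."""
    if not trace:
        raise ValueError("empty trace")
    results = [classify(rules, header) for header in trace]
    actions = [action for action, _ in results]
    return actions, sum(length for _, length in results) / len(trace)


def is_equivalent(original: List[Rule], transformed: List[Rule], trace: List[Header]) -> bool:
    """A transformed rule set passes if it yields the same action sequence."""
    return run_trace(original, trace)[0] == run_trace(transformed, trace)[0]


original = [
    (lambda h: h["dport"] == 80, "ACCEPT0"),
    (lambda h: h["dport"] == 80, "ACCEPT1"),  # shadowed by the rule above
    (lambda h: h["dport"] == 22, "ACCEPT1"),
    (lambda h: True, "ACCEPT"),               # match-all rule
]
transformed = [original[0], original[2], original[3]]  # shadowed rule removed
trace = [{"dport": 80}, {"dport": 22}, {"dport": 443}]

print(is_equivalent(original, transformed, trace))  # True
print(run_trace(transformed, trace)[1])             # 2.0
```

In this toy example, removing the shadowed rule leaves the action sequence unchanged while lowering the average path length from 8/3 to 2.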