Classification Time, Memory Footprint, Preprocessing Time

5.4 Evaluation

5.4.2 Classification Time, Memory Footprint, Preprocessing Time

We begin the review of our performance evaluation by inspecting the number of unsuccessful attempts to create the search data structure for the different approaches, as illustrated in Figure 5.4. The figure shows that the number of forcefully terminated builds increases with the rule set size for the RFC, HiCuts, and HyperSplit algorithms. This is most visible for rule sets with 65,536 rules, where 33.3 % of the RFC builds, 84.0 % of the HiCuts builds, and 95.1 % of the HyperSplit builds are unsuccessful. As the HiCuts results show, builds can already fail at relatively small rule set sizes of 256 rules. This behaviour is well known for both decision trees [122,137] and RFC [58,81] and is considered a major obstacle for practical classification system implementations, such as Open vSwitch [105]. Despite this, these approaches, and especially RFC, are amongst the fastest algorithms with respect to classification performance, as we will see in the remainder of this section. We note that none of the other evaluated algorithms suffer from these scalability problems.

Fig. 5.4: Forcefully terminated algorithm search data structure creation attempts for the RFC, HiCuts, and HyperSplit approaches for different rule set sizes.

Having discussed the scalability issues of existing best-in-class approaches, we now move on to the measured average classification times, memory footprints, and preprocessing times, which are summarized in Figure 5.5, Figure 5.6, and Figure 5.7, respectively. Figure 5.5 shows that JIT-compiling the binary searches of the vanilla (Aggregated) Bit Vector Search algorithm improves classification performance for every regarded rule set size. However, the results also imply that the JIT can be considered a micro-optimization, as the performance gain is scarcely influenced by the number of rules. The biggest relative performance gain, a factor of up to 1.5× (for the Bit Vector Search), is achieved for small rule sets with 16 or 32 rules, where the Jit Vector Search algorithm provides the overall best classification performance. This gain in classification performance, however, comes at the cost of a significant increase in memory footprint and preprocessing time, as shown in Figure 5.6 and Figure 5.7, respectively. This overhead is mainly caused by the lookup table computation for small dimensions, which is constant and becomes less significant with an increasing number of rules.
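To make the role of the per-dimension binary searches concrete, the following is a minimal Python sketch of a bit vector lookup. It assumes the textbook formulation of the algorithm: every dimension is cut into elementary intervals, each interval stores a bit vector of the rules covering it, and the vectors selected per dimension are ANDed. The names `build_dimension` and `classify` are hypothetical, and the sketch does not reproduce the JIT-compiled implementation evaluated here; it only shows which binary searches are the target of the JIT.

```python
from bisect import bisect_right


def build_dimension(ranges, max_value):
    """Build (boundaries, vectors) for one dimension.

    ranges[i] is the inclusive (lo, hi) range of rule i in this dimension;
    rule 0 is assumed to have the highest priority.
    """
    # Elementary intervals start at 0 and wherever a range begins or ends.
    points = {0}
    for lo, hi in ranges:
        points.add(lo)
        if hi + 1 <= max_value:
            points.add(hi + 1)
    boundaries = sorted(points)

    vectors = []
    for start in boundaries:
        vec = 0
        for i, (lo, hi) in enumerate(ranges):
            if lo <= start <= hi:
                vec |= 1 << i  # bit i set: rule i covers this interval
        vectors.append(vec)
    return boundaries, vectors


def classify(dimensions, header):
    """Return the index of the first matching rule, or None."""
    result = -1  # all bits set
    for (boundaries, vectors), value in zip(dimensions, header):
        # Binary search for the elementary interval containing the value.
        idx = bisect_right(boundaries, value) - 1
        result &= vectors[idx]
    if result == 0:
        return None
    # Lowest set bit = highest-priority matching rule.
    return (result & -result).bit_length() - 1


# Three rules over two 8-bit fields (x, y).
x_dim = build_dimension([(0, 10), (5, 20), (0, 255)], 255)
y_dim = build_dimension([(5, 5), (0, 255), (0, 255)], 255)
print(classify([x_dim, y_dim], (7, 5)))   # rule 0
print(classify([x_dim, y_dim], (30, 5)))  # rule 2
```

In this sketch the `bisect_right` call is the generic binary search over a boundary array; a JIT as described in the text would instead emit specialized code for each dimension's search.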

When taking a look at the classification performance of the (Aggregated) Jit Vector Search approaches with SIMD instructions, we notice that for rule sets with fewer than 1,024 rules, the non-SIMD variants are slightly faster. This is explained by the SIMD register loads and by the SIMD way of computing the index of the first set bit, which is more complex than the native 64-bit variant, as shown in Listing 5.3 and Listing 5.4. For larger rule sets with at least 1,024 rules, however, we observe a significant performance improvement for the Jit Vector Search of up to a factor of 2.6×. Also, as expected, for large rule sets the performance gain increases with the SIMD instruction bit width W. In the case of the Aggregated Jit Vector Search, the performance gain is significantly smaller. The reason is that the Aggregated Bit Vector Search can skip many vector operations due to sparsely populated vectors, as explained in Section 4.3.2. Because it performs significantly fewer vector operations, it does not benefit from SIMD instructions to the same extent as the non-aggregated variant. Nevertheless, the combination of SIMD instructions and the binary search JIT brings the Aggregated Bit Vector Search close to RFC’s performance, without suffering from scalability issues and with significantly shorter preprocessing times and lower memory footprints.

When compared to the other high-performance algorithms, namely HiCuts and HyperSplit, we see that the fastest (Aggregated) Jit Vector Search variant always beats the decision tree algorithms in terms of classification performance. For medium to large rule set sizes, this is also true for preprocessing time and memory footprint.
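The reason why aggregation leaves less work for SIMD instructions, as discussed above for the Aggregated Bit Vector Search, can be illustrated with a small Python sketch. It assumes the usual aggregation scheme in which one aggregate bit summarizes whether a 64-bit word of the rule vector is non-zero; Python integers stand in for machine words, and `make_aggregate` and `first_match` are hypothetical names rather than the functions of Listing 5.3 and Listing 5.4. The expression `w & -w` isolates the lowest set bit, which is roughly the job a native count-trailing-zeros instruction does on a 64-bit word.

```python
WORD_BITS = 64


def make_aggregate(words):
    """Bit j of the aggregate is set iff words[j] is non-zero."""
    agg = 0
    for j, w in enumerate(words):
        if w:
            agg |= 1 << j
    return agg


def first_match(vectors):
    """Find the first rule set in all per-dimension vectors.

    vectors holds one list of 64-bit words per dimension, all of equal
    length. Returns (rule index or None, number of words inspected).
    """
    # Aggregates would normally be precomputed during preprocessing.
    candidates = make_aggregate(vectors[0])
    for v in vectors[1:]:
        candidates &= make_aggregate(v)

    words_inspected = 0
    while candidates:
        j = (candidates & -candidates).bit_length() - 1  # next candidate word
        candidates &= candidates - 1                     # clear that bit
        word = vectors[0][j]
        for v in vectors[1:]:
            word &= v[j]
        words_inspected += 1
        if word:  # the aggregate may also report false positives
            bit = (word & -word).bit_length() - 1
            return j * WORD_BITS + bit, words_inspected
    return None, words_inspected


dim0 = [0, 0b1000, 0, 0b01]
dim1 = [0b1, 0b0100, 0, 0b11]
print(first_match([dim0, dim1]))  # (192, 2): only 2 of 4 words are ANDed
```

With sparsely populated vectors, most words are never touched, so widening the AND operation from 64 bits to a SIMD register width saves comparatively little.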

Finally, we take a look at the dynamic Linear Search and Tuple Space Search approaches. While these algorithms provide superior preprocessing performance and low memory footprints, their lookup speed clearly does not scale to larger rule set sizes. For rule sets with 64 K rules, our fastest Jit Vector approach is about 3,667× faster than Linear Search and about 436× faster than Tuple Space Search. Even for small rule set sizes, the Bit/Jit Vector Searches clearly outperform Linear Search and Tuple Space Search.

Fig. 5.5: Average algorithm classification times for different rule set sizes required for traces of 50K headers. Note that the RFC, HyperSplit, and HiCuts results only show results of successful search data structure builds.

Fig. 5.6: Average memory footprints of algorithm search data structures for different rule set sizes. Note that the RFC, HyperSplit, and HiCuts results only show results of successful search data structure builds.

Fig. 5.7: Average preprocessing times of algorithm search data structures for different rule set sizes. Note that the RFC, HyperSplit, and HiCuts results only show results of successful search data structure builds.

5.5 Limitations

Generally, the proposed Jit Vector Search approach achieves better classification performance at the cost of larger memory footprints and higher preprocessing times, while still keeping its scalability, in contrast to decision tree algorithms [63, 107] or RFC [62]. For smaller rule set sizes in particular, these memory and preprocessing costs are high. While this is not a problem for scenarios where the rule set changes rarely or only moderately often, it might make our proposed approach unusable in highly dynamic environments, such as SDNs. We address this issue in Chapter 6.

Furthermore, dynamic code generation may lead to security concerns in certain applications, especially when the generated code is executed with kernel privileges. However, this seems to be a minor issue, especially when we take current development in the Linux kernel into account, which also uses dynamic code generators and specific static checkers to validate the generated instruction stream [34,99]. In fact, it is always possible to ensure the validity of the generated binary search trees and lookup tables, as they never include any backward jumps and do not contain function calls.

Although Jit Vector Search is primarily designed to efficiently solve the Geometric Packet Classification Problem, it can be adjusted to also tackle the Complex Packet Classification Problem. This can be achieved by iterating over the result vector and executing any complex checks that belong to the rules whose bits are set.
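The adjustment described above can be sketched in a few lines of Python. The sketch assumes that the geometric lookup has already produced a result vector (here a Python integer with bit i set if rule i matched) and that complex checks are available as predicates keyed by rule index; `classify_complex` and the dictionary layout are hypothetical and only illustrate the iteration order.

```python
def classify_complex(result_vector, complex_checks, packet):
    """Return the first rule whose bit is set and whose complex check passes.

    complex_checks maps a rule index to a predicate over the packet; rules
    without an entry are assumed to have no complex check.
    """
    remaining = result_vector
    while remaining:
        rule = (remaining & -remaining).bit_length() - 1  # lowest set bit
        remaining &= remaining - 1                        # clear it
        check = complex_checks.get(rule)
        if check is None or check(packet):
            return rule
    return None


checks = {
    0: lambda p: p["payload"].startswith(b"GET"),
    1: lambda p: p["length"] > 100,
}
packet = {"payload": b"POST /", "length": 40}
print(classify_complex(0b1011, checks, packet))  # 3: rules 0 and 1 fail
```

Because the bits are visited from lowest to highest index, the rule priority order of the geometric lookup is preserved; the cost of the extra checks grows with the number of set bits in the result vector.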

6 The SFL Classification Algorithm

Two of the most difficult challenges a classification system can face are the line speed packet processing requirement and the ability to quickly process rule set updates, especially when used in dynamic environments. Many existing approaches to packet classification mainly aim to accelerate the classification process, ranging from fast classification algorithms [31,62,63,76,107,128] and rule set optimization techniques [49,12,13,65,84,88] to hardware-centric approaches [3,56,15,136,138]. Most of these works require significant preprocessing times to set up their search data structures, which in turn can be traversed quickly when a packet enters the classification system. In consequence, they provide excellent lookup performance in setups where the rule set does not change often, such as static security policies. However, if the classification system is used in dynamic environments with frequent rule set changes at run time, such as SDNs, the ability to quickly update the search structure is of paramount importance. Unfortunately, existing approaches that support dynamic updates either come with comparatively slow classification performance [61,105,127] or require specific hardware setups [2,120,136].

In this chapter, we contribute the SFL approach, a technique to equip a given classification algorithm with the ability to quickly process updates while still maintaining high lookup performance. Specifically, we can augment an arbitrary existing classification algorithm A (the Fast) with a list-based update buffer B (the Lazy), as sketched in Figure 6.1. Rule set updates that are applied at system run time are not installed immediately in the search structure of A, but are inserted into the update buffer B as well as into a master rule set (the Small). When a network packet is to be classified, it is first matched using A’s search data structure to compute a preliminary classification decision. Subsequently, this decision is checked against the contents of the buffer and the master rule set to determine whether it conflicts with a rule set update, and it is modified if necessary. After sufficiently many updates have been collected, the classification data structure can be rebuilt once, thereby flushing the update buffer.
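The interplay of the three components described above can be sketched as follows. This is a toy illustration under several assumptions that are not taken from the text: rules are one-dimensional ranges, a lower `priority` value wins, algorithm A is replaced by a stand-in (`FastStub`) that is only ever rebuilt from a snapshot, and a preliminary decision that refers to a deleted or replaced rule is resolved by scanning the master rule set. All class and method names are hypothetical; the actual conflict check of SFL may differ.

```python
from collections import namedtuple

# Hypothetical one-dimensional rule; a lower priority value wins.
Rule = namedtuple("Rule", "rule_id priority lo hi")


def matches(rule, value):
    return rule.lo <= value <= rule.hi


class FastStub:
    """Stand-in for algorithm A: built from a snapshot, never updated."""

    def __init__(self, rules):
        self.rules = sorted(rules, key=lambda r: r.priority)

    def classify(self, value):
        return next((r for r in self.rules if matches(r, value)), None)


class SFL:
    def __init__(self, rules, flush_threshold):
        self.master = {r.rule_id: r for r in rules}  # master rule set
        self.fast = FastStub(self.master.values())   # algorithm A
        self.buffer = []                             # update buffer B
        self.flush_threshold = flush_threshold

    def _record(self, update):
        self.buffer.append(update)
        if len(self.buffer) >= self.flush_threshold:
            # Rebuild A once from the master rule set and flush the buffer.
            self.fast = FastStub(self.master.values())
            self.buffer.clear()

    def insert(self, rule):
        self.master[rule.rule_id] = rule
        self._record(("insert", rule))

    def delete(self, rule_id):
        del self.master[rule_id]
        self._record(("delete", rule_id))

    def classify(self, value):
        best = self.fast.classify(value)  # preliminary decision
        if best is not None and self.master.get(best.rule_id) != best:
            # Assumption of this sketch: a stale decision is resolved by
            # a scan over the master rule set.
            hits = [r for r in self.master.values() if matches(r, value)]
            best = min(hits, key=lambda r: r.priority) if hits else None
            return best.rule_id if best is not None else None
        for op, payload in self.buffer:
            # A buffered insert overrides the decision if it is still
            # valid, matches, and has a higher priority.
            if (op == "insert"
                    and self.master.get(payload.rule_id) == payload
                    and matches(payload, value)
                    and (best is None or payload.priority < best.priority)):
                best = payload
        return best.rule_id if best is not None else None


sfl = SFL([Rule(1, 10, 0, 100), Rule(2, 20, 0, 255)], flush_threshold=3)
sfl.insert(Rule(3, 5, 40, 60))
print(sfl.classify(50))  # 3, served from the update buffer
```

The point of the sketch is the cost model: updates are a list append plus a dictionary write, while the expensive rebuild of A happens only once per `flush_threshold` updates.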

Fig. 6.1: Sketch of the SFL classification algorithm components.

The main results of our evaluation are threefold: first, we demonstrate that existing fast classification algorithms fail to meet the requirements of highly dynamic environments, which results in severe throughput penalties. Second, we show that existing algorithms which support high update rates fall short in terms of throughput. Third, we show that fast SFL-“upgraded” algorithms perform significantly faster in dynamic environments than both existing fast and updateable classification algorithms. Specifically, some SFL-equipped algorithms can perform about an order of magnitude faster than the state-of-the-art dynamic algorithm Tuple Space Search [105,127] while processing up to 60 updates per second.

6.1 System Interface

Before we dive into the details of the proposed SFL classification approach, we first describe a set of common basic procedures which most packet classification systems, such as firewalls or SDN switches, need to provide in order to be of practical use.

In document System-Specialized and Hybrid Approaches to Network Packet Classification (Page 85-94)