
4.3 Bit Vector Algorithms

4.3.1 Lucent Bit Vector Search

The original Bit Vector Search proposed by Lakshman and Stiliadis [76], which is also referred to as the Lucent Bit Vector Scheme [31] or simply the Lucent Scheme [127], is a decompositional classification algorithm, in the sense that it decomposes the original d-dimensional classification problem into d one-dimensional search problems that can quickly be solved independently. Once the d partial solutions for the one-dimensional problems are computed, they can be combined to obtain the desired overall solution, which is the index of the most highly prioritized matching rule. In the remainder of this section, we refer to the Bit Vector Search algorithm by the abbreviation BV.

In its preprocessing phase, BV constructs d one-dimensional search data structures from the geometric view of the specified rule set R. In order to visualize the bit vector preprocessing, we use the two-dimensional rule set shown in Table 4.3 as a running example throughout this section. Each rule $R_i$ in R is regarded as a d-dimensional hyperrectangle $B(R_i) = [X_1^i, Y_1^i] \times \dots \times [X_d^i, Y_d^i]$ in the bounding box $B(H)$ of the header space H, i.e., $B(R_i) \subseteq B(H)$. In the first step of the preprocessing phase, the endpoints $X_j^i$ and $Y_j^i$ of each rule $R_i$ are projected onto the jth axis of the bounding box $B(H)$. Thereby, for all $1 \le j \le d$, the jth axis is partitioned into $\alpha_j$ disjoint intervals $I_k^j$, with $1 \le \alpha_j \le 2n + 1$. In the next step, a bit vector $V_k^j$ of length n (one bit per rule) is assigned to each interval $I_k^j$. Each bit $b_i$ in the bit vector $V_k^j$ indicates whether an incoming packet p whose jth header value $h_j^p$ falls into the interval $I_k^j$ matches rule $R_i$ in the jth dimension. Accordingly, the ith bit of $V_k^j$ is set iff the interval $I_k^j$ intersects with the jth check interval $[X_j^i, Y_j^i]$ of rule $R_i$, i.e.,

\[
\forall i \in \{1, \dots, n\}: \quad
V_k^j[i] =
\begin{cases}
1, & \text{if } I_k^j \cap [X_j^i, Y_j^i] \neq \emptyset \\
0, & \text{otherwise.}
\end{cases}
\tag{4.4}
\]

This procedure is sketched in Figure 4.3 for the rule set from Table 4.3.
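The preprocessing of a single dimension can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation evaluated in this work: it assumes integer header values and closed check intervals, and the names `RULES`, `DOMAIN`, and `build_dimension` are hypothetical. Under these assumptions, every axis has at most 2n + 1 interval start points (the lower domain bound, n lower endpoints, and n successors of upper endpoints), which agrees with the bound on $\alpha_j$ stated above.

```python
from typing import List, Tuple

Interval = Tuple[int, int]   # closed integer interval [lo, hi] (assumption)
Rule = List[Interval]        # one check interval per dimension

# Rule set from Table 4.3, highest priority first.
RULES: List[Rule] = [
    [(2, 5), (2, 9)],      # R1
    [(12, 13), (4, 5)],    # R2
    [(4, 9), (8, 13)],     # R3
    [(8, 15), (10, 11)],   # R4
]
DOMAIN: Interval = (0, 15)   # every header field ranges over [0, 15]


def build_dimension(rules: List[Rule], j: int, domain: Interval):
    """Partition axis j into disjoint intervals and attach one bit vector each.

    Returns (intervals, vectors), where vectors[k][i] == 1 iff interval k
    intersects the jth check interval of rule i, as in Equation (4.4).
    """
    lo, hi = domain
    # Project the endpoints: an interval starts at the domain's lower bound,
    # at every lower endpoint X, and directly after every upper endpoint Y.
    starts = {lo}
    for rule in rules:
        x, y = rule[j]
        starts.add(x)
        if y < hi:
            starts.add(y + 1)
    ordered = sorted(starts)
    ends = [s - 1 for s in ordered[1:]] + [hi]
    intervals = list(zip(ordered, ends))

    # Two closed intervals [a, b] and [x, y] intersect iff a <= y and x <= b.
    vectors = [
        [1 if a <= rule[j][1] and rule[j][0] <= b else 0 for rule in rules]
        for a, b in intervals
    ]
    return intervals, vectors


intervals_f1, vectors_f1 = build_dimension(RULES, 0, DOMAIN)
print(intervals_f1[2], vectors_f1[2])   # (4, 5) [1, 0, 1, 0]
```

For the first field of the running example, the third interval is [4, 5], and its bit vector marks the rules R1 and R3, whose first check intervals both cover this range.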

Tab. 4.3: Two-dimensional example rule set R.

Nr. / Priority    Field F1     Field F2     Action
R1                [2, 5]       [2, 9]       a1
R2                [12, 13]     [4, 5]       a2
R3                [4, 9]       [8, 13]      a3
R4                [8, 15]      [10, 11]     a4

Fig. 4.3: Sketch of the Bit Vector Search data structure generated for the geometric representation of the two-dimensional rule set R over the header space $H = H_1 \times H_2 = [0, 15]^2$ from Table 4.3. The packet p with the header $h^p = (4, 11)$ is used to illustrate the bit vector retrieval.

After the intervals $I_k^j$ and the corresponding bit vectors $V_k^j$ have been computed in the algorithm's preprocessing phase, the classification of an incoming packet p is executed in two consecutive steps. First, each of the packet p's header fields $h_j^p$ is used to retrieve the bit vector $V^j$ that belongs to the interval $I^j$ containing $h_j^p$, as sketched in Figure 4.3 for the packet header $h^p = (4, 11)$. Note that there always exists exactly one such interval, since for every header field dimension, the entire header field domain $H_j$ is partitioned into mutually disjoint intervals. Subsequently, the retrieved bit vectors, which each represent the matching information for a single dimension, are bitwise ANDed in order to obtain a final result vector $V_{\mathrm{res}}$ for p, i.e.,

\[
\forall i \in \{1, \dots, n\}: \quad
V_{\mathrm{res}}[i] = \bigwedge_{j=1}^{d} V^j[i].
\tag{4.5}
\]

Accordingly, each bit $V_{\mathrm{res}}[i]$ is set iff p matches the rule $R_i$ in every regarded dimension. Therefore, finding the most highly prioritized matching rule $R_{i^*}$ translates to finding the index $i^*$ of the first set bit $V_{\mathrm{res}}[i^*]$ in $V_{\mathrm{res}}$, which can be achieved through a linear scan over $V_{\mathrm{res}}$. In the example shown in Figure 4.3, the vectors $V^1 = V_3^1$ and $V^2 = V_5^2$ are used to compute the result vector

\[
V_{\mathrm{res}} = V^1 \wedge V^2 =
\begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}
\wedge
\begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix}
=
\begin{array}{c} R_1 \\ R_2 \\ R_3 \\ R_4 \end{array}
\begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}.
\tag{4.6}
\]

$V_{\mathrm{res}}$ has only one set bit at the third position corresponding to rule $R_3$, which is indeed the most highly prioritized matching rule, as the figure confirms.
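The two classification steps can be illustrated with the following Python sketch. It is not the implementation evaluated in this work: the tables `STARTS`, `VECTORS_F1`, and `VECTORS_F2` are hypothetical and were derived by hand from Table 4.3 under the assumption of integer header values and closed check intervals, and the interval lookup uses the standard library's `bisect_right` as one possible binary search.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence

# Assumed preprocessing result for Table 4.3: both axes happen to be
# partitioned into the intervals [0,1], [2,3], ..., [14,15].
STARTS = [0, 2, 4, 6, 8, 10, 12, 14]

# One bit vector per interval; position i refers to rule R(i+1).
VECTORS_F1 = [
    [0, 0, 0, 0], [1, 0, 0, 0], [1, 0, 1, 0], [0, 0, 1, 0],
    [0, 0, 1, 1], [0, 0, 0, 1], [0, 1, 0, 1], [0, 0, 0, 1],
]
VECTORS_F2 = [
    [0, 0, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0],
    [1, 0, 1, 0], [0, 0, 1, 1], [0, 0, 1, 0], [0, 0, 0, 0],
]
DIMENSIONS = [(STARTS, VECTORS_F1), (STARTS, VECTORS_F2)]


def classify(header: Sequence[int]) -> Optional[int]:
    """Return the 1-based number of the best matching rule, or None."""
    result: List[int] = [1] * len(VECTORS_F1[0])
    for (starts, vectors), value in zip(DIMENSIONS, header):
        # Step 1: binary search for the unique interval containing the value.
        k = bisect_right(starts, value) - 1
        # Step 2: bitwise AND of the partial vectors, as in Equation (4.5).
        result = [a & b for a, b in zip(result, vectors[k])]
    # Linear scan for the first set bit, i.e., the highest-priority match.
    for i, bit in enumerate(result):
        if bit:
            return i + 1
    return None


print(classify((4, 11)))   # 3
```

For the packet header (4, 11), the sketch retrieves [1, 0, 1, 0] and [0, 0, 1, 1], whose conjunction has its only set bit at the third position, so the function reports R3.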

The memory requirements and the preprocessing time required by BV are at most quadratic in the number of rules, since every dimension may yield up to 2n + 1 bit vectors of n bits each. The classification operation requires d bit vector retrievals, which can generally be implemented as binary searches due to the ordering of the intervals. Furthermore, the computation of the final result vector as well as finding the first set bit require time linear in the number of rules. Still, BV often performs significantly faster than both Linear Search and Tuple Space Search, because many of its operations can be efficiently vectorized even on general purpose CPU systems [10]. If we denote the machine word width by w (which is typically 32 or 64 bits in off-the-shelf systems), then every vector can be represented by $\lceil n/w \rceil$ machine words. Therefore, a practical implementation can perform d − 1 wordwise AND operations on words of the partial vectors in order to compute one word of the result vector $V_{\mathrm{res}}$. Subsequently, a single comparison instruction can decide whether at least one bit in the result word is set. Only if this is the case must the w bits in the word be checked; otherwise, the next result word can be computed, which effectively allows large parts of the result vector to be traversed wordwise rather than bitwise. In contrast to the previously discussed algorithms, BV does not support quick incremental updates, because the addition or removal of a rule requires an adjustment of every single bit vector and thus results in an effort quadratic in n. Hence, a change in the rule set typically requires a rebuild of the search data structure. Table 4.4 provides an overview of the key performance characteristics of BV.

Tab. 4.4: Key performance characteristics of BV.

Classification operation                      Data structure creation    Data structure update    Memory requirements
$O(d \cdot \log(n) + \lceil n/w \rceil)$      $O(d \cdot n^2)$           $O(d \cdot n^2)$         $O(d \cdot n^2)$

n: number of rules, d: number of fields, w: machine word width
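The wordwise evaluation of the result vector described above can be sketched as follows. This is an illustrative Python model, not a tuned implementation: Python integers stand in for machine words, the helper names `pack` and `first_match` are hypothetical, and the bit layout (bit i stored at position i mod w of word ⌊i/w⌋) is an assumption. Instead of checking the w bits of a nonzero word one by one, the sketch isolates the lowest set bit with the expression `word & -word`.

```python
from typing import List, Optional

W = 64  # assumed machine word width


def pack(bits: List[int], w: int = W) -> List[int]:
    """Pack n bits into ceil(n / w) words (bit i -> word i // w, position i % w)."""
    words = [0] * ((len(bits) + w - 1) // w)
    for i, bit in enumerate(bits):
        if bit:
            words[i // w] |= 1 << (i % w)
    return words


def first_match(vectors: List[List[int]], w: int = W) -> Optional[int]:
    """Return the 0-based index of the first bit set in all packed vectors."""
    for word_idx in range(len(vectors[0])):
        # d - 1 wordwise AND operations yield one word of the result vector.
        word = vectors[0][word_idx]
        for vector in vectors[1:]:
            word &= vector[word_idx]
        # A single comparison decides whether this word can be skipped.
        if word:
            lowest = (word & -word).bit_length() - 1
            return word_idx * w + lowest
    return None


# Running example: V^1 = [1, 0, 1, 0] and V^2 = [0, 0, 1, 1].
v1 = pack([1, 0, 1, 0])
v2 = pack([0, 0, 1, 1])
print(first_match([v1, v2]))   # 2, i.e., rule R3
```

With n = 4 rules, both vectors fit into a single word, so the benefit only becomes visible for larger rule sets, where every all-zero result word is skipped after d − 1 AND operations and one comparison.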

Due to its decompositional nature, BV is suitable for implementation on a wide variety of hardware platforms, because the bit vector retrieval operations can be solved independently and therefore in parallel [68,151]. Also, some platforms such as ASICs or FPGAs provide support for large machine word widths w ≫ 64 and can compute the first set index in the result vector in logarithmic time, which further reduces the classification latency [56,15], as explained in Section 15.2.

In document System-Specialized and Hybrid Approaches to Network Packet Classification (Page 50-53)