4.3 Bit Vector Algorithms
4.3.1 Lucent Bit Vector Search
The original Bit Vector Search proposed by Lakshman and Stiliadis [76], which is also referred to as the Lucent Bit Vector Scheme [31] or simply the Lucent Scheme [127], is a decompositional classification algorithm, in the sense that it decomposes the original $d$-dimensional classification problem into $d$ one-dimensional search problems that can quickly be solved independently. Once the $d$ partial solutions for the one-dimensional problems are computed, they can be combined to obtain the desired overall solution, which is the index of the most highly prioritized matching rule. In the remainder of this section, we refer to the Bit Vector Search algorithm by the abbreviation BV.
In its preprocessing phase, BV constructs $d$ one-dimensional search data structures from the geometric view of the specified rule set $R$. In order to visualize the bit vector preprocessing, we use the two-dimensional rule set shown in Table 4.3 as a running example throughout this section. Each rule $R_i$ in $R$ is regarded as a $d$-dimensional hyperrectangle $B(R_i) = [X_1^i, Y_1^i] \times \ldots \times [X_d^i, Y_d^i]$ in the bounding box $B(H)$ of the header space $H$, i.e., $B(R_i) \subseteq B(H)$. In the first step of the preprocessing phase, the endpoints $X_j^i$ and $Y_j^i$ of each rule $R_i$ are projected onto the $j$th axis of the bounding box $B(H)$. Thereby, for all $1 \le j \le d$, the $j$th axis is partitioned into $\alpha_j$ disjoint intervals $I_j^k$, with $1 \le \alpha_j \le 2n + 1$. In the next step, a bit vector $V_j^k$ of length $n$ is assigned to each interval $I_j^k$. Each bit $b_i$ in the bit vector $V_j^k$ indicates whether an incoming packet $p$, whose $j$th header value $h_j^p$ falls into the interval $I_j^k$, matches rule $R_i$ in the $j$th dimension. Accordingly, the $i$th bit of $V_j^k$ is set iff the interval $I_j^k$ intersects with the $j$th check interval $[X_j^i, Y_j^i]$ of rule $R_i$, i.e.,
\[
\forall i \in \{1, \ldots, n\}: \quad V_j^k[i] =
\begin{cases}
1, & \text{if } I_j^k \cap [X_j^i, Y_j^i] \neq \emptyset \\
0, & \text{otherwise.}
\end{cases}
\tag{4.4}
\]
This procedure is sketched in Figure 4.3 for the rule set from Table 4.3.
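The preprocessing step for a single dimension can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation from [76]: header values are assumed to be integers, every check interval is assumed to be closed, and the names `RULES` and `build_dimension` are hypothetical. Each lower endpoint $X_j^i$ and each successor $Y_j^i + 1$ of an upper endpoint starts a new interval, which yields at most $2n + 1$ intervals per dimension.

```python
# Rule set of the running example (Table 4.3): one closed interval per field,
# rules listed in order of decreasing priority (R1 first).
RULES = [
    [(2, 5), (2, 9)],      # R1
    [(12, 13), (4, 5)],    # R2
    [(4, 9), (8, 13)],     # R3
    [(8, 15), (10, 11)],   # R4
]


def build_dimension(rules, j, lo, hi):
    """Partition axis j of [lo, hi] and compute one bit vector per interval.

    Assumption: integer header values, so an interval can start at Y + 1.
    Returns (intervals, vectors), where intervals[k] = (start, end) and
    vectors[k][i] == 1 iff interval k intersects the check interval of rule i.
    """
    points = {lo}
    for rule in rules:
        x, y = rule[j]
        points.add(x)            # a rule begins here
        if y < hi:
            points.add(y + 1)    # the first value after the rule ends
    starts = sorted(points)
    ends = [s - 1 for s in starts[1:]] + [hi]

    vectors = []
    for start, end in zip(starts, ends):
        # Intersection test from Equation (4.4) for closed integer intervals.
        vectors.append([1 if (x <= end and start <= y) else 0
                        for x, y in (rule[j] for rule in rules)])
    return list(zip(starts, ends)), vectors


intervals, vectors = build_dimension(RULES, 0, 0, 15)
print(intervals[2], vectors[2])  # (4, 5) [1, 0, 1, 0]
```

For the first field of the running example, the third interval is $[4, 5]$, and its bit vector marks $R_1$ and $R_3$ as matching candidates.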
After the intervals $I_j^k$ and the corresponding bit vectors $V_j^k$ have been computed in the algorithm's preprocessing phase, the classification of an incoming packet
Nr. / Priority    Field F1    Field F2    Action
R1                [ 2,  5]    [ 2,  9]    a1
R2                [12, 13]    [ 4,  5]    a2
R3                [ 4,  9]    [ 8, 13]    a3
R4                [ 8, 15]    [10, 11]    a4
Fig. 4.3: Sketch of the Bit Vector Search data structure generated for the geometric representation of the two-dimensional rule set $R$ over the header space $H = H_1 \times H_2 = [0, 15]^2$ from Table 4.3. The packet $p$ with the header $h^p = (4, 11)$ is used to illustrate the bit vector retrieval.
$p$ is executed in two consecutive steps. First, each of the packet $p$'s header fields $h_j^p$ is used to retrieve the bit vector $V_j$ that belongs to the interval $I_j$ containing $h_j^p$, as sketched in Figure 4.3 for the packet header $h^p = (4, 11)$. Note that there always exists exactly one such interval, since for every header field dimension, the entire header field domain $H_j$ is partitioned into mutually disjoint intervals. Subsequently, the retrieved bit vectors, which each represent the matching information for a single dimension, are bitwise ANDed in order to obtain a final result vector $V_{res}$ for $p$, i.e.,
\[
\forall i \in \{1, \ldots, n\}: \quad V_{res}[i] = \bigwedge_{j=1}^{d} V_j[i].
\tag{4.5}
\]
Accordingly, each bit $V_{res}[i]$ is set iff $p$ matches the rule $R_i$ in every regarded dimension. Therefore, finding the most highly prioritized matching rule $R_{i^*}$ translates to finding the index $i^*$ of the first set bit $V_{res}[i^*]$ in $V_{res}$, which can be achieved through a linear scan over $V_{res}$. In the example shown in Figure 4.3, the vectors $V_1 = V_1^3$ and $V_2 = V_2^5$ are used to compute the result vector
\[
V_{res} = V_1 \wedge V_2 =
\begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}
\wedge
\begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}
=
\begin{array}{c} R_1 \\ R_2 \\ R_3 \\ R_4 \end{array}
\begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}.
\tag{4.6}
\]
$V_{res}$ has only one set bit at the third position corresponding to rule $R_3$, which is indeed the most highly prioritized matching rule, as the figure confirms.
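The two classification steps can be illustrated with the following Python sketch. It is a simplified model under stated assumptions rather than a reference implementation: header values are assumed to be integers within the bounding box, the interval lookup uses a binary search over the sorted interval start points (one possible choice), and the names `preprocess` and `classify` are hypothetical.

```python
from bisect import bisect_right

# Rule set of the running example (Table 4.3), sorted by priority.
RULES = [
    [(2, 5), (2, 9)],      # R1
    [(12, 13), (4, 5)],    # R2
    [(4, 9), (8, 13)],     # R3
    [(8, 15), (10, 11)],   # R4
]
LO, HI = 0, 15             # bounds of every header field in the example


def preprocess(rules, lo, hi):
    """Per dimension: sorted interval start points and one bit vector each."""
    dims = []
    for j in range(len(rules[0])):
        starts = sorted({lo}
                        | {r[j][0] for r in rules}
                        | {r[j][1] + 1 for r in rules if r[j][1] < hi})
        # No rule boundary lies strictly inside an interval, so testing the
        # start point decides whether the interval intersects the rule.
        vectors = [[int(r[j][0] <= s <= r[j][1]) for r in rules]
                   for s in starts]
        dims.append((starts, vectors))
    return dims


def classify(dims, header):
    """Return the 0-based index of the first matching rule, or None."""
    result = None
    for (starts, vectors), value in zip(dims, header):
        k = bisect_right(starts, value) - 1      # interval containing value
        vec = vectors[k]
        # Bitwise AND of the retrieved vectors, as in Equation (4.5).
        result = vec if result is None else [a & b for a, b in zip(result, vec)]
    for i, bit in enumerate(result):             # linear scan for first set bit
        if bit:
            return i
    return None


dims = preprocess(RULES, LO, HI)
print(classify(dims, (4, 11)))  # 2, i.e., rule R3
```

For the packet header $h^p = (4, 11)$, the sketch returns the index of $R_3$, in agreement with the result vector above. A header such as $(0, 0)$, which no rule covers, yields `None`.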
The memory requirements and the preprocessing time required by BV are at most quadratic in the number of rules, since every dimension may yield $2n + 1$ bit vectors of $n$ bits each. The classification operation requires $d$ bit vector retrievals, which can generally be implemented as binary searches due to the ordering of the intervals. Furthermore, both the computation of the final result vector and the search for its first set bit require time linear in the number of rules. Still, BV often performs significantly faster than both Linear Search and Tuple Space Search, because many of its operations can be efficiently vectorized even on general purpose CPU systems [10]. If we denote the machine word width by $w$ (which is typically 32 or 64 bits in off-the-shelf systems), then every vector can be represented by $\lceil n/w \rceil$ machine words. Therefore, a practical implementation can perform $d - 1$ wordwise AND operations on words of the partial vectors in order to compute one word of the result vector $V_{res}$. Subsequently, a single comparison instruction can decide whether at least one bit in the result word is set. Only if this is the case must the $w$ bits in the word be checked; otherwise, the next result word can be computed. This effectively allows large parts of the result vector to be traversed wordwise rather than bitwise. In contrast to the previously discussed algorithms, BV does not support quick incremental updates, because the addition or removal of a rule requires an adjustment of every single bit vector and thus results in an effort quadratic in $n$. Hence, a change in the rule set typically requires a rebuild of the search data structure. Table 4.4 provides an overview of the key performance characteristics of BV.
Classification operation                   Data structure creation   Data structure update   Memory requirements
$O(d \cdot \log(n) + \lceil n/w \rceil)$   $O(d \cdot n^2)$          $O(d \cdot n^2)$        $O(d \cdot n^2)$

$n$: number of rules, $d$: number of fields, $w$: machine word width
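The wordwise evaluation described above can be sketched as follows. This is an illustrative model only: Python integers stand in for $w$-bit machine words, the word width `W = 64` is an assumption, and `pack` and `first_match` are hypothetical helper names. Instead of checking the $w$ bits of a non-zero word one by one, the sketch isolates the lowest set bit arithmetically, which plays the role of a find-first-set instruction.

```python
W = 64  # assumed machine word width


def pack(bits, w=W):
    """Pack a 0/1 list into ceil(n / w) words; rule i maps to word i // w, bit i % w."""
    words = [0] * ((len(bits) + w - 1) // w)
    for i, bit in enumerate(bits):
        if bit:
            words[i // w] |= 1 << (i % w)
    return words


def first_match(vectors, w=W):
    """AND the d packed vectors word by word; return the first set index or None."""
    for word_idx in range(len(vectors[0])):
        word = vectors[0][word_idx]
        for vec in vectors[1:]:            # d - 1 wordwise AND operations
            word &= vec[word_idx]
        if word:                           # one comparison per result word
            lowest = (word & -word).bit_length() - 1
            return word_idx * w + lowest
        # Zero word: none of these w rules matches, skip to the next word.
    return None


# Hypothetical partial vectors for n = 200 rules in d = 2 dimensions.
n = 200
v1 = [1 if i in (3, 130, 199) else 0 for i in range(n)]
v2 = [1 if i in (5, 130, 150) else 0 for i in range(n)]
print(first_match([pack(v1), pack(v2)]))  # 130
```

In this made-up example, the first two result words are zero and are skipped with a single comparison each, so only the third word has to be inspected more closely.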
Due to its decompositional nature, BV is suitable for implementation on a wide variety of hardware platforms, because the bit vector retrieval operations can be solved independently and therefore in parallel [68,151]. Also, some platforms such as ASICs or FPGAs provide support for large machine word widths w ≫ 64 and can compute the first set index in the result vector in logarithmic time, which further reduces the classification latency [56,15], as explained in Section 15.2.
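To give an intuition for the logarithmic first-set-bit computation, the following Python sketch models the decision structure of a tree-shaped priority encoder. It is a software analogy under stated assumptions and not a description of the designs in [56,15]: the function name `first_set_index` is hypothetical, and each call to `any` stands for an OR-reduction that hardware would evaluate in parallel, whereas in software it still costs linear time.

```python
def first_set_index(bits):
    """Locate the first set bit by repeated halving.

    Returns (index, levels), where levels counts the halving decisions,
    or (None, 0) if no bit is set.
    """
    if not any(bits):
        return None, 0
    lo, hi, levels = 0, len(bits), 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if any(bits[lo:mid]):   # models a parallel OR over the lower half
            hi = mid            # first set bit lies in the lower half
        else:
            lo = mid            # otherwise it must lie in the upper half
        levels += 1
    return lo, levels


bits = [0] * 16
bits[10] = bits[13] = 1
print(first_set_index(bits))  # (10, 4)
```

For a vector of 16 bits, four halving decisions suffice, which corresponds to the logarithmic depth mentioned above.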