Aggregated Bit Vector Search - Bit Vector Algorithms

4.3 Bit Vector Algorithms

4.3.2 Aggregated Bit Vector Search

The original BV algorithm described in Section 4.3.1 may require $d \cdot \lceil n/w \rceil$ memory accesses to perform a single classification operation, because each word $w_i^{res}$ of the final result vector is computed by ANDing the words $w_i^j$, $j \in \{1, \dots, d\}$, at the corresponding positions of the one-dimensional vectors. However, the bit vectors generated from rule sets are often sparse, in the sense that large parts of the vectors do not contain any set bits [31]. This can lead to situations in which at least one of the $d$ words $w_i^j$ used to compute $w_i^{res}$ consists entirely of unset bits. In that case, every bit of $w_i^{res}$ is unset as well, so inspecting $w_i^{res}$ cannot terminate the classification process.
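The word-by-word lookup described above can be sketched in Python as follows. This is a minimal illustration, not the implementation from Section 4.3.1 or [31]: it assumes each one-dimensional bit vector is given as a list of 0/1 values in which bit $k$ is set iff rule $k$ matches in that dimension, so that the first set bit of the result identifies the first matching rule. The function name `bv_first_match` is hypothetical, and every fetched word is counted as one memory access.

```python
from __future__ import annotations

from math import ceil


def bv_first_match(vectors: list[list[int]], w: int) -> tuple[int | None, int]:
    """Sketch of the BV lookup: AND the vectors word by word, stop at the first non-zero word.

    vectors[j] is the bit vector of dimension j as a list of 0/1 values.
    Returns (0-based index of the first matching rule or None, number of word accesses).
    """
    n = len(vectors[0])
    accesses = 0
    for i in range(ceil(n / w)):                # word positions, left to right
        lo, hi = i * w, min((i + 1) * w, n)
        res = [1] * (hi - lo)                   # neutral element for AND
        for v in vectors:                       # one word fetch per dimension
            accesses += 1
            res = [r & b for r, b in zip(res, v[lo:hi])]
        if any(res):                            # first result word with a set bit
            return lo + res.index(1), accesses
    return None, accesses
```

For the two vectors of Figure 4.4 with `w = 2`, this sketch returns `(12, 14)`: the 0-based index of the 13th rule and 14 word accesses.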

To illustrate this situation, Figure 4.4 shows two sparse example bit vectors $V^1$ and $V^2$ for a two-dimensional rule set with 16 rules. Assuming a machine word width of $w = 2$, a total of 14 memory accesses is required to locate the first set bit in the result vector $V^{res}$, which is stored in the word $w_7^{res}$. Of these 14 memory accesses, the first 12 are used to compute result words that are entirely unset, and only the last two lead to a non-zero result word.
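The count of 14 accesses can be checked against the bound $d \cdot \lceil n/w \rceil$ given at the beginning of this section. Here $k = 7$ denotes the 1-based position of the first non-zero result word $w_k^{res}$, a symbol introduced only for this calculation:

$$
\underbrace{d \cdot k}_{\text{accesses until } w_k^{res}} = 2 \cdot 7 = 14 \;\le\; d \cdot \left\lceil \frac{n}{w} \right\rceil = 2 \cdot \left\lceil \frac{16}{2} \right\rceil = 16
$$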

$$
\begin{array}{l|cccccccc}
i & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \hline
V^1 \;(\text{words } w_i^1) & 00 & 00 & 10 & 00 & 00 & 00 & 10 & 00 \\
V^2 \;(\text{words } w_i^2) & 00 & 00 & 01 & 00 & 00 & 00 & 11 & 01 \\ \hline
V^{res} = V^1 \wedge V^2 \;(\text{words } w_i^{res}) & 00 & 00 & 00 & 00 & 00 & 00 & 10 & 00
\end{array}
$$

Fig. 4.4: Sparse bit vectors.

As noticed by Baboescu and Varghese [31], such situations can occur frequently as the number of rules and dimensions grows. They argue that a single sparse dimension is sufficient to cause situations like the one described above, and therefore propose the Aggregated Bit Vector Search (ABV), an enhanced variant of the original BV scheme [31]. The main idea of ABV is to use bit vector aggregation to avoid a large number of memory accesses to empty words. Aggregated bit vectors make it possible to traverse sparse parts of the original search data structure more quickly, at the cost of a slightly increased memory requirement for the search data structure. To this end, for every bit vector $V$ in the original BV algorithm, the ABV approach computes an additional vector $V_{agg}$ of length $\lceil n/a \rceil$ in the preprocessing phase, where $a$ is a predefined aggregation size. Each bit $V_{agg}[i]$ of an aggregated vector is set iff at least one of the (at most) $a$ corresponding bits $V[a(i-1)+1]$ to $V[\min\{a i, n\}]$ is set, i.e.,

$$
\forall i \in \left\{1, \dots, \left\lceil \tfrac{n}{a} \right\rceil\right\}: \quad V_{agg}[i] = \bigvee_{j = a(i-1)+1}^{\min\{a i,\, n\}} V[j] \tag{4.7}
$$
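A direct transcription of Eq. (4.7) into Python is sketched below, under the same assumption that a bit vector is a list of 0/1 values. The function name `aggregate` is hypothetical. Because Python indexes from 0, aggregated bit `i` covers the slice `v[a*i : a*(i+1)]`, which is clipped at the end of the vector when $n$ is not a multiple of $a$.

```python
from math import ceil


def aggregate(v: list[int], a: int) -> list[int]:
    """Aggregated vector of v: bit i is the OR of the (at most) a bits in v[a*i : a*(i+1)]."""
    n = len(v)
    return [int(any(v[a * i:min(a * (i + 1), n)])) for i in range(ceil(n / a))]
```

The aggregated vector is computed once per original vector during preprocessing, so its cost does not affect the classification phase.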

Figure 4.5 shows the aggregated vectors $V^1_{agg}$ and $V^2_{agg}$ for the original vectors $V^1$ and $V^2$ with an aggregation size of $a = 2$. Since the aggregation size equals the word width $w = 2$, every word $w_i^j$ of an original vector $V^j$ is aggregated into exactly one bit $b_i^j$ of the corresponding aggregated vector $V^j_{agg}$, which is an important property for the classification phase.
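When the aggregation size equals the word width, as in Figure 4.5, computing an aggregated bit amounts to testing one machine word for zero. The sketch below illustrates this with bit vectors packed into Python integers. The packing order (first bit as the most significant bit of a word) and the helper names `pack_words` and `aggregate_words` are assumptions made for illustration only.

```python
def pack_words(bits: list[int], w: int) -> list[int]:
    """Pack a 0/1 list into integers of w bits each; the first bit becomes the MSB of its word."""
    words = []
    for lo in range(0, len(bits), w):
        chunk = bits[lo:lo + w]
        value = 0
        for b in chunk:
            value = (value << 1) | b
        value <<= w - len(chunk)          # pad a short final word with unset bits
        words.append(value)
    return words


def aggregate_words(words: list[int]) -> list[int]:
    """With a == w, aggregated bit b_i is set iff the packed word w_i is non-zero."""
    return [1 if word != 0 else 0 for word in words]
```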

$$
\begin{array}{l|cccccccc}
i & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \hline
V^1 \;(\text{words } w_i^1) & 00 & 00 & 10 & 00 & 00 & 00 & 10 & 00 \\
V^1_{agg} \;(\text{bits } b_i^1) & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 \\ \hline
V^2 \;(\text{words } w_i^2) & 00 & 00 & 01 & 00 & 00 & 00 & 11 & 01 \\
V^2_{agg} \;(\text{bits } b_i^2) & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 1
\end{array}
$$

Fig. 4.5: Computing aggregated bit vectors.

The classification phase of ABV is similar to that of the Lucent scheme, the main difference being that for each dimension $j$, an aggregated vector $V^j_{agg}$ is retrieved in addition to the original vector $V^j$. This time, however, it is the words $u_i^j$ of the aggregated vectors $V^j_{agg}$ that are bitwise ANDed instead of those of the original vectors, as shown in Figure 4.6. As before, the goal is to find the first word of the aggregated result vector $V^{res}_{agg}$ that contains a set bit. The existence of such a bit $b_k$ at position $k$ implies that in each original vector $V^j$, the $k$-th word $w_k^j$, i.e., the word aggregated into $b_k^j$, contains at least one set bit. Therefore, as in the Lucent scheme, the word $w_k^{res}$ is computed by bitwise ANDing the words $w_k^j$ and is subsequently inspected for a set bit. If such a bit exists, the classification terminates; otherwise, the search in the aggregated vector continues.

$$
\begin{array}{l|cccc}
i & 1 & 2 & 3 & 4 \\ \hline
V^1_{agg} \;(\text{words } u_i^1) & 00 & 10 & 00 & 10 \\
V^2_{agg} \;(\text{words } u_i^2) & 00 & 10 & 00 & 11 \\ \hline
V^{res}_{agg} = V^1_{agg} \wedge V^2_{agg} \;(\text{words } u_i^{res}) & 00 & 10 & 00 & 10
\end{array}
$$

Fig. 4.6: ANDing aggregated bit vectors.

In the example in Figure 4.6, it takes a total of 12 memory accesses until the search terminates, in contrast to the 14 accesses of the Lucent scheme, since all eight words $u_i^j$ as well as $w_3^1$, $w_3^2$, $w_7^1$ and $w_7^2$ are accessed. Note that a set bit in the aggregated result vector can lead to a false positive lookup of the corresponding words in the original vectors. For example, the words $w_3^1$ and $w_3^2$ in Figure 4.5 result in the aggregation bits $b_3^1 = b_3^2 = 1$. This, in turn, leads to the inspection of $w_3^1$ and $w_3^2$, which do not have any common set bits.
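The complete ABV lookup, including the false positive on $w_3^1$ and $w_3^2$, can be sketched as follows. This is a simplified model under stated assumptions rather than the implementation from [31]: bit vectors are lists of 0/1 values, the aggregation size $a$ equals the word width $w$ (so aggregated bit $k$ maps to exactly one original word), and every fetched word, aggregated or original, counts as one memory access. The name `abv_first_match` and its helpers are hypothetical.

```python
from __future__ import annotations

from math import ceil


def words_of(bits: list[int], w: int) -> list[list[int]]:
    """Split a 0/1 list into words of w bits (the last word may be shorter)."""
    return [bits[lo:lo + w] for lo in range(0, len(bits), w)]


def and_words(words: list[list[int]]) -> list[int]:
    """Bitwise AND of equally long words given as 0/1 lists."""
    return [int(all(bits)) for bits in zip(*words)]


def abv_first_match(vectors: list[list[int]], w: int) -> tuple[int | None, int]:
    """Sketch of the ABV lookup, assuming the aggregation size a equals w.

    Returns (0-based index of the first matching rule or None, word accesses).
    """
    a = w
    # Preprocessing (not counted): one aggregated vector per dimension, Eq. (4.7).
    aggs = [[int(any(v[a * i:a * (i + 1)])) for i in range(ceil(len(v) / a))]
            for v in vectors]
    orig_words = [words_of(v, w) for v in vectors]
    agg_words = [words_of(g, w) for g in aggs]

    accesses = 0
    for i in range(len(agg_words[0])):
        accesses += len(vectors)                  # fetch u_i^j for every dimension j
        u_res = and_words([aw[i] for aw in agg_words])
        for offset, bit in enumerate(u_res):
            if not bit:
                continue
            k = i * w + offset                    # aggregated bit b_k -> original word w_k
            accesses += len(vectors)              # fetch w_k^j for every dimension j
            w_res = and_words([ow[k] for ow in orig_words])
            if any(w_res):
                return k * w + w_res.index(1), accesses
            # No common set bit: false positive, keep scanning the aggregated vector.
    return None, accesses
```

Applied to the vectors of Figure 4.4 with `w = 2`, this sketch returns `(12, 12)`: the same 13th rule as the BV lookup, found after the 12 accesses counted above.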

The worst-case performance of the ABV classification algorithm is no better than that of BV. Nevertheless, in practice, ABV can outperform BV significantly because it can quickly traverse gaps of unset bits. Furthermore, the authors of [31] suggest an additional enhancement to ABV that sorts the rules so as to group them in a way that reduces the likelihood of false positive lookups. However, this requires the algorithm to compute all matching rules rather than only the first one, since reordering the rules violates their prioritization and therefore, in general, the semantics of the rule set.
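The worst-case statement can be made concrete with a rough count. Assume $a = w$ and an adversarial case in which every bit of $V^{res}_{agg}$ is set but no original result word contains a set bit, e.g. $V^1 = 1010\ldots10$ and $V^2 = 0101\ldots01$ (an illustrative scenario, not one taken from [31]). ABV then performs every access BV would perform, plus the accesses to the aggregated words:

$$
\underbrace{d \left\lceil \frac{\lceil n/a \rceil}{w} \right\rceil}_{\text{aggregated words}} + \underbrace{d \left\lceil \frac{n}{w} \right\rceil}_{\text{original words}} \;\overset{d = 2,\; n = 16,\; a = w = 2}{=}\; 2 \cdot 4 + 2 \cdot 8 = 24 \;>\; 16 = d \left\lceil \frac{n}{w} \right\rceil
$$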

In document System-Specialized and Hybrid Approaches to Network Packet Classification (Page 53-55)