
8.2 NUMA Design

8.3.2 Bloom Filter Implementation

The Parallel Bloom Filters algorithm can handle thousands of patterns, but it requires a separate Bloom Filter for each possible pattern length. A Bloom Filter is a data structure that stores a set of signatures compactly by computing multiple hash functions on each member of the set. Each Bloom Filter computes k hash functions on each pattern in its set, producing k hash values in the range 1 to m, and sets the corresponding k bits in an m-bit vector. It repeats this process for every pattern in its set. At run time, each Bloom Filter scans substrings of its corresponding length from the streaming data to detect suspicious signatures. If all k hash values of a substring point to bits that are already set, the Bloom Filter declares the substring suspicious, and an analyzer determines whether the string is indeed a member of the set or a false positive. Multiple engines can be instantiated to monitor the data, so the byte stream can be advanced by more than one byte at a time.
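The programming and scanning steps described above can be sketched in a few lines of Python. This is only an illustration, not the hardware design: the class name `BloomFilter`, the helper `scan`, and the use of salted SHA-256 digests as the k hash functions are assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: an m-bit vector and k hash functions."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, for readability

    def _positions(self, item):
        # Assumption: k hash functions are emulated by salting SHA-256
        # with the hash index; real designs use cheaper hardware hashes.
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True means "suspicious": a member or a false positive.
        return all(self.bits[pos] for pos in self._positions(item))


def scan(stream, filters, exact):
    """Slide one window per pattern length over the stream.

    filters maps a pattern length to the Bloom filter for that length;
    exact is the pattern set the analyzer uses to rule out false positives.
    """
    hits = []
    for start in range(len(stream)):
        for length, bf in filters.items():
            window = stream[start:start + length]
            if len(window) == length and bf.might_contain(window):
                if window in exact:  # analyzer step
                    hits.append((start, window))
    return hits
```

In this sketch the windows are examined sequentially; in the scheme described above the filters for different lengths work in parallel.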

NUMA searches for the deepest path in the tree for a given URL path, so a longest prefix matching algorithm is needed. Bloom Filters are typically used for efficient exact-match searches. In [98], the authors present a Bloom Filter based algorithm for longest prefix matching. The algorithm was designed for IP lookup, but since NUMA is based on the same logic, we can use it as an efficient implementation for NUMA. The approach presented in the paper begins by sorting the forwarding table entries by prefix length, associating a Bloom Filter with each unique prefix length, and "programming" each Bloom Filter with the prefixes of its associated length. A search begins by performing parallel membership queries to the Bloom Filters, using the appropriate segments of the input IP address. The result of this step is a vector of matching prefix lengths, some of which may be false matches. Hash tables corresponding to each prefix length are then probed in order from the longest match in the vector to the shortest, terminating when a match is found or when all of the lengths represented in the vector have been searched. NUMA can adopt the algorithm by replacing prefixes with paths and IP address segments with sub-paths.
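A minimal sketch of this adaptation to URL paths is given below, assuming one Bloom Filter and one hash table per path depth (the number of path components). The names `PathLPM`, `insert`, `lookup`, and `probes`, the default sizes, and the salted SHA-256 hashing are hypothetical choices for illustration; they are not taken from [98].

```python
import hashlib


def _positions(key, m, k):
    # Assumption: k hash functions emulated with salted SHA-256.
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
        yield int.from_bytes(digest[:8], "big") % m


class PathLPM:
    """Longest-prefix matching on paths: one filter and table per depth."""

    def __init__(self, max_depth, m=1 << 16, k=4):
        self.m, self.k = m, k
        self.filters = [bytearray(m) for _ in range(max_depth + 1)]
        self.tables = [dict() for _ in range(max_depth + 1)]
        self.probes = 0  # number of hash-table probes performed so far

    @staticmethod
    def _components(path):
        return [c for c in path.split("/") if c]

    def insert(self, path, value):
        # The path must have at most max_depth components.
        parts = self._components(path)
        key = "/" + "/".join(parts)
        for pos in _positions(key, self.m, self.k):
            self.filters[len(parts)][pos] = 1
        self.tables[len(parts)][key] = value

    def lookup(self, path):
        parts = self._components(path)[: len(self.filters) - 1]
        # Step 1: membership query for every sub-path depth
        # (sequential here, parallel in the scheme described above).
        candidates = []
        for depth in range(1, len(parts) + 1):
            key = "/" + "/".join(parts[:depth])
            positions = _positions(key, self.m, self.k)
            if all(self.filters[depth][p] for p in positions):
                candidates.append((depth, key))
        # Step 2: probe hash tables from the longest candidate to the shortest.
        for depth, key in reversed(candidates):
            self.probes += 1
            if key in self.tables[depth]:
                return key, self.tables[depth][key]
        return None
```

For example, after inserting `/a` and `/a/b/c`, a lookup of `/a/b/c/d.html` returns the entry stored for `/a/b/c`, while a lookup of `/a/b/x` falls back to the entry for `/a`.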

The performance of the algorithm is determined by the number of dependent memory accesses per lookup, whose expected value is constant and does not depend on the path lengths. The expected number of hash probes per lookup for any incoming URL path is

$E_{exp} = B \times f + 1$ (8.1)

8.3. OPTIMAL IMPLEMENTATION 80 CHAPTER 8. NUMA

where B is the number of Bloom Filters and f is the false positive probability. Let M be the total amount of embedded memory available for Bloom Filters and N be the target number of paths supported by the system. The false positive probability is then

$f = \left(\frac{1}{2}\right)^{(M/N)\ln 2}$ (8.2)

The expected number of hash probes per lookup depends only on the total amount of memory resources, M, and the total number of supported paths, N. Note that an input may create false positive matches in all the filters in the system. In this case, the number of required hash probes is

$E_{worst} = B + 1$ (8.3)

The authors of the paper observe that the distribution of prefixes is not uniform over the set of prefix lengths. In addition, routing protocols distribute periodic updates, so forwarding tables are not static. Both observations also hold for HTTP traffic: the distribution of URL paths is not uniform, as shown in Figure 8.2(c), and the data set may be updated on the arrival of an HTTP request. Therefore, following the authors' conclusion, NUMA will also benefit from using asymmetric Bloom Filters. In asymmetric Bloom Filters, the embedded memory is allocated to each filter in proportion to its current share of the total paths, while the number of hash functions is adjusted to keep the false positive probability minimal.
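The proportional allocation can be illustrated with a short sketch. It assumes the standard Bloom Filter results that the false positive probability for m bits, n items, and k hash functions is approximately (1 - e^(-kn/m))^k and that it is minimized at k = (m/n) ln 2. The function name `allocate_asymmetric` and the returned dictionary layout are hypothetical.

```python
import math


def allocate_asymmetric(total_bits, paths_per_filter):
    """Split total_bits among filters in proportion to their path counts.

    Returns one dict per filter with its bit budget, its number of hash
    functions, and the resulting approximate false positive probability.
    """
    total_paths = sum(paths_per_filter)
    plan = []
    for n_i in paths_per_filter:
        if n_i == 0:
            # An empty filter needs no memory and never matches.
            plan.append({"bits": 0, "hashes": 0, "fp": 0.0})
            continue
        m_i = total_bits * n_i // total_paths       # proportional share
        k_i = max(1, round((m_i / n_i) * math.log(2)))  # near-optimal k
        fp = (1.0 - math.exp(-k_i * n_i / m_i)) ** k_i
        plan.append({"bits": m_i, "hashes": k_i, "fp": fp})
    return plan
```

Because every filter receives the same number of bits per stored path, all filters end up with the same number of hash functions and roughly the same false positive probability, whatever the distribution of paths over the filters.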

As shown in Figure 8.2(b), with 5 Bloom Filters only 7% of the path-components have to be examined in regular memory. Using Equation 8.1, the expected number of hash probes per lookup, $E_{exp}$, may be expressed as

$E_{exp} = 5 \times \left(\frac{1}{2}\right)^{(M/N)\ln 2} + 1$ (8.4)

Figure 8.3 presents the expected number of hash probes per lookup, $E_{exp}$, versus the total embedded memory size M for various values of N. Our input contains 130000 paths, so with 2 Mb of embedded memory the expected number of hash probes per lookup is around 1.003. The access latency of a commodity SRAM device is 0.5 ns, i.e. 2000 million memory accesses per second. With 1.003 expected probes per lookup, this corresponds to a lookup rate of about 1994 million lookups per second. Using Equation 8.3, the worst-case number of dependent memory accesses is 6, which corresponds to a lookup rate of about 333.3 million lookups per second. Lastly, we have to evaluate the processing of the path-components that have no representation in the Bloom Filters. In our simulation, only 7% of the path-components have to be examined in SRAM. This examination does not add processing time, since it can be pipelined with the Bloom Filter processing.
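The figures quoted above can be reproduced directly from Equations 8.2 to 8.4. The sketch below assumes that 2 Mb means 2,000,000 bits and that each hash probe costs one 0.5 ns SRAM access; the function names are illustrative.

```python
import math


def false_positive(M, N):
    """Equation 8.2: f = (1/2)^((M/N) ln 2)."""
    return 0.5 ** ((M / N) * math.log(2))


def expected_probes(B, M, N):
    """Expected hash probes per lookup, B * f + 1 (Equation 8.4 for B = 5)."""
    return B * false_positive(M, N) + 1


def worst_case_probes(B):
    """Equation 8.3: every filter yields a false positive."""
    return B + 1


def lookup_rate_mlps(probes, latency_ns=0.5):
    """Million lookups per second, given dependent probes per lookup."""
    return 1e3 / (latency_ns * probes)


B, M, N = 5, 2_000_000, 130_000          # assumption: 2 Mb = 2,000,000 bits
e_exp = expected_probes(B, M, N)          # about 1.003
rate_exp = lookup_rate_mlps(e_exp)        # about 1994 million lookups/s
rate_worst = lookup_rate_mlps(worst_case_probes(B))  # about 333.3
```

Under these assumptions the computed values agree with the ones stated in the text: roughly 1.003 expected probes, about 1994 million lookups per second on average, and about 333.3 million lookups per second in the worst case.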

[Figure 8.3: Size of embedded memory (MBits). Horizontal axis: size of embedded memory (MBits). Vertical axis: expected number of hash probes per lookup. Curves: 100000, 130000, 150000, 200000, and 250000 paths.]

[Figure 8.4: TCAM: Insertions vs Size. Axes: TCAM size (KB) and average number of insertions for 10,000 requests.]

In order to control N and f, we can use counting Bloom Filters, as first introduced in [53]. When the cleaning process needs to delete a path from a Bloom Filter, it decrements the counters of the entries corresponding to that path. If a counter reaches zero, the corresponding bit in the vector is unset.
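The deletion procedure can be sketched as follows. In this illustration a bit is considered set exactly when its counter is non-zero, so no separate bit vector is kept. The class name `CountingBloomFilter` and the salted SHA-256 hashing are assumptions for the example and are not taken from [53].

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter with one counter per position, allowing deletions."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.counters = [0] * m

    def _positions(self, item):
        # A set is used so that an item touches each counter at most once,
        # even if two of its k hash values collide.
        return {
            int.from_bytes(
                hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big"
            ) % self.m
            for i in range(self.k)
        }

    def add(self, item):
        for pos in self._positions(item):
            self.counters[pos] += 1

    def remove(self, item):
        # Only valid for items that were previously added; removing an
        # absent item could corrupt the counters of other members.
        for pos in self._positions(item):
            if self.counters[pos] > 0:
                self.counters[pos] -= 1

    def might_contain(self, item):
        # A position counts as "set" while its counter is non-zero.
        return all(self.counters[pos] > 0 for pos in self._positions(item))
```

Removing a path leaves the other stored paths intact, because every counter they rely on was incremented at least once on their behalf and is decremented only for the removed path.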