Analysis - Our Approach - Algorithms and Architectures for Network Search Processors

3.3 Our Approach

3.3.2 Analysis

We can relate the false positive probability of the Bloom filters to the throughput of the system. We measure throughput in terms of the average number of memory accesses per lookup. As described in the LPM algorithm, we perform a hash table probe for prefix length $i$ only if Bloom filter $i$ shows a match. The following notation is used:

• $W$ : number of Bloom filters in the system; we assume it to be equal to the length of the IP address

• $n_i$ : number of prefixes of length $i$ in the table (items in Bloom filter $i$)

• $N$ : total number of prefixes in the table, $\sum_i n_i$

• $m_i$ : number of bits allocated to Bloom filter $i$

• $M$ : total on-chip memory available to construct Bloom filters, $\sum_i m_i$

• $f_i$ : false positive probability of Bloom filter $i$

• $k_i$ : number of hash functions used in Bloom filter $i$

• $p_i$ : probability that the prefix of length $i$ being inspected indeed belongs to the set of prefixes of length $i$ (in other words, the successful search probability of hash table $i$, or the true positive probability of Bloom filter $i$)

• $t_s$ : average memory accesses required for a successful search in any hash table

• $t_u$ : average memory accesses required for an unsuccessful search in any hash table

• $T_i$ : cumulative memory accesses required from Bloom filter $i$ down to Bloom filter 1

We can express the cumulative memory accesses $T_i$ from Bloom filter $i$ using the recursion:

$$T_i = p_i t_s + (1 - p_i)(f_i t_u + T_{i-1}) \quad \text{for } i = W \text{ down to } 1 \tag{3.1}$$

This equation states that, starting from the $i$-th Bloom filter, we execute a successful search in hash table $i$ with probability $p_i$, which contributes $p_i t_s$ memory accesses. Otherwise, with probability $(1 - p_i)$, there is no true match at length $i$; the filter may still report a false positive with probability $f_i$, which contributes $f_i t_u$ memory accesses for the unsuccessful probe. In this case we proceed to the next filter result and repeat the procedure, which contributes $T_{i-1}$ accesses. This equation can be simplified by making certain assumptions. To begin, we assume that $t_s = t_u = 1$.

This can be achieved in a carefully constructed hash table. Hence,

$$T_i = p_i + (1 - p_i)(f_i + T_{i-1}) \quad \text{for } i = W \text{ down to } 1 \tag{3.2}$$
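The recursion can be evaluated numerically once values of $p_i$ and $f_i$ are chosen. The following is a minimal sketch, not part of the original analysis: the function name `cumulative_accesses` is hypothetical, and it assumes $T_0 = 0$ (no filters remain below prefix length 1), which the text does not state explicitly.

```python
def cumulative_accesses(p, f, t_s=1.0, t_u=1.0):
    """Evaluate T_W from Equation 3.1 (Equation 3.2 when t_s = t_u = 1).

    p[i-1] and f[i-1] hold p_i and f_i for prefix lengths i = 1..W.
    Assumption: T_0 = 0, i.e. nothing is left to search below length 1.
    """
    if len(p) != len(f):
        raise ValueError("p and f must have the same length")
    t = 0.0  # T_0 (assumed)
    for p_i, f_i in zip(p, f):  # builds T_1, T_2, ..., T_W in turn
        t = p_i * t_s + (1.0 - p_i) * (f_i * t_u + t)
    return t
```

For example, with no true matches at any length (`p_i = 0`) the result reduces to the sum of the false positive probabilities, and with `p_W = 1` it reduces to a single access.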

It is difficult to assume the true match probabilities for any particular prefix length. However, by profiling and observation, we can determine the values of $p_i$ for each prefix length $i$. An interesting question arises: given $p_i$, what values of $f_i$ would minimize $T_i$ subject to the constraint $\sum_i m_i \le M$? Intuitively, if the prefixes of length $i$ are the longest matching prefixes most often (i.e., $p_i$ is high), then our search usually ends at the $i$-th Bloom filter, and it is acceptable for the Bloom filters beyond filter $i$ (those for shorter prefix lengths) to have a large false positive probability, since they are rarely used. Therefore, we can allocate less memory to those Bloom filters. On the other hand, since the search ends at the $i$-th filter, we should avoid false memory accesses on the way there, and can allocate more memory to the Bloom filters before the $i$-th (those for longer prefix lengths) to reduce their false positive probability. Therefore, given the true match probabilities of the filters, a fixed amount of memory can be allocated optimally to the filters to reduce overall memory accesses. We have discussed this trade-off since it can be useful when true match probabilities are known. However, we do not explore it for the IP lookup application since no assumptions can be made regarding true match probabilities.

To simplify the equation further, we prove the following theorem.

Theorem 1: $T_i \le \sum_{j=2}^{i} f_j + T_1$ for $i \ge 2$

Proof: The proof is by induction on $i$ using Equation 3.2.

For $i = 2$,

$$T_2 = p_2 + (1 - p_2)(f_2 + T_1) \le f_2 + T_1$$

For $i = 3$,

$$T_3 = p_3 + (1 - p_3)(f_3 + T_2) \le p_3 + (1 - p_3)(f_3 + f_2 + T_1) \le f_3 + f_2 + T_1$$

Now assume that the result holds for $j$,

$$T_j \le \sum_{l=2}^{j} f_l + T_1$$

Hence,

$$T_{j+1} = p_{j+1} + (1 - p_{j+1})(f_{j+1} + T_j) \le p_{j+1} + (1 - p_{j+1})\Big(f_{j+1} + \Big(\sum_{l=2}^{j} f_l + T_1\Big)\Big) \le \Big(\sum_{l=2}^{j+1} f_l + T_1\Big)$$

Hence the proof. ∎

This theorem gives a pessimistic bound on the average performance. It illustrates that the cumulative average search time required from the $i$-th filter onwards is worst when there is no true match in the first $i - 1$ filters and memory accesses must be executed due to the false positives of these filters, each with probability $f_i$. Furthermore, since $T_1 \le 1$, we get the following equation.

$$T_W \le \sum_{j=2}^{W} f_j + 1 \tag{3.3}$$
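The bound of Equation 3.3 can be spot-checked numerically against the recursion of Equation 3.2. The sketch below is illustrative only: the helper names are hypothetical, the $p_i$ and $f_i$ values are drawn at random rather than taken from any measured routing table, and $T_0 = 0$ is assumed as the base of the recursion.

```python
import random

def t_w(p, f):
    """T_W from Equation 3.2; p[i-1], f[i-1] hold p_i, f_i. Assumes T_0 = 0."""
    t = 0.0
    for p_i, f_i in zip(p, f):
        t = p_i + (1.0 - p_i) * (f_i + t)
    return t

def bound_3_3(f):
    """Right-hand side of Equation 3.3: sum of f_j for j = 2..W, plus 1."""
    return sum(f[1:]) + 1.0

def check_bound(trials=1000, w=32, seed=0):
    """Return True if T_W <= bound held in every random trial."""
    rng = random.Random(seed)  # fixed seed so the check is repeatable
    for _ in range(trials):
        p = [rng.random() for _ in range(w)]
        f = [rng.random() for _ in range(w)]
        if t_w(p, f) > bound_3_3(f) + 1e-12:  # small tolerance for rounding
            return False
    return True
```

A random check of this kind does not replace the proof; it only guards against transcription errors in the recursion or the bound.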

In order to simplify this analysis, we assume that all Bloom filters are optimally tuned and obey the equation

$$k_i = \frac{m_i}{n_i} \ln 2$$

This assumption is optimistic, since the value of $k_i$ given by this equation can be a fraction, whereas in practice $k_i$ is always an integer and achieving the optimum is not always possible. Given $m_i$ and $n_i$, $k_i$ can be calculated with this equation and rounded to the nearest integer, larger or smaller, depending on which results in a lower false positive probability for the filter. The resulting configuration does not deviate much from the optimal configuration.
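The rounding step described above can be sketched as follows. This is an illustration under stated assumptions, not the original implementation: it uses the standard Bloom filter approximations $k = (m/n)\ln 2$ for the optimal number of hash functions and $f = (1 - e^{-kn/m})^k$ for the false positive probability, and the function names are hypothetical.

```python
import math

def false_positive_rate(m, n, k):
    """Standard approximation f = (1 - e^{-kn/m})^k for m bits, n items, k hashes."""
    return (1.0 - math.exp(-k * n / m)) ** k

def best_integer_k(m, n):
    """Return (k, k_opt): the integer neighbour of k_opt with the lower f."""
    k_opt = (m / n) * math.log(2)  # possibly fractional optimum
    # Try both integer neighbours; at least one hash function is required.
    candidates = {max(1, math.floor(k_opt)), max(1, math.ceil(k_opt))}
    k = min(candidates, key=lambda c: false_positive_rate(m, n, c))
    return k, k_opt
```

For instance, with `m = 1000` bits and `n = 100` items the fractional optimum is about 6.93, and comparing `k = 6` with `k = 7` selects `k = 7`.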

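Furthermore, we tune all the optimal Bloom filters to exhibit the same false positive rate, $f$. Therefore, all of the Bloom filters use the same number of hash functions, $k$, so that

$$f_i = f = \left(\frac{1}{2}\right)^{k} = \left(\frac{1}{2}\right)^{(m_i/n_i)\ln 2} \quad \text{for all } i$$

which requires the ratio $m_i / n_i$ to be the same for every filter, that is, $m_i = (n_i / N)\,M$.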

In other words, each Bloom filter is allocated a memory segment from the available $M$ bits that is proportional to its share of prefixes in the total. Ideally, it is possible to allocate memory in this fashion, but in practice it is difficult. Usually embedded memory is allocated in blocks of bits as opposed to single bits. However, by slightly over-allocating the memory to meet the block size requirement, nearly the same performance can be maintained.
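The proportional allocation with block rounding can be sketched as below. This is a minimal illustration, assuming each filter's share is rounded up to a whole number of blocks; the function name, the `block_bits` parameter, and the example sizes are hypothetical and are not taken from the original design.

```python
def allocate_memory(prefix_counts, total_bits, block_bits=1):
    """Give filter i about M * n_i / N bits, rounded up to whole blocks.

    prefix_counts[i-1] holds n_i; total_bits is M; block_bits is the
    (assumed) allocation granularity of the embedded memory.
    """
    n_total = sum(prefix_counts)
    if n_total == 0:
        raise ValueError("at least one prefix is required")
    alloc = []
    for n_i in prefix_counts:
        # Integer ceiling of (M * n_i / N) / block_bits, avoiding float rounding.
        blocks = -(-(total_bits * n_i) // (n_total * block_bits))
        alloc.append(blocks * block_bits)
    return alloc
```

With `block_bits=1` the result is the ideal proportional share; with a larger block size the total may slightly exceed `total_bits`, which is the over-allocation mentioned above.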

With this assumption, the false positive probability $f_i$ for a given filter $i$ may be expressed as

$$f_i = f = \left(\frac{1}{2}\right)^{(M/N)\ln 2} \tag{3.8}$$

Let $\tau_{avg1}$ denote the average number of memory accesses per address lookup for this basic configuration (the 1 in the subscript refers to the first scheme). Finally, from Equation 3.3 with $W = 32$,

$$\tau_{avg1} = T_W = (W - 1) f + 1 = 31 \left(\frac{1}{2}\right)^{(M/N)\ln 2} + 1 \tag{3.9}$$

With a moderately large value of $(M/N)\ln 2$, the term $31\left(\frac{1}{2}\right)^{(M/N)\ln 2}$ becomes very small compared to 1, and the average number of memory accesses approaches 1. While the average performance of our algorithm is appealing, the worst-case performance is poor: in the worst case, every Bloom filter shows a false match and forces the corresponding memory access to be performed. We denote the worst-case memory accesses for this configuration as $\tau_{worst1}$. Hence,

$$\tau_{worst1} = W = 32 \tag{3.10}$$
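Equations 3.8 to 3.10 are straightforward to evaluate for a given memory budget and table size. The sketch below is illustrative: the function names are hypothetical, and the example assumes 1 MB means $2^{20}$ bytes of 8 bits each, which the text does not specify.

```python
import math

def false_positive(m_bits, n_prefixes):
    """Equation 3.8: f = (1/2)^((M/N) ln 2)."""
    return 0.5 ** ((m_bits / n_prefixes) * math.log(2))

def tau_avg1(m_bits, n_prefixes, w=32):
    """Equation 3.9: average accesses per lookup, (W - 1) f + 1."""
    return (w - 1) * false_positive(m_bits, n_prefixes) + 1.0

def tau_worst1(w=32):
    """Equation 3.10: every filter shows a match, so W accesses."""
    return w

if __name__ == "__main__":
    m_bits = 2 * 8 * 2**20  # 2 MB, assuming 1 MB = 2**20 bytes
    print(tau_avg1(m_bits, 250_000), tau_worst1())
```

Under these assumptions, 2 MB for 250,000 prefixes gives roughly 67 bits per prefix, so the false positive term is negligible and the average stays just above one access per lookup.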

We now plot the average memory accesses as a function of the available on-chip memory $M$ for different table sizes, $N$. The performance is shown in Figure 3.4.

Figure 3.4 shows that with more memory, the false positive probabilities of the Bloom filters decrease exponentially. As a result, the average number of memory accesses per lookup also decreases exponentially. With 2 MB of on-chip memory, we can support 250,000 prefixes with each lookup requiring fewer than two memory accesses on average. If implemented using a commodity SRAM chip operating at 333 MHz, then this system can

