
6.4 Performance Limitations of Dynamic Predictors

6.4.2 Header Applications

For header applications, a number of variables can directly affect prediction performance. While some of these variables can be safely ignored, e.g. the percentage of corrupted packets traversing any IP network, the more important question is how prediction rates change over time.

6.4.2.1 Forwarding Applications

For IP forwarding applications, such as TRIE and HASH, the forwarding table is the most dynamic data source within the application. Because the table changes as new networks and routes are added, it is difficult to isolate the routing table from the underlying network topology.

In the case of network simulations, the limited availability of real-world network traces requires shared repositories such as NLANR [164] to be used. For the SimNP simulations, the anonymised addresses are replaced with addresses derived from the destination addresses referenced in common routing tables. Regardless of the forwarding structure utilised, the prediction rate is more likely to be affected by the underlying data than by the absolute number of entries. To examine this, routing entries from the MAEWEST and AT&T East Canada routing tables are parsed to form a new synthetic routing table. The results in Table 6.3 demonstrate that, although the prediction rate changes as the routing table is altered, the difference in performance is relatively small: 1.45% for TRIE-based forwarding and 1.03% for HASH-based forwarding.

Table 6.3: Gshare Prediction Hit Rate For TRIE and HASH Forwarding

TRIE                             HASH
Routing Entries   Hit Rate %     Routing Entries   Hit Rate %
75,000            93.72          5,000             96.03
102,000           92.33          10,000            95.66
119,000           92.96          15,000            95.19
141,000           92.27          20,000            96.22
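For readers unfamiliar with the predictor behind the hit rates in Table 6.3, the following is a minimal sketch of a gshare predictor. It is an illustration only and is not the SimNP implementation: the class name, the 2-bit saturating counters, the weakly-not-taken initial state and the unshifted use of the branch address are all assumptions made for this example.

```python
class GsharePredictor:
    """Illustrative gshare: a table of 2-bit counters indexed by PC XOR global history."""

    def __init__(self, entries=256):
        # Assumption: the table size is a power of two so a bit mask can form the index.
        assert entries > 0 and entries & (entries - 1) == 0
        self.mask = entries - 1
        self.history = 0                # global branch history register
        self.counters = [1] * entries   # assumed start state: weakly not-taken
        self.hits = 0
        self.total = 0

    def _index(self, pc):
        # XOR the branch address with the global history, then fold into the table.
        return (pc ^ self.history) & self.mask

    def predict(self, pc):
        # Counter values 2 and 3 predict taken; 0 and 1 predict not-taken.
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        idx = self._index(pc)
        predicted = self.counters[idx] >= 2
        self.hits += int(predicted == bool(taken))
        self.total += 1
        # Saturating increment or decrement of the selected counter.
        if taken:
            self.counters[idx] = min(3, self.counters[idx] + 1)
        else:
            self.counters[idx] = max(0, self.counters[idx] - 1)
        # Shift the actual outcome into the history register.
        self.history = ((self.history << 1) | int(bool(taken))) & self.mask

    def hit_rate(self):
        # Percentage of branches whose direction was predicted correctly.
        return 100.0 * self.hits / self.total if self.total else 0.0


if __name__ == "__main__":
    predictor = GsharePredictor(entries=256)
    for _ in range(1000):
        predictor.update(0x40, True)    # a branch that is always taken
    print(f"hit rate: {predictor.hit_rate():.2f}%")
```

Because the index mixes the branch address with recent outcomes, the same branch can occupy several table entries, which is one reason the hit rate depends on the data being processed rather than only on the code.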

6.4.2.2 Classification Applications

For packet classification algorithms, the prediction rate is determined by both the ruleset entries and the structure used to represent the ruleset. In the case of RFC, the data structure requires no conditional operations during the rule lookup, examining the memory structure in the same fashion regardless of the underlying data. In Table 6.4, the performance of a 256-entry gshare predictor for Hypercuts classification is outlined as the number of classifier rules stored is increased. For this simulation, a 1000-rule ruleset defined using ClassBench [176] was used. As with the forwarding algorithms, the hit rate does not appear to either increase or decrease as additional rules are provided. The prediction rate can change by up to 2.58% between one classification set and another, highlighting some of the variance in dynamic predictor performance.

Table 6.4: Prediction Hit Rate For Hypercuts Classification

Rule Entries   Hit Rate %
250            86.52
500            88.98
750            86.44
1000           86.40

6.4.2.3 Metering & Queueing Applications

The final application types examined in detail are the metering and queueing applications, such as Three Colour Metering (TCM) and Deficit Round Robin (DRR). As described previously, metering algorithms such as Single-Rate TCM, Two-Rate TCM, Leaky Bucket or Token Bucket typically operate by regulating the packet output to match a bucket which is configured to fill with tokens at a given fill rate. In the case of TCM, two buckets are used during normal operation: the command and peak buckets. The two buckets are configured to fill at different rates, allowing a greater degree of granularity to be employed during metering. A sample configuration might use the command bucket to detect a large number of packets arriving within a short amount of time, while the peak bucket is used to detect when a high number of large packets arrive within a short amount of time. In this case, packets falling into the command bucket are marked green, packets falling into the peak bucket are marked yellow, and all other packets (low network load) are marked red. To examine predictor performance for various network conditions, the same 100,000 packets from the OC-48 trace are metered for various configurations of the fill rates. The results are summarised in Table 6.5, with the prediction rate varying between 98.1% for periods of time where network load is relatively low (≫ Red) and 93.13% when a high proportion of the packets exceed the peak and command fill rates.
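The bucket mechanism described above can be sketched as a single token bucket whose conformance check is the data-dependent branch a predictor has to guess. This is a simplified illustration and not the metering code used in the simulations: the class name, the byte-based token unit and the caller-supplied timestamps are assumptions, and the colour marking of TCM is deliberately left out.

```python
class TokenBucket:
    """Illustrative token bucket: tokens accumulate at fill_rate up to depth."""

    def __init__(self, fill_rate, depth):
        self.fill_rate = fill_rate      # assumed unit: bytes of credit per second
        self.depth = depth              # maximum credit the bucket can hold, in bytes
        self.tokens = float(depth)      # assumption: the bucket starts full
        self.last_time = 0.0            # timestamp of the previous packet, in seconds

    def conforms(self, now, size):
        # Add the credit earned since the last packet, capped at the bucket depth.
        elapsed = now - self.last_time
        self.last_time = now
        self.tokens = min(self.depth, self.tokens + elapsed * self.fill_rate)
        # This comparison is the branch whose outcome depends on network load.
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(fill_rate=1000, depth=1500)
    for arrival, size in [(0.0, 1000), (0.1, 1000), (1.0, 1000)]:
        print(arrival, size, bucket.conforms(arrival, size))
```

When nearly every packet takes the same side of the comparison, the branch is easy to predict; a mix of conforming and non-conforming packets makes the outcome harder to learn, which is consistent with the spread of hit rates reported for the different fill-rate configurations.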

Table 6.5: Prediction Hit Rate For TCM Metering

Red %    Green %   Yellow %   Hit Rate %
0.20     4.34      95.46      98.1
22.54    4.34      73.13      97.71
50.41    49.59     0          95.83
20.31    54.39     25.30      94.7
22.54    65.51     11.95      94.98
50.42    40.20     9.38       94.34
44.63    35.45     19.91      93.13

The final application examined is the deficit round robin queueing algorithm. Similar to other queueing systems, there are three variables within the algorithm which can be identified as possibly altering prediction rates: the number of unbalanced input queues (Nip), the number of output queues (Nop), and the quantum associated with each round (Qrr). The quantum within the DRR algorithm refers to how many bytes are moved from the input to the output during each round of the algorithm. For example, if the current packet at the input is 500 bytes long, the current queue quantum is 300 and the quantum added per round is 100, the packet must wait 2 iterations before being moved to the balanced output queue.

Using the OC-12 packet trace, the prediction rate for a 256-entry gshare predictor is shown in Figure 6.6. In Figure 6.6(A), the prediction rate is shown for a varying number of input and output queues. As can be seen, in both cases the prediction rate increases as the number of queues is balanced, before falling almost 3% as the number of configured queues (either input or output) exceeds the number of fixed queues (either input (Nip) or output (Nop)). While it is clear that the relationship between the number of input and output queues affects the prediction rate, the quantum size has no definitive relationship to the hit rate. For configurations involving a large number of queues, the prediction rate changes by approximately 1% as the quantum is increased from 100 to 1200. A quantum of 1200 would allow nearly all packets through within a single iteration, minimising the ability of the algorithm to balance the output queues.
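The quantum example given for DRR can be checked with a short sketch. The function name and its parameters are hypothetical and cover only the waiting calculation for a single head-of-line packet, not a full DRR scheduler.

```python
def rounds_until_sent(packet_len, deficit, quantum):
    """Count the rounds a head-of-line packet waits until the queue's credit covers it."""
    if quantum <= 0:
        raise ValueError("quantum must be positive")
    rounds = 0
    # Each round adds one quantum of credit; the packet moves once credit >= length.
    while deficit < packet_len:
        deficit += quantum
        rounds += 1
    return rounds


if __name__ == "__main__":
    # Values from the text: 500-byte packet, current quantum 300, 100 added per round.
    print(rounds_until_sent(500, 300, 100))   # 2
    # A large quantum lets a packet of this size through after a single round.
    print(rounds_until_sent(500, 0, 1200))    # 1
```

The loop condition is itself a data-dependent branch: with a small quantum it is taken several times per packet, while with a quantum of 1200 it is almost always taken exactly once.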

In document Branch Prediction For Network Processors (Page 154-157)