Effectiveness of Load Balancing Routing - Cloud-Scale Data Center Network Architecture. Cheng-C

We used a simulation approach to evaluate the effectiveness of Peregrine’s load-balancing routing algorithm. The simulated test network spans 52 physical machines with 384 links. Each physical machine is connected to four TOR switches, each via a separate 1GE NIC, and each TOR switch is in turn connected to four regional switches, each via a separate 10GE link. To derive realistic input network traffic loads, we started with the packet traces collected from the Lawrence Berkeley National Lab campus network [18]. Each packet trace spans a period of 300 to 1800 seconds and was collected from different subnets with a total of around 9000 end hosts. We assumed that each packet trace represents a VM-to-VM traffic matrix in a virtual data center, and that the VMs are assigned to PMs at random. Because the ITRI container computer is designed to support multiple virtual data centers running concurrently on it, we created multiple multi-VDC traffic matrices, each constructed by randomly combining five VM-to-VM traffic matrices into one traffic matrix. In total, 17 multi-VDC 300-second traces were created and replayed on the simulated network.
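The construction of a multi-VDC traffic matrix described above can be sketched as follows. This is a minimal illustration, not the simulator used in the evaluation: the function names, the dictionary representation of a traffic matrix, and the decision to drop traffic between VMs placed on the same PM are all assumptions made for the example.

```python
import random
from collections import defaultdict

def assign_vms_to_pms(vms, num_pms, rng):
    """Randomly place each VM on one of num_pms physical machines."""
    return {vm: rng.randrange(num_pms) for vm in vms}

def pm_traffic_matrix(vm_matrix, placement):
    """Aggregate a VM-to-VM matrix {(src_vm, dst_vm): load} into PM pairs."""
    pm_matrix = defaultdict(float)
    for (src, dst), load in vm_matrix.items():
        p, q = placement[src], placement[dst]
        if p != q:  # assumption: co-located VMs do not load the network
            pm_matrix[(p, q)] += load
    return dict(pm_matrix)

def combine_vdcs(vm_matrices, num_pms, k=5, seed=0):
    """Pick k VM-to-VM matrices at random and merge them into one
    multi-VDC PM-to-PM traffic matrix."""
    rng = random.Random(seed)
    combined = defaultdict(float)
    for vm_matrix in rng.sample(vm_matrices, k):
        vms = sorted({vm for pair in vm_matrix for vm in pair})
        placement = assign_vms_to_pms(vms, num_pms, rng)
        for pair, load in pm_traffic_matrix(vm_matrix, placement).items():
            combined[pair] += load
    return dict(combined)
```

Each VDC gets its own independent random placement, so VMs from different VDCs may share a PM and their loads add up on the same PM pair.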

Given a multi-VDC packet trace, we used the first half of the trace to derive its traffic matrix and compute routes for the communicating physical machines, and then replayed the second half of the trace on the simulated network using the resulting routes. The metric used to measure the effectiveness of a routing algorithm is the congestion count, Nc, during the trace replay period. For every second of the input trace, we placed the load of every communicating host pair during that second on the links along the pair’s route in the simulated network. During this replay process, whenever a host pair’s load is placed on a link whose capacity (Mbits/sec) is already exceeded, the congestion count is incremented by one. Figure 5.4 compares the congestion counts of the full link criticality-based routing (FLCR) algorithm, which is load-aware, and the random shortest-path routing (RSPR) algorithm, which is load-insensitive, using as inputs the 17 multi-VDC packet traces described above. These two algorithms represent the two extremes of Peregrine’s routing algorithm (Section III.D): FLCR corresponds to Z set to 100, whereas RSPR corresponds to Z set to 0.

Figure 5.4: Congestion count (left Y axis) and additional traffic load (right Y axis) comparison between the full link criticality-based routing algorithm and the random shortest path routing algorithm using multiple multi-VDC packet traces as inputs. (X axis: trial number. Series: RSPR, FLCR, % of additional traffic.)

Figure 5.5: Congestion count ratio between full link criticality-based routing and random shortest-path routing under different degrees of skewedness in the input traffic matrix. (X axis: % of host pairs accounting for 90% of total traffic. Y axis: Nc-FLCR / Nc-RSPR. Series: Z=100%, Z=90%.)
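The congestion-count metric defined above can be sketched in a few lines. This is an illustrative reading of the description, not the actual simulator: the data layout (one dictionary of host-pair loads per second, a route as a list of link identifiers) and the name `congestion_count` are assumptions, and host pairs are placed in dictionary order because the text does not specify an order.

```python
from collections import defaultdict

def congestion_count(trace, routes, capacity):
    """Count placements onto links whose capacity is already exceeded.

    trace:    list with one dict per second, {(src, dst): load in Mbit/s}
    routes:   {(src, dst): [link_id, ...]} computed from the first half
    capacity: {link_id: capacity in Mbit/s}
    """
    nc = 0
    for second in trace:
        link_load = defaultdict(float)  # loads are reset every second
        for pair, load in second.items():
            for link in routes[pair]:
                # Increment only if the link was over capacity
                # before this pair's load is added.
                if link_load[link] > capacity[link]:
                    nc += 1
                link_load[link] += load
    return nc
```

Under this reading, the placement that first pushes a link past its capacity is not counted; only subsequent placements on that link within the same second are.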

As expected, FLCR outperforms RSPR in all 17 traces, because the former strives to avoid congested links through the guidance of link criticality and expected link load. In contrast, RSPR relies only on randomization to avoid congestion and is thus less effective. The price that FLCR pays for avoiding congestion is that the paths it produces tend to be longer, with a larger hop count, than those produced by RSPR. As a result, the total traffic load injected by FLCR tends to be higher than that injected by RSPR. Fortunately, the percentage of additional traffic load due to longer paths is insignificant, around 0.5

To explain why the effectiveness difference between FLCR and RSPR varies with the input traces, we measured the concentration percentage of each input trace, which is the percentage of the top heavy-traffic host pairs that account for 90% of the total traffic volume in the input trace. We then correlated this percentage with the routing effectiveness difference, represented by the ratio of the congestion counts (Nc) of FLCR and RSPR, for all 17 input traces. As shown by the solid curve in Figure 5.5, when an input trace has a lower concentration percentage, the congestion count ratio tends to be lower, indicating that the routing effectiveness difference between FLCR and RSPR is greater. This is because a lower concentration percentage means a higher degree of skewedness in the input workload, and the advantage of FLCR over RSPR is more pronounced when the input load is more skewed.
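The concentration percentage can be computed directly from a traffic matrix, as in the sketch below. The function name and the dictionary representation are assumptions for illustration; the denominator is taken to be the number of host pairs present in the matrix, which the text does not state explicitly.

```python
def concentration_percentage(traffic, share=0.90):
    """Percentage of host pairs, heaviest first, needed to cover
    `share` of the total traffic volume.

    traffic: {(src, dst): total volume for that host pair}
    """
    loads = sorted(traffic.values(), reverse=True)
    total = sum(loads)
    if total == 0:
        return 0.0
    running = 0.0
    for k, load in enumerate(loads, start=1):
        running += load
        if running >= share * total:
            return 100.0 * k / len(loads)
    return 100.0
```

A highly skewed matrix, where one pair carries almost all traffic, yields a small value; a uniform matrix yields a value close to 100.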

The complexity of full link criticality-based routing is O(L∗P), where L is the number of physical network links and P is the number of PM pairs. From the multi-VDC traces, we found that most of the entries in their traffic matrices are insignificantly small; for example, fewer than 5% of the host pairs account for more than 90% of the total traffic volume, as shown in Figure 5.5. In that figure, the solid curve corresponds to FLCR (Z=100), whereas the dotted curve corresponds to applying link criticality-based routing only to the top heavy-traffic host pairs that are responsible for 90% of the total traffic volume, i.e., Z=90. The difference between these two curves is very small, indicating that the two configurations have similar routing effectiveness, although the Z=90 case requires much less route computation time than the Z=100 case. More concretely, the number of host pairs in a 500-server network to which link criticality-based routing is applied is reduced from 250K when Z=100 to 12.5K when Z=90. In our current implementation, route computation for 12.5K host pairs takes about 10 minutes.
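The host-pair counts quoted above can be checked with a short calculation. It assumes that P counts ordered host pairs as N squared for N = 500 servers, and it takes the 5% bound as the fraction of pairs retained at Z=90; the symbols N, P_100 and P_90 are introduced here only for the calculation.

```latex
\begin{align*}
P_{100} &= N^2 = 500^2 = 250{,}000 \\
P_{90}  &\approx 0.05 \times P_{100} = 12{,}500 \\
\frac{O(L \cdot P_{100})}{O(L \cdot P_{90})} &\approx \frac{250{,}000}{12{,}500} = 20
\end{align*}
```

With L fixed, the O(L∗P) bound therefore suggests roughly a 20-fold reduction in route computation work when moving from Z=100 to Z=90.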

In document Cloud-Scale Data Center Network Architecture. Cheng-Chun Tu Advisor: Tzi-cker Chiueh (Page 45-48)