4.7 Exploring the impact of Trevi

4.7.1 A simple thought experiment

As a trivial starting point, I constructed the following thought experiment. Picture a network where traffic is a mix of long-running storage flows and shorter foreground flows consisting of short messages and longer responses. According to the literature on data centre traffic, this is a reasonable assumption. In what follows, I attempt to gauge at what point it becomes efficient to use a Trevi-style transport for the storage traffic. For now I assume a unicast variant of Trevi, as this makes the comparison with TCP easier. Further, I assume that Trevi requires a fixed overhead of 10%. My final simplifying assumption is that Trevi traffic is given extremely low priority at queues. I start by contrasting two extreme cases: a lightly loaded network, where there is plenty of spare capacity on average and congestion only occurs as a result of TCP’s congestion control, and a heavily loaded network, where there is barely any spare capacity.

Light Load

In a lightly loaded network, if all the traffic runs TCP, the storage flows will quickly grow their congestion windows and take a large share of the network until they finish. This is a simple consequence of TCP being optimised to favour longer-running flows. During this time, foreground flows will see increased delay as queues build in the network. Once the storage flows finish, the queues drain and the foreground flows complete much faster. In other words, the likely outcome is a large variability in flow completion time for foreground flows, potentially exacerbated by incast.

If the storage traffic uses Trevi, this changes. Because Trevi traffic is preferentially dropped at queues, it should have little impact on foreground traffic. Equally, because there is relatively little foreground traffic, it should be easy for enough Trevi packets to reach their destination, so the Trevi storage flows should complete in a similar time to TCP. This suggests that in such a network, using Trevi will lead to a significant performance improvement. However, a lightly loaded network is an inefficient use of resources. It is hardly news that reducing the load in a network can significantly improve its performance: at a low enough load, even TCP will perform extremely well for foreground traffic.

Heavy Load

Now consider a network that is so heavily loaded that it suffers frequent packet drops. If all the traffic runs TCP, the congestion will lead to frequent drops and retransmissions, affecting both the storage and the foreground flows. For the storage flows, it limits the congestion window they can achieve and increases their flow completion time. For foreground flows, it leads to even greater variation in flow completion time. If the storage traffic uses Trevi, its packets will be preferentially dropped at the queues, so the congestion seen by the foreground traffic goes down. This has the positive effect of reducing the variability of flow completion times and improving the overall average. However, if too much Trevi traffic is dropped, at some point too few packets will get through for the Trevi flows to complete.
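Both load cases rest on the same mechanism: when a queue overflows, Trevi packets are sacrificed before foreground packets. The toy model below is a hypothetical sketch of that behaviour, not Trevi's actual switch configuration. It assumes a fixed buffer of `buffer_pkts` packets, a link that serves `service_per_slot` packets per time slot, and a drop policy in which a Trevi packet arriving at a full buffer is discarded, while a foreground packet arriving at a full buffer pushes out the most recently queued Trevi packet. All names and parameter values are illustrative.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Counters:
    delivered: int = 0
    dropped: int = 0


class PriorityDropQueue:
    """Hypothetical FIFO buffer that sacrifices Trevi packets first on overflow."""

    def __init__(self, buffer_pkts: int):
        self.buffer_pkts = buffer_pkts
        self.buf = deque()
        self.stats = {"fg": Counters(), "trevi": Counters()}

    def enqueue(self, kind: str) -> None:
        if len(self.buf) < self.buffer_pkts:
            self.buf.append(kind)
        elif kind == "trevi":
            # Low-priority arrival at a full buffer: drop it outright.
            self.stats["trevi"].dropped += 1
        else:
            # Foreground arrival at a full buffer: push out the newest queued Trevi packet.
            for i in range(len(self.buf) - 1, -1, -1):
                if self.buf[i] == "trevi":
                    del self.buf[i]
                    self.stats["trevi"].dropped += 1
                    self.buf.append(kind)
                    return
            # No Trevi packet left to sacrifice, so the foreground packet is lost.
            self.stats["fg"].dropped += 1

    def serve(self, n: int) -> None:
        # Transmit up to n packets from the head of the queue.
        for _ in range(min(n, len(self.buf))):
            self.stats[self.buf.popleft()].delivered += 1


def simulate(fg_per_slot, trevi_per_slot, service_per_slot, slots=1000, buffer_pkts=20):
    """Offer a constant number of packets per slot and return per-class counters."""
    q = PriorityDropQueue(buffer_pkts)
    for _ in range(slots):
        for _ in range(trevi_per_slot):
            q.enqueue("trevi")
        for _ in range(fg_per_slot):
            q.enqueue("fg")
        q.serve(service_per_slot)
    return q.stats


light = simulate(fg_per_slot=2, trevi_per_slot=5, service_per_slot=10)  # 70% offered load
heavy = simulate(fg_per_slot=4, trevi_per_slot=8, service_per_slot=10)  # 120% offered load
for name, stats in (("light", light), ("heavy", heavy)):
    print(name, "fg dropped:", stats["fg"].dropped, "trevi dropped:", stats["trevi"].dropped)
```

Under light load nothing is dropped. Under heavy load the foreground class still sees no drops because the excess is absorbed entirely by Trevi, which is exactly what eventually starves the Trevi flows once too few of their packets get through.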

In other words, when the load in the network is too high, neither TCP alone nor TCP combined with Trevi works effectively. Again, this is not really surprising: if you picture the network as a time and space switch, then as you approach capacity you run out of free slots to move flows into. Eventually the result is congestion collapse, and all flows suffer.

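Table 4.1: Comparing the impact of increasing the ratio of Trevi traffic

| Storage traffic ratio (S) | Trevi overhead (δ) | Available foreground capacity (MB) |
|---------------------------|--------------------|------------------------------------|
| 0.7                       | 5%                 | 265                                |
| 0.7                       | 10%                | 230                                |
| 0.8                       | 5%                 | 160                                |
| 0.8                       | 10%                | 120                                |
| 0.9                       | 5%                 | 55                                 |
| 0.9                       | 10%                | 10                                 |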

The implication is that there must be a sweet spot where the network is able to run efficiently (at reasonable load) but with Trevi traffic still able to get sufficient bytes through to allow flows to complete. The following is a simple attempt to calculate where this sweet spot might lie.

Trevi traffic is preferentially dropped at queues. Consequently, any time the network becomes congested, Trevi will have to send significant amounts of extra traffic. As a rule of thumb, a network running TCP starts to degrade rapidly once packet loss approaches 10% (i.e. once more than 1 in 10 packets are being dropped). So to find the sweet spot, we need to look at how many TCP drops we can trade for Trevi drops before Trevi stops working.

Trevi breaks down once it can no longer get enough bytes to the receiver to decode the data. Assume that in a data centre a fraction S of all bytes belongs to storage flows. If Trevi needs an overhead of δ, then in the Trevi case a fraction S + Sδ = S(1 + δ) of the bytes must reach the receiver. In a fully loaded network with capacity C, this equates to:

$$(S + S\delta)\,C \text{ bytes} \tag{4.1}$$

In turn, that means that if you lose more than:

$$\bigl(1 - (S + S\delta)\bigr)\,C \text{ bytes} \tag{4.2}$$

then Trevi no longer functions. Table 4.1 puts this in context; it assumes a 1 GB network link, i.e. C = 1000 MB.
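Equation (4.2) is simple enough to check directly. The sketch below recomputes the foreground capacity column of Table 4.1 from S and δ, assuming (as the table's figures imply) a link capacity of C = 1000 MB. The function name `available_foreground_mb` is an illustrative choice, not something defined elsewhere in this work.

```python
def available_foreground_mb(storage_fraction: float, overhead: float,
                            capacity_mb: float = 1000.0) -> float:
    """Capacity left once Trevi's S + S*delta share is delivered, eq. (4.2)."""
    return (1.0 - (storage_fraction + storage_fraction * overhead)) * capacity_mb


# Reproduce the rows of Table 4.1 (capacity assumed to be 1000 MB).
for s in (0.7, 0.8, 0.9):
    for delta in (0.05, 0.10):
        print(f"S = {s:.1f}, delta = {delta:.0%}: {available_foreground_mb(s, delta):.0f} MB")
```

A negative result would mean that Trevi cannot complete even with no foreground traffic at all, because S(1 + δ) already exceeds the link capacity.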

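So this simple calculation shows that the Trevi overhead has a big impact on how much foreground traffic can be supported. Taking potentially realistic figures of 90% storage traffic and 10% overhead, you would only be able to send 1% foreground traffic (10 MB of the 1 GB in Table 4.1) before the storage traffic suffers. Rearranging (4.2) gives S ≤ (1 − L)/(1 + δ), where L is the fraction of capacity that must remain free. With 10% Trevi overhead and nothing reserved (L = 0), storage traffic is capped at about 91% of capacity; if the 10% loss level from the rule of thumb above is kept free (L = 0.1), storage traffic cannot exceed about 82% of network capacity.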
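The storage limit comes from rearranging (4.2) so that S(1 + δ) does not exceed the capacity left after some reserve. In the sketch below, the `reserved` parameter is an assumed interpretation: it is the fraction of capacity kept free, and setting it to 10% mirrors the rule-of-thumb loss level discussed earlier. The function name is hypothetical.

```python
def max_storage_fraction(overhead: float, reserved: float = 0.0) -> float:
    """Largest S satisfying S * (1 + overhead) <= 1 - reserved (rearranged eq. 4.2)."""
    # `reserved` is an assumption: the share of capacity deliberately kept free.
    return (1.0 - reserved) / (1.0 + overhead)


print(f"{max_storage_fraction(0.10):.1%}")                 # → 90.9% (nothing reserved)
print(f"{max_storage_fraction(0.10, reserved=0.10):.1%}")  # → 81.8% (10% kept free)
```

With no reserve, 10% overhead alone caps storage traffic at roughly 91%. Roughly 82% only appears once an extra 10% of headroom is held back.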

This thought experiment suggests that Trevi might be useful in networks where the utilisation is high enough that TCP alone would trigger too much congestion. Of course, even with the low congestion that preferential dropping of Trevi packets gives, queues will still build at network switches, and these will affect TCP throughput.

In document Optimising data centre operation by removing the transport bottleneck (Page 73-76)