The bandwidth guarantee enforced by Cheetah ensures only a minimum level of service for each VD/VDC. On a distributed storage system shared by multiple tenants, it is natural for some VDs to generate workloads that seek a larger share of a DA than they requested in their QoS specifications. If the DA has no spare bandwidth, such a workload overloads the DA and can degrade the performance of the entire DA. Although the CFVC scheduler in the DA ensures fairness and performance isolation to a fair degree, no disk scheduler can stop a workload from generating more requests. Since the hardware resources on an SN are limited, it is essential to regulate the data flow between CNs and SNs so that a CN never overloads an SN. If an SN suggested a data flow rate to a CN as a whole, the CN would have to adopt yet another flow control algorithm to regulate the data flow from its multiple VDs, which send their data to multiple DAs on that SN. Otherwise, a single VD that misuses its share would negatively impact the other VDs on that CN that share the same SN. Therefore, Cheetah proposes to regulate the data flow rate directly between DAs and VDs. An SN computes the ideal data flow rate for each of its DAs and forwards the flow control suggestions to the CNs that hold the corresponding VDs.
To handle flow control management between VDs and DAs, it is necessary to accurately measure the residual bandwidth on each DA in the storage system; only then can the SNs suggest a suitable data flow rate to the corresponding VDs. However, due to advanced caching and NCQ techniques on modern disk drives, it is extremely challenging to measure the DRUT capacity of a VD, as described in greater detail in Section 7.3. Given the DRUT capacity of each VD on a DA, Cheetah uses the following flow control algorithm on each of the DAs:
1. Measures the total disk time usage $Y$, and the disk time usage $Y_i$ and the incoming I/O rate $Z_i$ of each $VD_i$ that imposes load on that DA, and
2. Sends to $VD_i$ an advised I/O rate, which is equal to
$$\frac{Y_i}{Y} \cdot Y_{limit} \cdot \frac{1}{Y_i} \cdot Z_i = \frac{Y_{limit}}{Y} \cdot Z_i$$
$Y_{limit}$ is each DA's maximum allowed disk time usage. If $Y$ exceeds $Y_{limit}$, flow control is triggered. The advised I/O rate given by a DA to a VD sets an upper bound on the I/O rate of that VD to that DA.
$Y_{limit}$ is configured as a percentage of the time period over which the statistical measurements are made on the DA. A very low value of $Y_{limit}$ results in underutilization of disk resources, while a very high value results in overload conditions that seriously disrupt the overall performance of the system. Therefore, as a safe heuristic, Cheetah configures $Y_{limit}$ as 70% of the total observation time. In the formula used in step 2, when $Y$ exceeds $Y_{limit}$, $Z_i$ should be lowered for each $VD_i$ by the factor
$$\frac{\text{required disk usage time}}{\text{observed disk usage time}}$$
where the required disk usage time should be a fraction of $Y_{limit}$ rather than of $Y$, and hence equals $\frac{Y_i}{Y} \cdot Y_{limit}$, while the observed disk usage time is $Y_i$.
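The two steps above, together with the trigger condition, can be sketched in Python as follows. This is a minimal illustration, not Cheetah's implementation: the function name, the dictionary-based inputs, and the choice to return an empty result when flow control is not triggered are all assumptions made for the example.

```python
from typing import Dict


def advised_rates_unaware(disk_time: Dict[str, float],
                          io_rate: Dict[str, float],
                          y_limit: float) -> Dict[str, float]:
    """QoS unaware advised I/O rate for each VD on a single DA.

    disk_time[vd] is Y_i, io_rate[vd] is Z_i, and y_limit is Y_limit,
    all measured over the same observation period.
    Returning an empty dict when Y <= Y_limit is an assumption of this
    sketch; the text only states when flow control is triggered.
    """
    total = sum(disk_time.values())  # Y, total disk time usage on the DA
    if total <= y_limit:
        return {}  # flow control not triggered
    # (Y_i / Y) * Y_limit * (1 / Y_i) * Z_i simplifies to (Y_limit / Y) * Z_i
    return {vd: z * y_limit / total for vd, z in io_rate.items()}


# Hypothetical measurements over a 1000 ms window with Y_limit = 70%.
advice = advised_rates_unaware(
    disk_time={"VD1": 500, "VD2": 300, "VD3": 200},
    io_rate={"VD1": 100, "VD2": 60, "VD3": 40},
    y_limit=700,
)
```

Because every VD is scaled by the same ratio $Y_{limit}/Y$, the relative I/O rates of the VDs are preserved; only the aggregate is pulled back toward the limit.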
The flow control algorithm above treats all VDs equally and is hence QoS unaware. The required disk usage time component of this QoS unaware algorithm should ideally reflect QoS reservations, and hence Cheetah proposes the following QoS aware flow control algorithm, which does the following on each DA:
• Computes the advised I/O rate as
$$\frac{QoS_i}{QoS_{sum}} \cdot Y_{limit} \cdot \frac{1}{Y_i} \cdot Z_i$$
$QoS_i$ is the bandwidth requirement of $VD_i$ on the DA and $QoS_{sum}$ is the sum of all QoS requirements on that DA.
In case of high fluctuations in the input workload pattern, the observed disk usage time gives tighter control over flow regulation than the QoS aware formula, and hence Cheetah computes the final advised I/O rate as the minimum of the two:
$$\min\left(\frac{QoS_i}{QoS_{sum}} \cdot Y_{limit} \cdot \frac{1}{Y_i} \cdot Z_i,\ \ \frac{Y_{limit}}{Y} \cdot Z_i\right)$$
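The combined rule can be sketched in Python as below. The function name and input layout are hypothetical, and two behaviours are assumptions of the sketch rather than statements from the text: an empty result when $Y$ does not exceed $Y_{limit}$, and falling back to the QoS unaware rate for a VD whose measured $Y_i$ is zero.

```python
from typing import Dict


def advised_rates(disk_time: Dict[str, float],
                  io_rate: Dict[str, float],
                  qos: Dict[str, float],
                  y_limit: float) -> Dict[str, float]:
    """Final advised I/O rate per VD: min(QoS aware, QoS unaware).

    disk_time[vd] is Y_i, io_rate[vd] is Z_i, qos[vd] is QoS_i,
    and y_limit is Y_limit.
    """
    total = sum(disk_time.values())  # Y
    if total <= y_limit:
        return {}  # assumption: no advice unless flow control triggers
    qos_sum = sum(qos.values())      # QoS_sum
    advised = {}
    for vd, z in io_rate.items():
        y_i = disk_time[vd]
        unaware = z * y_limit / total
        if y_i <= 0:
            # Assumption: with no observed disk time the QoS aware
            # formula is undefined, so use the QoS unaware rate.
            advised[vd] = unaware
            continue
        aware = (qos[vd] / qos_sum) * y_limit * z / y_i
        advised[vd] = min(aware, unaware)
    return advised


# Hypothetical case: equal reservations, but VD_a uses twice the disk time.
result = advised_rates(
    disk_time={"VD_a": 600, "VD_b": 300},
    io_rate={"VD_a": 100, "VD_b": 100},
    qos={"VD_a": 20, "VD_b": 20},
    y_limit=700,
)
```

In this hypothetical case the QoS aware term is the smaller one for `VD_a`, which consumes more disk time than its reservation justifies, while `VD_b` is bounded by the QoS unaware term.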
[Figure 7.5: Illustration of flow control management on a DA shared by 3 VDs. PB is in units of MBPS. The diagram shows a DA with a PB of 100 shared by VD1, VD2 and VD3, whose guaranteed PB values are 40, 20 and 10; the generated rates R1, R2 and R3 correspond to expected PB values of 80, 50 and 30; and the advised flow control rates yield the regulated rates FC1, FC2 and FC3.]
Figure 7.5 illustrates an example of flow control management on a DA shared by 3 VDs. The PB guarantees for VD1, VD2 and VD3 are 40 MBPS, 20 MBPS and 10 MBPS respectively. The DA, with a PB capacity of 100 MBPS, is ideally configured to be 70% utilized, which is reflected in the sum of the guaranteed bandwidth of the VDs on the DA, 70 MBPS. Let us assume that VD1, VD2 and VD3 generate requests at rates of $R_1$, $R_2$ and $R_3$ I/O requests/second respectively, and that $R_1$, $R_2$ and $R_3$ correspond to hypothetical PB values of 80 MBPS, 50 MBPS and 30 MBPS respectively. The expected PB is shown to clarify the load exerted by each VD on the DA. Since the sum of the expected PB values of all the VDs (160 MBPS) is higher than the desired PB utilization of the DA (70% of 100 MBPS), the CFVC scheduler on the DA proportionally allocates the DA bandwidth to the VDs, and hence VD1, VD2 and VD3 receive PB shares of 57, 29 and 14 MBPS respectively. Cheetah then triggers the flow control mechanism to regulate the data flow from each VD to the DA. In the QoS unaware technique, each VD is asked to reduce its request generation rate by a factor equivalent to $\frac{R_i}{R_1 + R_2 + R_3}$, where $R_i$ is $R_1$, $R_2$ or $R_3$ for VD1, VD2 or VD3 respectively. However, the request generation rate by itself does not represent the entire workload of a VD. Hence Cheetah uses the DRUT capacity of a VD to determine the factor by which the VD's request generation rate should be reduced. The DRUT capacity of a VD effectively captures all the necessary locality information of the VD's workload and corresponds to the PB value allocated by the CFVC scheduler. Similarly, in the QoS aware technique, Cheetah uses the guaranteed PB value to determine the factor by which the request generation rate of a VD should be reduced.
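The PB shares quoted in this example are consistent with dividing the DA's full 100 MBPS in proportion to the guaranteed PB values. That reading is inferred from the numbers rather than stated explicitly, so the short Python check below should be taken as an illustration under that assumption; the variable and function names are hypothetical.

```python
from typing import Dict

da_capacity = 100                                   # PB capacity of the DA, MBPS
utilization_target = 70                             # desired utilization, percent
guaranteed = {"VD1": 40, "VD2": 20, "VD3": 10}      # guaranteed PB, MBPS
expected = {"VD1": 80, "VD2": 50, "VD3": 30}        # expected PB, MBPS


def proportional_shares(capacity: float,
                        weights: Dict[str, float]) -> Dict[str, float]:
    """Split capacity among VDs in proportion to their weights."""
    total = sum(weights.values())
    return {vd: capacity * w / total for vd, w in weights.items()}


# The offered load (160 MBPS) exceeds the 70 MBPS target, so the DA is overloaded.
desired_pb = da_capacity * utilization_target / 100
overloaded = sum(expected.values()) > desired_pb

# Assumption: shares are proportional to the guaranteed PB values.
shares = proportional_shares(da_capacity, guaranteed)
rounded = {vd: round(s) for vd, s in shares.items()}
# rounded is {"VD1": 57, "VD2": 29, "VD3": 14}
```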