
5.4 Case Study: Integration with Cooling System

5.4.2 Free-cooling Modeling

We used the model described in Section 5.4.1 to evaluate the impact of the environmental conditions, the workload and the chiller free-cooling capability on the cooling power consumption of the Eurora supercomputer, whose thermal parameters and constraints are reported in Table 5.1. A fairly exhaustive set of operating conditions has been tested: the thermal power Pth ranged from the idle condition Pth = 5.5 kW to the HPC maximum power Qmax ≈ 40 kW, and several ambient temperatures (Tamb = {0, 5, 10, 15, 20, 25, 30, 35, 40} °C) were taken into account, covering the seasonal average temperature variations. In addition, the cooling management strategy actually implemented in Eurora has been assumed: the liquid flow rate, the valve position and the chiller operation are regulated to keep the inlet water temperature lower than 25 °C (but higher than 18 °C to avoid condensation) and to bound the temperature gradient, Tout − Tin < 5 °C.
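As a rough plausibility check of the gradient bound, the flow rate needed to keep Tout − Tin at 5 °C can be estimated from the water properties in Table 5.1. The sketch below assumes a simple steady-state balance Pth = ρ · q · cv · (Tout − Tin); this balance and the function name are assumptions made for illustration and are not necessarily the formulation used in Section 5.4.1.

```python
# Parameters taken from Table 5.1 (Eurora cooling system).
RHO = 1000.0   # water density [kg/m^3]
CV = 4186.0    # water specific heat [J/(kg K)]
Q_MAX = 12.0   # maximum flow rate [m^3/h]
DT_MAX = 5.0   # bound on Tout - Tin [K]


def required_flow_m3h(p_th_watt, delta_t=DT_MAX):
    """Flow rate [m^3/h] removing p_th_watt with a water temperature rise of delta_t.

    Assumption (not from the source): steady-state balance
    P_th = rho * q * c_v * (T_out - T_in).
    """
    q_m3s = p_th_watt / (RHO * CV * delta_t)
    return q_m3s * 3600.0


# Idle and maximum thermal power considered in the study.
for p_kw in (5.5, 40.0):
    q = required_flow_m3h(p_kw * 1e3)
    print(f"P_th = {p_kw:4.1f} kW -> q >= {q:.2f} m^3/h (limit {Q_MAX} m^3/h)")
```

Under this assumption the required flow rate is about 0.95 m3/h at idle and about 6.88 m3/h at 40 kW, both below the 12 m3/h limit.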

Figures 5.19a and 5.19b show the resulting cooling power consumption and the corresponding PUE, respectively. The plots clearly show the effect of

8 Defined as PUE = (Pcool + Pth)/Pth.

Parameter   Value
RHE1        0.7 [mK/W]
RHE2        0.6 [mK/W]
CHE1        225 [kJ/K]
CHE2        450 [kJ/K]
CHPC        150 [kJ/K]
TinMIN      18 [°C]
TinMAX      25 [°C]
ToutMIN     18 [°C]
THPCmax     85 [°C]
qmax        12 [m3/h]
cv          4186 [J/(kg·K)]
ρ           1000 [kg/m3]

Table 5.1: Eurora Cooling System Parameters

both the environmental conditions and the workload (Fig. 5.19c); the kinks in the plots for Tamb between 0 and 20 °C mark the point where free cooling is no longer feasible and the chiller thermodynamic cycle is activated; before that point only the pump power is accounted for. At Tamb = 0 °C the system is able to operate in free-cooling over the whole Pth range, while for Tamb > 25 °C free-cooling is unfeasible due to the constraint Tin < 25 °C (TinMAX), so the PUE increases almost linearly as the workload rises. We can further notice that the PUE also increases with the workload for ambient temperatures that allow free-cooling, which reflects the higher pump power consumption caused by the increased flow rate. This is a system-specific feature stemming from the sizing of the Eurora cooling system and the corresponding control strategy.

This information can be used as input for the power-capped dispatcher discussed in the previous sections. The online scheduler could therefore use a varying power budget, based on the current environmental conditions and workload requests, with the goal of optimizing the power consumption of the whole supercomputer, cooling infrastructure included. In other words, given an “efficiency threshold”, namely the desired PUE level not to be exceeded, the corresponding maximum admissible value of Pth can be determined. Thus, the HPC computational power can be scheduled/shaped so that the system operates within these limits as much as possible, subject to the other performance constraints. Fig. 5.19c shows the maximum thermal power (directly related to the power budget) which can be removed by the Eurora cooling mechanism for different required PUE upper bounds PUElim and environmental conditions.
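The threshold lookup described above can be sketched as follows. This is a minimal illustration, assuming that the PUE only accounts for the cooling overhead, i.e. (Pcool + Pth)/Pth, and that the cooling model has been sampled on a grid of Pth values for one ambient temperature. The function names and the sample values are hypothetical placeholders, not Eurora data.

```python
def pue(p_cool, p_th):
    """PUE restricted to the cooling overhead: (P_cool + P_th) / P_th."""
    return (p_cool + p_th) / p_th


def max_admissible_pth(samples, pue_lim):
    """Largest sampled P_th whose PUE does not exceed pue_lim.

    samples: iterable of (p_th, p_cool) pairs in kW for one ambient temperature.
    Returns None when no sampled point meets the limit.
    """
    feasible = [p_th for p_th, p_cool in samples if pue(p_cool, p_th) <= pue_lim]
    return max(feasible) if feasible else None


# Hypothetical placeholder samples (NOT measured or simulated Eurora data):
# (P_th, P_cool) in kW.
samples = [(10.0, 0.4), (20.0, 1.0), (30.0, 2.4), (40.0, 4.4)]
print(max_admissible_pth(samples, pue_lim=1.075))  # 20.0 for these placeholders
```

Taking the largest feasible sample is a simplification; a finer Pth grid or an interpolation between samples would give a tighter budget.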

In order to obtain better resolution, the model data have been interpolated over the 2014 annual temperatures in Zola Predosa, Bologna, the closest weather station to Cineca, where Eurora is hosted. As expected, the maximum power budget decreases as the ambient temperature rises (due to the chiller COP) and increases with the PUE limit. However, corners can be noted again. Besides the obvious transition from full power (40 kW) to a linear decrease that keeps the desired PUE, other jumps occur, mostly in the range 20-25 °C. Again, free-cooling is responsible for this behaviour. For lower

has to be electrically operated to meet thermal constraints, increasing the cooling consumption and requiring power shaping if a constant PUE has to be guaranteed.

(a) Power consumption

(b) PUE

(c) Maximum power budget

Figure 5.19: Overall cooling system power consumption (a) and corresponding PUE (b) for different Tamb and HPC workloads producing thermal power Pth. Maximum power budget as a function of Tamb and PUE upper bounds (c).

5.4.3 Experimental Results

In this section we explore the integration of the power-capped job dispatcher with the cooling system model presented in Sections 5.4.1 and 5.4.2. In particular, we conducted experiments using the hybrid approach (see Section 5.2.3). We will see that applying power capping, according to the ambient temperature, to a supercomputer featuring free-cooling can lead to substantial energy savings. We will also notice that power capping has the side effect of increasing the idle energy cost relative to active power. Idle energy (or idle power) is the amount of energy (power) consumed by computing resources when no application is using them; unfortunately, idle power consumption in modern computing units is still greater than zero. Thus, future green supercomputers need integrated power capping and power management mechanisms that switch off unused resources when they are not required.

Figure 5.20: Idle Power and Active Power Ratio VS Power Budget (%)

We evaluate the performance obtained by the proposed dispatcher under varying power cap levels. As we have seen in previous sections (Sec. 5.2.4 and Sec. 5.3.3), decreasing the power budget degrades the overall performance in terms of average queue times. This happens because tighter power constraints allow fewer tasks to be executed concurrently; some jobs must therefore be postponed and are forced to wait longer.

We have also tested how the introduction of a power cap influences the power efficiency of the supercomputer. We used the same experimental setup described in Sec. 5.2.4. Results are shown in Figure 5.20. The x-axis reports the imposed power budget, expressed as a percentage of the maximal value. A power budget of 100% corresponds to the most relaxed power constraint, while decreasing the budget (here down to 20% of the maximal value) makes the power capping tighter. The maximal power budget was computed as the sum of the maximum power consumptions of all machine components (Thermal Design Power, TDP). The y-axis reports the ratio between the idle and active power consumed by the machine when executing all the scheduled jobs. The figure shows that if we reduce the power budget too much (down to 50% of the maximal value), the power spent by idle components of the machine becomes a very relevant part of the total; if we reduce the power budget to 20%, the idle power amounts to 70% of the total power, i.e. a very inefficient power consumption. This is because with smaller power budgets fewer jobs can execute in parallel, so more system components are left unused; since the unused components still consume some energy, the overall idle energy increases.

PUE     Budget [kW]   Idle/active   Effective PUE + idle   WQT loss
1.1     39.14         2.00%         1.100                  0.0%
1.075   33.29         2.04%         1.075                  0.7%
1.05    29.39         2.62%         1.057                  8.6%

Table 5.2: Impact of the proposed ambient-temperature-aware power budgeting on a one-year supercomputer center usage scenario.

These results are mainly due to the fact that in our experiments we did not consider the option of switching off unused nodes. This is an interesting result, as it clearly shows that power capping solutions need to be integrated with idle-resource shutdown and power management schemes in order to deliver the expected power saving. Otherwise, power capping risks increasing the idle power percentage, since it works by reducing the number of resources active at the same time. Constraint programming can also be used to account for the set-up and shut-down times of idle resources; nevertheless, that problem was not part of the research discussed in this work.
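The mechanism can be illustrated with a toy model in which every node draws a fixed idle power when unused and a fixed active power when running a job. The node count and power values below are hypothetical and chosen only to make the arithmetic easy to follow; they are not the parameters of the experiments, and the metric shown is the idle share of total power rather than the exact ratio plotted in Figure 5.20.

```python
def idle_share(n_nodes, n_active, p_idle, p_active, switch_off_unused=False):
    """Fraction of the total power drawn by nodes that run no job.

    With switch_off_unused=True, unused nodes are assumed to draw no power.
    """
    n_unused = n_nodes - n_active
    idle_power = 0.0 if switch_off_unused else n_unused * p_idle
    total = n_active * p_active + idle_power
    return idle_power / total if total > 0 else 0.0


# Hypothetical machine: 64 nodes, 100 W idle and 300 W active per node.
N_NODES, P_IDLE, P_ACTIVE = 64, 100.0, 300.0

# A tighter power cap lets fewer nodes run jobs at the same time.
for n_active in (64, 32, 16):
    always_on = idle_share(N_NODES, n_active, P_IDLE, P_ACTIVE)
    with_shutdown = idle_share(N_NODES, n_active, P_IDLE, P_ACTIVE, True)
    print(f"{n_active:2d} active nodes: idle share {always_on:.2f}, "
          f"with shutdown {with_shutdown:.2f}")
```

In this toy setting the idle share grows from 0 to 0.25 and 0.5 as the number of active nodes is halved twice, and stays at 0 when unused nodes are switched off.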

Finally, we want to give an insight into a practical use of the methodology proposed in this section, i.e. the integration between a power-capped job dispatcher and the cooling system model. We collected the average hourly ambient temperature for the entire year 2014 from the ARPA weather station of Zola Predosa⁹, close to the Cineca supercomputing center. We then imposed three PUE targets (1.1, 1.075, 1.05) and used our cooling model to compute, for each hourly ambient temperature during the year, the maximum power budget which ensures the target PUE. From the hourly power budget we computed the hourly Quality-of-Service loss, measured as normalized average queue time (WQTloss), and the idle over active power percentage. We finally combined this value with the target PUE to compute an effective PUE that embeds the energy efficiency loss due to the increased idle power percentage. Table 5.2 shows the average results for the 2014 hourly ambient temperatures.
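The hourly budgeting step can be sketched as a table lookup with linear interpolation over the ambient temperature grid of Section 5.4.2, followed by an average over the temperature trace. Linear interpolation, the function names, the budget values and the four-sample trace below are all assumptions or placeholders used for illustration; they are not the model output or the 2014 weather data.

```python
from bisect import bisect_right


def interp(x, xs, ys):
    """Piecewise-linear interpolation, clamped to the end values outside the grid."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def average_budget(hourly_temps, t_grid, budget_grid):
    """Mean of the per-hour maximum power budgets for one PUE target."""
    budgets = [interp(t, t_grid, budget_grid) for t in hourly_temps]
    return sum(budgets) / len(budgets)


# Ambient temperature grid used for the model [degC].
T_GRID = [0, 5, 10, 15, 20, 25, 30, 35, 40]
# Hypothetical budgets [kW] for one PUE target (placeholders, not model output).
BUDGET = [40, 40, 40, 40, 36, 30, 26, 22, 18]
# Hypothetical short trace standing in for the hourly temperatures of one year.
temps = [2.0, 12.5, 22.5, 27.5]

print(average_budget(temps, T_GRID, BUDGET))  # 35.25 for these placeholders
```

The same loop could also collect the per-hour queue time loss and idle percentage once the dispatcher results for each budget level are available.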

From the table we can see that imposing a PUE of 1.1 does not require significant power budgeting, as the ambient temperatures sustain the 40 kW budget for most of the time: the average power budget imposed by our approach is about 39 kW. If we instead impose a PUE of 1.075, the average power budget over the year decreases to about 33 kW, which leads to a weighted queue time loss (i.e. user QoS loss) of only 0.7% w.r.t. the 1.1 PUE case; at the same time, the idle power does not increase significantly, leading to an effective PUE equal to the target one. Finally, if we impose a constant PUE of 1.05 for the entire year, our approach leads to an average power budget of about 29 kW, which in turn produces a weighted queue time loss of almost 9% and an effective PUE of 1.057. The actual PUE becomes slightly higher than the target one due to the increase of