4.1.2 Example of Collected Data
Figure 4.4 shows the percentage of resources requested by the jobs, grouped by submission queue. The four queues active on Eurora (debug, longpar, parallel and reservation) are displayed on the x-axis, and the five resources considered are on the y-axis (nodes in red, cores in green, GPUs in blue, MICs in yellow and memory in cyan). On the z-axis, each bar represents the percentage of a resource that can be attributed to the jobs belonging to a particular queue. Jobs in parallel clearly use the largest portion of resources in almost every case, with the exception of MICs, which are primarily used by jobs in longpar. The explanation is fairly simple: jobs in parallel and in longpar are the most computationally intensive and require more resources, whereas debug contains only light and relatively short-lived jobs, and reservation comprises only special jobs, far fewer than those in the other queues.
[Figure 4.4 axes: Queues (debug, longpar, par, reservation); Resources (Nodes, Cores, GPUs, MICs, Memory); Resource Share (0 to 100)]
Figure 4.4: Percentage of requested resources, grouped by queue
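The per-queue shares plotted in Figure 4.4 can be reproduced with a simple aggregation. The following is a minimal sketch, assuming each job is available as a record with a queue name and the amount of each requested resource; the field names (`queue`, `nodes`, `cores`, `gpus`, `mics`, `mem_gb`) and the sample values are hypothetical and are not taken from the Eurora database.

```python
from collections import defaultdict

# Hypothetical field names for the five resources considered in the text.
RESOURCES = ("nodes", "cores", "gpus", "mics", "mem_gb")


def resource_share_by_queue(jobs):
    """Return {queue: {resource: percentage of the total requested amount}}."""
    totals = defaultdict(float)                           # total per resource
    per_queue = defaultdict(lambda: defaultdict(float))   # per queue and resource
    for job in jobs:
        for res in RESOURCES:
            amount = job.get(res, 0)
            totals[res] += amount
            per_queue[job["queue"]][res] += amount
    return {
        queue: {
            # Guard against resources that nobody requested.
            res: 100.0 * amounts[res] / totals[res] if totals[res] else 0.0
            for res in RESOURCES
        }
        for queue, amounts in per_queue.items()
    }


# Toy records with made-up values, for illustration only.
sample_jobs = [
    {"queue": "debug", "nodes": 1, "cores": 4, "gpus": 0, "mics": 0, "mem_gb": 2},
    {"queue": "parallel", "nodes": 3, "cores": 12, "gpus": 2, "mics": 0, "mem_gb": 6},
    {"queue": "longpar", "nodes": 1, "cores": 4, "gpus": 0, "mics": 2, "mem_gb": 2},
]
shares = resource_share_by_queue(sample_jobs)
for queue, share in shares.items():
    print(queue, {res: round(pct, 1) for res, pct in share.items()})
```

By construction, the shares of each resource sum to 100 across the queues, which matches the reading of the z-axis given above.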
Figure 4.5a plots, for each job, the average core temperature against the CPU and GPU power: the y-axis reports the average core temperature in °C and the x-axis the average consumed power in Watt. The figure gives this information for both CPUs and GPUs, whose values are identified by circles and triangles respectively. The red circles are the jobs that ran on the 3.1GHz nodes, while the blue ones are the jobs on the 2.1GHz nodes; the green triangles represent jobs that required at least one GPU and therefore necessarily executed on a 3.1GHz node⁴. There is a clear linear relation between temperature and power for the CPUs, with the higher-frequency nodes almost always consuming more power and running hotter than their lower-frequency counterparts (except for a handful of very power-hungry jobs on 2.1GHz nodes, in the top right corner). The same holds for the GPUs, where the linear relation is less steep, suggesting a lower package thermal resistance.
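The linear relation between temperature and power can be quantified with an ordinary least-squares fit, whose slope (in °C per Watt) is the quantity read above as an indication of thermal resistance. The sketch below is only an illustration of that fit: the function name `fit_line` and the sample values are assumptions, not the procedure or the data used to produce Figure 4.5a.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up per-job averages, for illustration only.
cpu_power_w = [20.0, 40.0, 60.0]   # average consumed power (Watt)
cpu_temp_c = [40.0, 50.0, 60.0]    # average core temperature (°C)

slope, intercept = fit_line(cpu_power_w, cpu_temp_c)
print(f"slope = {slope:.2f} °C/W, intercept = {intercept:.1f} °C")
```

Fitting CPU and GPU points separately and comparing the two slopes would give a numerical counterpart to the "less steep" relation observed for the GPUs.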
Figure 4.5b portrays the relation between the consumed power and the number of nodes used by each job. The y-axis shows the power in Watt, computed as the integral of the CPU and GPU power consumption for each node contributing to a given job. The x-axis reports the number of nodes used by the job. Again, red circles mark jobs using 3.1GHz nodes (but with no GPU) and blue ones mark jobs on the 2.1GHz nodes; the red triangles stand for the jobs that also used a GPU. The majority of the jobs used only a few nodes (between 1 and 5), and for the same number of used nodes the jobs with the higher power consumption are those running on, in this order, 3.1GHz nodes with GPU, 3.1GHz nodes without GPU and 2.1GHz nodes. It must also be noted that there is a clear trend showing a reduction of the average node power as the number of nodes used rises. This can be explained by the increase in the communication-to-computation ratio in larger applications.
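The trend in the average node power can be checked by grouping jobs by the number of nodes they used. The following is a minimal sketch, assuming that the power recorded for a job is the sum over its nodes, so that dividing by the node count gives a per-node average; the field names `n_nodes` and `power_w` and the sample values are hypothetical.

```python
from collections import defaultdict


def mean_node_power_by_size(jobs):
    """Map node count -> mean per-node power (Watt) over jobs of that size."""
    buckets = defaultdict(list)
    for job in jobs:
        # Assumption: power_w is the total over all nodes used by the job.
        buckets[job["n_nodes"]].append(job["power_w"] / job["n_nodes"])
    return {n: sum(vals) / len(vals) for n, vals in sorted(buckets.items())}


# Made-up records, for illustration only.
sized_jobs = [
    {"n_nodes": 1, "power_w": 100.0},
    {"n_nodes": 1, "power_w": 120.0},
    {"n_nodes": 4, "power_w": 360.0},
]
node_power = mean_node_power_by_size(sized_jobs)
print(node_power)
```

A per-node average that decreases as the key grows would correspond to the reduction described above.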
(a) Temperature (°C) vs. Power (W) (b) Power vs. number of nodes
Figure 4.5: (a) Temperature and power for CPU and GPU; (b) Power consumed and number of nodes used by job.
(a) Power (W) vs. Load (b) Power (W) vs. IPS (GOPs)
Figure 4.6: (a) Power and load; (b) Power and Instructions Per Second.
Figure 4.6a plots the CPU power (Watt) on the y-axis and the average core load over the duration of the job on the x-axis; the load is computed as the mean of the loads of all the cores used by a job (usually more than one) and ranges between 0 (idle core) and 100 (core at maximum capacity). There is clearly a strong positive correlation between the core load and the power consumption, markedly more so for jobs executed on higher-frequency nodes. We kept the same color scheme as in the previous graphs (blue and red circles), but here we do not distinguish between jobs that use a GPU and those that do not.
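The load metric and its correlation with power can be sketched as follows. This is a minimal illustration, assuming each job comes with the list of average loads of its cores; the function names and the sample values are hypothetical, and the Pearson coefficient is used here only as one common way to measure a positive correlation, not necessarily the measure behind Figure 4.6a.

```python
from math import sqrt


def job_load(core_loads):
    """Mean load (0 = idle, 100 = maximum capacity) over the cores of a job."""
    if not core_loads:
        raise ValueError("a job must use at least one core")
    return sum(core_loads) / len(core_loads)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


# Made-up per-core loads and CPU power per job, for illustration only.
per_job_core_loads = [[10, 30], [50, 50], [90, 100, 80]]
cpu_power_per_job = [30.0, 45.0, 65.0]

loads = [job_load(cores) for cores in per_job_core_loads]
r = pearson(loads, cpu_power_per_job)
print(loads, round(r, 3))
```

Computing the coefficient separately for the 2.1GHz and the 3.1GHz jobs would make the difference between the two node types explicit.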
Figure 4.6b shows the relation between the power consumed (Watt) by a job and its mean Instructions Per Second (IPS). The y-axis shows the CPU power and the x-axis the IPS measure; we again used the red and blue circles (with no distinction for GPU usage). In this case the plot is quite scattered, but we can nevertheless observe the relationship between these measures, with the power increasing as the IPS increases.
The proposed database can also be used to understand the relation between a user and the properties of their submitted jobs. This aspect can be of crucial significance for system owners and users, as it allows different treatments and accounting mechanisms to be devised for different types of users. In Figure 4.7 we plotted the power consumed by a job (in Watt) on the y-axis and the duration of the job on the x-axis, and we assigned a different color to each user, i.e. all the blue circles are jobs submitted by the same user. To keep the plot readable, we reported only five users, chosen randomly. Jobs with higher energy consumption fall in the top right corner of the plot. From this figure we can recognize that some users (such as the one identified by light-red circles) submit jobs without any discernible pattern, at least for the metrics considered in this plot, whereas others (like the one identified by yellow circles) always submitted quite similar jobs. This information can be used by user-aware dispatching to achieve machine power balancing or to avoid abrupt power changes in the delivery network.
Figure 4.7: Power and duration of a job, grouped by user
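One way to turn the visual distinction between users with similar jobs and users without a discernible pattern into a number is the coefficient of variation (standard deviation divided by mean) of power and duration per user. This is a sketch under stated assumptions, not a method described in the text: the field names `user`, `power_w` and `duration_s` and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (0.0 if mean is 0)."""
    avg = mean(values)
    return pstdev(values) / avg if avg else 0.0


def user_variability(jobs):
    """Return {user: {"power_cv": ..., "duration_cv": ...}}."""
    by_user = defaultdict(list)
    for job in jobs:
        by_user[job["user"]].append(job)
    return {
        user: {
            "power_cv": coefficient_of_variation([j["power_w"] for j in user_jobs]),
            "duration_cv": coefficient_of_variation([j["duration_s"] for j in user_jobs]),
        }
        for user, user_jobs in by_user.items()
    }


# Made-up records, for illustration only.
user_jobs_sample = [
    {"user": "u1", "power_w": 50.0, "duration_s": 100.0},
    {"user": "u1", "power_w": 50.0, "duration_s": 100.0},
    {"user": "u1", "power_w": 50.0, "duration_s": 100.0},
    {"user": "u2", "power_w": 10.0, "duration_s": 60.0},
    {"user": "u2", "power_w": 90.0, "duration_s": 3600.0},
]
variability = user_variability(user_jobs_sample)
print(variability)
```

In this toy data, a value near zero (user `u1`) corresponds to a user who always submits similar jobs, while a large value (user `u2`) corresponds to a user whose jobs are spread across the plot.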