
Computer software is composed of one or more processes and is divided into application software and system software. System software manages and integrates the computer's capabilities through the OS kernel and device drivers, whereas application software interacts with the end-user to perform specific tasks. Application software cannot run by itself; it depends on system software to execute. Some system software changes the behavior of the environment, for example the processor frequency, which impacts the power consumption of all applications.

Process-level power estimators allow monitoring the power consumption of the operating system and of stand-alone or distributed applications. Since computer software is composed of one or more processes, the run-time performance of any software can be measured by summing over its processes. In a Linux production environment, OS processes are usually owned by the root user, which makes them easy to identify and filter. A similar approach can be used for multi-process applications by selecting the processes that belong to the same tree (arborescence) rooted at the application's main process. This process tree can be fetched, for instance, with the pstree command. Distributed applications, in turn, are managed by a job scheduler from which the worker nodes and PIDs used by an application can be retrieved. Based on these nodes and PIDs, the total power consumption of a distributed application can be measured by summing the power of the processes involved. Process-level power models can also be used to estimate the power consumption of KVM-based virtual machines, which are used by several managers such as OpenNebula [156] and OpenStack [157].
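The aggregation over a process tree described above can be sketched as follows. This is only a minimal illustration, assuming that per-process power estimates are already available: the `parent_of` and `power_w` dictionaries, the function names and all numbers are hypothetical and do not come from the measurements in this work.

```python
from collections import defaultdict


def descendants(root_pid, parent_of):
    """Return root_pid plus every PID below it in the process tree."""
    children = defaultdict(list)
    for pid, ppid in parent_of.items():
        children[ppid].append(pid)
    tree, stack = [], [root_pid]
    while stack:
        pid = stack.pop()
        tree.append(pid)
        stack.extend(children[pid])
    return tree


def application_power(root_pid, parent_of, power_w):
    """Sum the estimated power (W) of all processes in the application's tree."""
    return sum(power_w.get(pid, 0.0) for pid in descendants(root_pid, parent_of))


# Hypothetical snapshot: PID -> parent PID, and PID -> estimated power in watts.
parent_of = {100: 1, 101: 100, 102: 100, 103: 101, 200: 1}
power_w = {100: 1.5, 101: 2.0, 102: 0.5, 103: 1.0, 200: 4.0}

app_power = application_power(100, parent_of, power_w)
print(app_power)
```

In practice the parent map would be built from the operating system's process table (the same information that pstree displays), and for a distributed application the sum would run over the trees reported on each worker node.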

5.4.1 Requirements and limitations

Since the power of each process cannot be directly measured, finer-grained models need to be created from system-level measurements. Therefore, power decoupling assumptions need to be made. In this work we propose decoupling the machine power ($P_{mach}$) into static ($P_{st}$), shared ($P_{shr}$) and process-exclusive power ($P^{pid}_{exc}$), as follows:

$$P_{mach} = P_{st} + P_{shr} + \sum_{pid \in PIDs} P^{pid}_{exc}. \tag{5.15}$$

The static power is the minimal power used by the hardware when it is idle. Since it can be measured as the average idle power, it is simple to decouple. Shared power is the power dissipated by hardware resources shared between applications, for instance to wake up a device that uses energy-saving techniques. The exclusive power of a process is the power dissipated due to the resource utilization of that process.

The static and shared powers can be included into $P_{pid}$ in several ways, such as by resource usage (Equation 5.16) or by equal shares (Equation 5.17), as follows:

$$P_{pid} = u^{pid}_{resources}\,(P_{static} + P_{shared}) + P^{pid}_{specific}, \tag{5.16}$$

$$P_{pid} = \frac{1}{|PIDs|}\,(P_{static} + P_{shared}) + P^{pid}_{specific}, \tag{5.17}$$

where $u^{pid}_{resources}$ is the percentage usage of a given resource by a process ($pid$), and $P_{static}$, $P_{shared}$ and $P^{pid}_{specific}$ correspond to the static, shared and process-exclusive powers of Equation 5.15. There is no right or wrong approach here, since both guarantee the condition for a proper validation, i.e. that the shares of all processes sum to 1. However, the two equations will provide different power estimates for the same process.
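The difference between Equations 5.16 and 5.17 can be illustrated with a small numerical sketch. The function names and all values below are hypothetical; the only assumptions are that the usage shares sum to 1 and that the process-specific powers are already known.

```python
def attribute_by_usage(p_static, p_shared, usage, p_specific):
    """Equation 5.16: split static + shared power by each process's usage share."""
    return {pid: usage[pid] * (p_static + p_shared) + p_specific[pid]
            for pid in usage}


def attribute_equally(p_static, p_shared, p_specific):
    """Equation 5.17: split static + shared power evenly among all processes."""
    n = len(p_specific)
    return {pid: (p_static + p_shared) / n + p_specific[pid]
            for pid in p_specific}


# Hypothetical values in watts; usage shares sum to 1.
p_static, p_shared = 30.0, 6.0
usage = {1: 0.5, 2: 0.25, 3: 0.25}
p_specific = {1: 8.0, 2: 2.0, 3: 4.0}

by_usage = attribute_by_usage(p_static, p_shared, usage, p_specific)
equal = attribute_equally(p_static, p_shared, p_specific)
p_machine = p_static + p_shared + sum(p_specific.values())

print(by_usage, sum(by_usage.values()))
print(equal, sum(equal.values()))
print(p_machine)
```

Both attributions add up to the same machine power, which is the validation condition, while the estimate for an individual process differs between the two.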

The use of machine learning techniques requires target values to learn from. Since only the machine power is available, the performance indicators used as input candidates are collected at system level as well, i.e. as the sum over all processes. This learning constraint imposes the use of a distributive function as estimator, i.e. a process-level estimation function $\hat{p} = f(x)$, where $x$ is a vector of process-level variables, such that:

$$f\Big(\sum_{pid} x_{pid}\Big) = \sum_{pid} f(x_{pid}). \tag{5.18}$$

Sigmoid functions, used as activation functions in the proposed system-level modeling methodology, do not have this property. In fact, only linear functions are distributive. Thus, linear regression and ANNs with linear activation functions are distributive. An ANN with linear activation functions, however, reduces to a linear function, since a linear combination of linear functions is itself linear. Hence, there is no need to use ANNs at process level, since they would perform similarly to or worse than a linear regression. For this reason, the remainder of this report considers that an ANN always has a sigmoid activation function.
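The distributive property of Equation 5.18 can be checked numerically. The sketch below compares a linear estimator with a single sigmoid unit on made-up process-level inputs; the weights, the input vectors and the function names are hypothetical, and the constant term is left out on the assumption that it is accounted for as system-wide power.

```python
import math


def linear(x, w=(0.8, 0.3)):
    """Linear estimator without a constant term."""
    return sum(wi * xi for wi, xi in zip(w, x))


def sigmoid_unit(x, w=(0.8, 0.3)):
    """Single neuron with a sigmoid activation over the same weights."""
    return 1.0 / (1.0 + math.exp(-linear(x, w)))


def vector_sum(vectors):
    """Element-wise sum, i.e. the system-level input built from all processes."""
    return [sum(column) for column in zip(*vectors)]


# Hypothetical process-level variables for three processes.
per_process = [[0.2, 1.0], [0.5, 0.0], [0.1, 2.0]]
system = vector_sum(per_process)

# Gap between f(sum of inputs) and sum of f(inputs) for each estimator.
linear_gap = abs(linear(system) - sum(linear(x) for x in per_process))
sigmoid_gap = abs(sigmoid_unit(system) - sum(sigmoid_unit(x) for x in per_process))

print(linear_gap)   # zero up to floating-point rounding
print(sigmoid_gap)  # clearly non-zero
```

For the linear estimator the system-level estimate equals the sum of the process-level estimates, so a model learned on system-level data can be applied per process; for the sigmoid unit the two quantities differ.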

5.4.2 Process-level models

Due to the constraints presented earlier, ANNs with sigmoid activation functions cannot be ported to the process level. Thus, in this section we compare the capleak and the learned linear regression models, which can easily decouple the system-wide from the process-level power. For the linear regression, this is done by selecting the system-level sensors plus the constant for the system-wide estimation, and the process-level sensors for the process-level estimation. The capleak model decouples the power by defining the static power as the constant plus the estimated power at room temperature, the system power as the temperature variation, and the process power as the processor load times its frequency, as follows:

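$$P^{capleak}_{static} = w_0 + w_2 \sum_{c \in C} t^{\circ C}_0, \tag{5.19}$$

$$P^{capleak}_{system} = w_2 \sum_{c \in C} \left(t^{\circ C}_c - t^{\circ C}_0\right), \tag{5.20}$$

$$P^{capleak}_{process} = w_1\, u^{process}_p \sum_{c \in C} f_c, \tag{5.21}$$

where $t^{\circ C}$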


5.4.3 Evaluation methodology

First, the models are evaluated by checking whether the sum of the power consumption of all processes is close to the system's power consumption. Then, we analyze how different learning cases impact the models. Finally, a mixed workload is evaluated to check whether the models still fit such workloads and to illustrate a use case.

5.5 Summary

Computer hardware profiles allow the identification of the most power-hungry devices. In our infrastructure, it was shown that the power consumption of Atom modules varies little compared to that of i7 modules. Atom modules have most of their power dissipated by the base board, while in i7 modules the processor is the device that consumes the most. Based on a complete analysis of the power dissipated by each device, a generic workload was proposed. A generic workload capable of reproducing the hardware utilization of any real workload is of great importance for machine learning. The evaluation of this workload showed that it does not cover all possible usages, even though it is the most generic one when compared to any real workload. This led us to propose three learning data-sets to evaluate the creation of power models. Another important aspect shown in this chapter is the set of constraints for process-level power modeling. We proved that process-level power estimations can only be validated for distributive functions. Since neural networks with sigmoid activation functions do not have this property, they cannot be used at this granularity. Thus, either predefined models or purely linear estimators are needed.

6 Evaluation of Power Models

“It doesn’t matter how beautiful your theory is, it doesn’t matter how smart you are. If it doesn’t agree with experiment, it’s wrong.”

— Richard P. Feynman

The evaluation of new models plays a crucial role in determining their accuracy and usability. Some authors validate their estimator on the learning data-set, i.e. they use the same data to learn and validate the model; others set aside a subset of the learning data to test it. However, these approaches do not evaluate the model's ability to generalize to new workloads. This chapter evaluates several power models based not only on the learning data-set, but also on completely new workloads and operating system setups. In addition to the generalization capacity of the models, we also evaluate their applicability in real-world use cases.

This chapter presents experimental results for the methodology proposed in Chapter 5. Machine learning models for system- and application-level power estimation are evaluated as follows. First, Section 6.1 analyzes the impact of data pre-processing on the final models' accuracy. Then, Section 6.2 provides an in-depth analysis of system-level power modeling. This analysis includes a comparison between calibrated predefined models and models learned from data without a priori architecture-dependent knowledge. In addition, a distributed system's use case is investigated. Finally, Section 6.3 evaluates process-level models and presents a concurrent workload execution use case.

6.1 Data pre-processing

The impact of data pre-processing on the accuracy of system-level models is evaluated by training an ANN for some combinations of the pre-processing methods described in Section 5.3.1. The accuracy of each method is measured with the MAE metric on both the learning and the validation data. The results for each pre-processing case use the same data, acquired during several executions of the generic workload. Figure 6.1 shows the results of this experiment, where the bars represent the average of the error metric and the whiskers represent its standard deviation for the generic workload. For the learning phase a single run is considered, so no standard deviation is shown. For the validation, four data-sets from four different executions were evaluated. The number of validation executions was kept small due to the long execution time of the generic workload, which takes more than one hour to run.
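The way the bars and whiskers are obtained can be sketched as follows. The power traces are made up for illustration, the function name `mae` is hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def mae(measured, estimated):
    """Mean absolute error (W) between measured and estimated power."""
    return mean(abs(m - e) for m, e in zip(measured, estimated))


# Hypothetical traces in watts: (measured, estimated) per run.
learning_run = ([40.0, 42.0, 45.0], [40.5, 41.5, 45.5])
validation_runs = [
    ([40.0, 42.0], [41.0, 43.0]),
    ([40.0, 42.0], [42.0, 44.0]),
    ([40.0, 42.0], [41.0, 44.0]),
    ([40.0, 42.0], [41.5, 43.5]),
]

learning_mae = mae(*learning_run)             # single run, so no whisker
validation_maes = [mae(m, e) for m, e in validation_runs]
validation_bar = mean(validation_maes)        # bar height
validation_whisker = stdev(validation_maes)   # whisker (sample std, assumed)

print(learning_mae, validation_bar, validation_whisker)
```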

In Figure 6.1, the raw bars represent the ANN's performance when using the acquired data without any pre-processing, while the psu_model, timesync, steady and unique bars each correspond to a single method applied to the raw data. These five groups of bars show that any pre-processing method alone is better than using the raw data for learning. Although the unique method has the worst validation error, with a high variance, it remains in the same error range as the raw data. The best pre-processing techniques, when applied independently, are timesync and steady, which present similar performance for both the learning and validation data-sets. However, due to the non-linearity that the psu_model adds to the data (see Section 4.3.1), we decided to keep it for further evaluations, coupling it with the other methods.

[Figure 6.1: Impact of each pre-processing method on the final model for the training and validation runs. Vertical axis: MAE (W), from 0.0 to 3.0; Learning and Validation bars for the raw, psu_model, timesync, steady, unique, pt, ps, pu, pts, ptu and ptus cases.]

The proposed pre-processing methods were then coupled as follows. The bars named pt, ps and pu combine the psu_model with the timesync, steady and unique methods, respectively. The results of these combinations show that they decrease the validation error of the timesync and unique methods, while keeping the error of the steady method the same. This means that PSU modeling either enhances or does not influence the results of data pre-processing. To continue the experiment, we kept the method that enhanced the accuracy the most, i.e. timesync. Then, the psu_model and timesync were combined with the steady and unique methods (pts and ptu bars), and the validation error slightly increased, suggesting that these are worse combinations. Finally, all methods were combined (ptus bars). The combination of all methods presents a validation accuracy similar to that of the pt combination, given that their validation variances overlap, while the learning error of ptus slightly decreases.

The comparison between the ptus combination and the raw data shows that the former enhances the accuracy of both learning and validation results. The learning error decreases from 0.90 to 0.39, i.e. an improvement of 55%, while the mean validation error goes from 1.97 to 1.27, i.e. a gain of 35%. These results are notable considering that the only change is the data pre-processing. The remainder of the experiments conducted in this work use the ptus combination to pre-process the data, since it covers all the identified issues and provides a substantial improvement in model accuracy.
