Outline of the Thesis - Modeling the power consumption of computing systems and applications th

This work contains a collection of techniques to assess the modeling of the power consumption of servers. The goal of this thesis is to propose a methodology to create system- and application-level power estimators. To fulfill this objective, an in-depth analysis of the architecture was conducted and new algorithms were developed. They constitute a substantial part of the work and are described in detail in the sections to follow. The remainder of this document is structured as follows:

Chapter 2: Background. This chapter explains some background concepts on performance measurement of computing systems and on machine learning. The concepts provided in this chapter are mostly focused on the direction of this thesis, i.e. the machine learning techniques discussed here are mainly applied to function approximation (regression) problems.

Chapter 3: State of the Art. This chapter reviews existing methodologies, models and tools to measure and estimate the power consumption of computing systems. Power measurement and modeling techniques are described, and in-depth comparisons of the models are made in order to contextualize this thesis.

Chapter 4: Power and Performance Measuring. The quality of performance and power measurements relies not only on the accuracy of the available physical infrastructure but also on how the measurements are conducted. This chapter describes the hardware infrastructure and measurement methodology used to collect experimental data.

Chapter 5: Power Modeling. This chapter describes the methodology used to model the power consumption. First, a study of the energy consumption of each device is done in order to propose a generic workload. Furthermore, the machine learning techniques used to approximate the power function are detailed, as well as the variable reduction techniques.

Chapter 6: Evaluation of Power Models. This chapter presents the results of the power modeling methodology proposed in Chapter 5. The validation of the methodology is done for system- and application-level power estimations. It also compares the proposed estimator with other models and evaluates some use cases.

Chapter 7: Summary and Outlook. This chapter presents the main conclusions of the thesis and proposes open issues for continuing the research on energy-efficient devices. It also includes the contributions of the research and a list of publications produced during the course of the thesis.

Appendix A: Summary and Outlook. This appendix is an extended abstract of the manuscript and is a requirement of the Graduate School for the doctoral degree from the University of Toulouse. Each section of this appendix summarizes one of the chapters of the manuscript.

2 | Background

“The noblest pleasure is the joy of understanding.” — Leonardo da Vinci

The use of machine learning techniques in the field of energy efficiency has been increasing over the last years. This chapter explains some background concepts on performance measurement of computing systems and on machine learning. The concepts provided in this chapter are mostly focused on the direction of this thesis. First, techniques to collect performance indicators, which provide information regarding device usage, are explained. Then, machine learning techniques are described, focusing on function approximation (regression) problems, which use the performance indicators as input variables.

2.1 Performance indicators

A performance indicator is a measurement of the activity of a system. Key performance indicators (KPIs) may differ according to business drivers and aims. In computer science, KPIs are often used for comparing similar hardware and for performance analysis, especially in the HPC field. Performance indicators can be categorized as instrumentation-based or event-based. This section describes the most widely used techniques to collect KPI measurements.

Source Instrumentation

Source code instrumentation consists of adding specific code to the source files under analysis in order to provide detailed information during execution. After compilation, the execution of the program produces a data dump used for run-time analysis or component testing. Usually, the execution times of code segments are evaluated during application development and testing in order to identify bottlenecks. Source instrumentation can be done either manually by the programmer or automatically at compilation time by a profiler tool such as Gprof [61]. The main disadvantage of this technique is its dependence on the application's source code, which may not be available.
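As an illustration of manual source instrumentation, the following minimal sketch times labelled code segments and accumulates the results so that the slowest segment can be identified. It is not taken from the thesis: the names `instrument`, `workload` and `report` are hypothetical, and the toy workload only stands in for a real application.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock time and call count per labelled segment
# (hypothetical in-memory store standing in for a profiler's data dump).
segment_times = defaultdict(float)
segment_calls = defaultdict(int)


@contextmanager
def instrument(label):
    """Time the enclosed code segment and record it under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        segment_times[label] += time.perf_counter() - start
        segment_calls[label] += 1


def workload(n):
    """Toy application with two manually instrumented segments."""
    with instrument("build"):
        data = [i * i for i in range(n)]
    with instrument("reduce"):
        total = sum(data)
    return total


def report():
    """Return (label, seconds) pairs sorted by decreasing accumulated time."""
    return sorted(segment_times.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `workload(1_000_000)` followed by `report()` lists the segments from most to least expensive. The instrumentation lines live in the source itself, which is exactly the dependence on source code mentioned above.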

Binary Instrumentation

Binary instrumentation allows analysis functions to be inserted into an application's binary code, providing instrumentation of applications whose source code is not available [62]. However, binary instrumentation suffers from limitations similar to those of source code instrumentation, the primary concern being the run-time overhead. The inclusion of the analysis functions requires a JIT (Just in Time) recompilation of the executable, significantly increasing the run-time overhead. For instance, the Pin tool [63] has a persistent overhead due to recompilation, which has been measured to be around 30% before the execution of any analysis routines.

Operating System Events

The operating system keeps information on the utilization of several devices in order to execute resource allocation policies. Inside the kernel, several indicators to monitor processor, network, disk and memory are available, such as the cores' load and frequency, the memory's resident size, the network throughput, and disk reads and writes. Some of these KPIs can be measured on Linux at process level from the /proc/[pid]/ file system; others can only be fetched system-wide. OS information is used by monitoring tools such as Linux/MacOS's top, Linux's gnome-system-monitor, Windows's task manager and MacOS's activity monitor to provide per-process and system-wide performance measurements.
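The per-process indicators exposed under /proc/[pid]/ can be read as plain text. The sketch below, which assumes a Linux system, parses the `status` file of a process and extracts its resident set size from the `VmRSS` field. The helper names are hypothetical, and the reader function returns `None` on systems without a /proc file system.

```python
import os


def parse_status(text):
    """Parse the 'Key:<tab>value' lines of a /proc/[pid]/status file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def resident_kib(fields):
    """Return the resident set size in kB, or None if VmRSS is absent
    (kernel threads, for instance, do not report it)."""
    value = fields.get("VmRSS")
    if value is None:
        return None
    return int(value.split()[0])  # the value looks like '5120 kB'


def read_process_rss(pid="self"):
    """Read the resident set size of `pid` from /proc, or None if unavailable."""
    path = f"/proc/{pid}/status"
    if not os.path.exists(path):
        return None
    with open(path) as handle:
        return resident_kib(parse_status(handle.read()))
```

Keeping the parser separate from the file access makes it possible to check the parsing logic on any platform, while `read_process_rss()` only returns a value where the Linux interface exists.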

Performance Counters Events

In recent processor architectures, several performance monitoring units (PMUs) are available to monitor distinct processor activities. PMUs can be generic or architecture specific. Performance monitoring counters (PMCs) count how many times an event occurred in a PMU. The term PMC is used interchangeably with other terms such as hardware performance counters and hardware instruction counters. These counters are available for each processor core and may be fetched at process and system level, allowing low-level measurement of process behavior. PAPI [64] was the first implementation of a generic performance counter library, providing a generic interface to fetch data from distinct PMCs. Recently, the perf_events tool was added to the Linux kernel mainline [65]. As most PMCs are embedded in the hardware, other OSs such as Windows and MacOS provide access to them through the Performance Data Helper and Instruments tools, respectively. Weaver [66, 67] found that only a small set of counters is deterministic, even though the counters have a small standard deviation. The non-determinism of PMCs may limit their utilization.
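One simple way to examine the determinism of a counter is to repeat the same run several times and compare the readings. The sketch below is only an illustration of that idea and not the procedure used by Weaver or in this thesis: the function name is hypothetical and the readings are made-up values, not measurements.

```python
from statistics import mean, pstdev


def summarize_counter(readings):
    """Summarize repeated readings of one counter for the same workload."""
    avg = mean(readings)
    spread = pstdev(readings)
    return {
        "mean": avg,
        "stdev": spread,
        # Spread relative to the mean, to compare counters of different scale.
        "relative_stdev": spread / avg if avg else 0.0,
        # Deterministic here means every repetition gave the same count.
        "deterministic": len(set(readings)) == 1,
    }


# Hypothetical readings from five repetitions of the same program.
instructions = [1_000_000] * 5
cycles = [1_200_000, 1_200_600, 1_199_400, 1_200_300, 1_199_700]

instructions_summary = summarize_counter(instructions)
cycles_summary = summarize_counter(cycles)
```

In this made-up example the first counter is deterministic, while the second varies between runs with a relative standard deviation well below 0.1%, which mirrors the situation of a non-deterministic counter with a small spread.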

Model Specific Registers

Model Specific Registers (MSRs) are control registers that were initially used for debugging processor features during the development and testing phases. These registers were either removed or left undocumented in the processors' datasheets, so they were seldom used. Lately, vendors have made them accessible to programmers by describing their contents and how to fetch the data [68]. Some MSRs are available through PMC libraries, but MSRs can also provide additional information such as a core's temperature and requested frequency. The acquisition of MSR data varies according to the processor's vendor and architecture, which makes it difficult to maintain in a cross-platform environment.
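As an example of the vendor-specific nature of MSRs, the sketch below decodes a core temperature from two registers. It assumes an Intel processor and the Linux `msr` driver (root privileges and the `msr` kernel module loaded). The register addresses and bit fields follow the commonly documented Intel layout and should be checked against the vendor manual for the target processor; other vendors and architectures use different registers. The function names are hypothetical.

```python
import os
import struct

# Assumed Intel register addresses; verify against the vendor documentation.
IA32_THERM_STATUS = 0x19C       # per-core thermal status
MSR_TEMPERATURE_TARGET = 0x1A2  # holds the maximum junction temperature (TjMax)


def read_msr(cpu, register):
    """Read a 64-bit MSR of logical CPU `cpu` through /dev/cpu/<cpu>/msr."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The msr driver uses the file offset as the register address.
        return struct.unpack("<Q", os.pread(fd, 8, register))[0]
    finally:
        os.close(fd)


def core_temperature(therm_status, temperature_target):
    """Decode the core temperature in degrees Celsius, or None if invalid."""
    if not (therm_status >> 31) & 1:             # bit 31: reading valid
        return None
    tj_max = (temperature_target >> 16) & 0xFF   # bits 23:16: TjMax
    below_tj_max = (therm_status >> 16) & 0x7F   # bits 22:16: degrees below TjMax
    return tj_max - below_tj_max
```

On a suitable machine, `core_temperature(read_msr(0, IA32_THERM_STATUS), read_msr(0, MSR_TEMPERATURE_TARGET))` would give the temperature of core 0. The decoding is kept separate from the privileged read so that it can be checked with synthetic register values.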
