
In this chapter, we have presented a study of the problem of scheduling jobs with chain precedence constraints on unrelated parallel machines, with the objective of minimizing the energy consumed by the schedule. We assume that a communication network connects all machines and that communication between each pair of machines is asymmetric. The job set is given as a single chain, with each job characterized by a strict deadline that must be met. The time and energy required to execute a job on a particular machine are given in lookup tables. Likewise, the time and energy required to communicate over the network links are also given in lookup tables.

The problem presented in this chapter also serves as a case study, which we discuss in Chapter 6. It is an example of a dynamic programming algorithm whose structure allows us to develop a data-parallel algorithm for graphics processors.
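To make the problem setting concrete, the following is a minimal sketch of one possible dynamic program for it. It is not the algorithm developed in this chapter; it only assumes the setting summarized above. The table names (`exec_time`, `exec_energy`, `comm_time`, `comm_energy`, `deadline`) are hypothetical, and the sketch further assumes that communication from a machine to itself costs nothing. For each job and machine it keeps the non-dominated (finish time, energy) pairs, so the work for each machine within one stage is independent of the others.

```python
from typing import List, Optional, Tuple

State = Tuple[float, float]  # (finish_time, energy)


def prune(states: List[State]) -> List[State]:
    """Keep only non-dominated (finish_time, energy) pairs."""
    front: List[State] = []
    best_energy = float("inf")
    for t, e in sorted(states):
        if e < best_energy:  # less energy than every state finishing no later
            front.append((t, e))
            best_energy = e
    return front


def min_energy_chain(exec_time, exec_energy, comm_time, comm_energy,
                     deadline) -> Optional[float]:
    """Minimum energy to run a chain within its deadlines, or None if infeasible.

    exec_time[j][m], exec_energy[j][m]: cost of job j on machine m.
    comm_time[a][b], comm_energy[a][b]: cost of sending from machine a to b
    (asymmetric; the diagonal is assumed to be zero).
    deadline[j]: latest allowed finish time of job j.
    """
    n_jobs = len(exec_time)
    n_mach = len(exec_time[0])
    # fronts[m]: Pareto front of schedules whose latest job ran on machine m
    fronts = [
        [(exec_time[0][m], exec_energy[0][m])]
        if exec_time[0][m] <= deadline[0] else []
        for m in range(n_mach)
    ]
    for j in range(1, n_jobs):
        new_fronts = []
        for m in range(n_mach):  # each machine is independent within a stage
            candidates = []
            for prev in range(n_mach):
                for t, e in fronts[prev]:
                    finish = t + comm_time[prev][m] + exec_time[j][m]
                    if finish <= deadline[j]:
                        energy = e + comm_energy[prev][m] + exec_energy[j][m]
                        candidates.append((finish, energy))
            new_fronts.append(prune(candidates))
        fronts = new_fronts
    energies = [e for front in fronts for _, e in front]
    return min(energies) if energies else None
```

Keeping a Pareto front rather than a single value per machine is a design choice of this sketch: a schedule that uses more energy but finishes earlier may be the only one that still meets a later deadline.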

A future direction to extend the problem will be to consider other possible types of precedence constraints, such as a job set consisting of multiple chains or tree-like precedence constraints. This would more closely model the task graphs of parallel applications in practice.

Chapter 4

Energy-Efficient Flow Time Scheduling

4.1 Introduction

In this chapter, we present an empirical study of the problem of job scheduling to minimize flow time plus energy in both single-processor and multi-processor settings. We implement and investigate several heuristics used for various aspects of the scheduling process such as processor speed selection, job selection and job allocation heuristics for the case of multi-processors. The motivation behind our study comes from the design constraints associated with ubiquitous computing, especially portable systems such as notebooks and mobile phones or tablets, where battery life is directly related to the energy efficiency of the underlying system. Furthermore, it is expected that these systems do not compromise on performance and quality of service while operating within acceptable levels of energy consumption. Consequently, we incorporate flow time as a measure of quality of service in addition to the objective of minimizing energy consumption. The additional objective of maintaining desired levels of performance is orthogonal to the objective of saving energy.

Modern computer chips are capable of delivering huge amounts of processing power per square inch, brought about by technological advances in chip design and fabrication processes. As a result, the power envelope for these parts is pushed up, thereby raising the energy demands of computer systems. A direct implication is that, apart from the cost of running these systems, one must factor in the cost of cooling, from the small scale of embedded systems to the large scale of data centres and server stores. For these reasons, energy conservation has become a critical design feature in modern computer systems, and several technologies and techniques have been developed to better utilize and conserve energy. For instance, features such as AMD PowerNow!™ and Intel SpeedStep® allow the operating system to dynamically adjust core frequency and voltage, thereby altering the speed of the processor in order to conserve power [1,45]. This concept of adjusting the speed of the processor to meet computing demands is often known as throttling or dynamic speed scaling.

The closest work to the study presented in this chapter is that presented in [7], where a number of job selection policies and speed-scaling algorithms were analyzed empirically. The goal of their empirical studies is to analyze these speed-scaling algorithms coupled with the job selection policies in order to demonstrate that their real-world performance can be improved by using knowledge of the input job data. Furthermore, they were also able to demonstrate that different algorithms work better on certain types of input data and that, as a result, the input data should be taken into account when choosing a speed-scaling algorithm. One of the speed-scaling algorithms they studied was AJC (Active Job Count) [4], and SRPT (Smallest Remaining Processing Time) was one of the job selection policies. Our empirical studies also include these two algorithms, but for different reasons.

In this chapter, we compare AJC with several fixed-speed heuristics, including a semi-clairvoyant fixed-speed function that we designed. We describe this heuristic as semi-clairvoyant since it requires approximate knowledge of the characteristics of a given instance of jobs. In contrast, a clairvoyant algorithm would require exact knowledge, for example, the exact arrival time of each job, while a non-clairvoyant algorithm has no knowledge of the jobs. The purpose of this comparison with the semi-clairvoyant fixed-speed function is to demonstrate that it is possible to design a simple fixed-speed function that performs close to AJC in the objective of minimizing total flow time plus energy. We investigate a simpler alternative because AJC can be quite computationally intensive to implement in practice and, given the arbitrary nature of the speed spectrum, it could be a challenge to support such a capability in hardware. Our studies also use AJC in multi-processor scheduling, in an attempt to demonstrate that having more processors can be very cost-effective in minimizing total flow time plus energy.
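As an illustration of the kind of comparison described above, the following is a minimal single-processor simulation sketch, not the simulator used in our experiments. It makes several assumptions: power is modelled as speed raised to a constant `alpha`; AJC is taken to mean that the speed is chosen so that power equals the number of active jobs, i.e. speed n^(1/alpha) for n active jobs; and jobs are selected by SRPT. The names `simulate_srpt`, `ajc_speed` and `fixed_speed` are hypothetical.

```python
import heapq
from typing import Callable, List, Tuple


def ajc_speed(alpha: float) -> Callable[[int], float]:
    """Assumed AJC rule: run at the speed whose power equals the active job count."""
    return lambda n: n ** (1.0 / alpha)


def fixed_speed(s: float) -> Callable[[int], float]:
    """Run at constant speed s whenever at least one job is active."""
    return lambda n: s


def simulate_srpt(jobs: List[Tuple[float, float]],
                  speed_fn: Callable[[int], float],
                  alpha: float = 3.0) -> Tuple[float, float]:
    """Event-driven simulation of SRPT under a given speed policy.

    jobs: (release_time, work) pairs.
    speed_fn: maps the number of active jobs (>= 1) to a speed > 0.
    Returns (total_flow_time, energy), assuming power = speed ** alpha.
    """
    pending = sorted(jobs)      # ordered by release time
    active: List[float] = []    # min-heap of remaining work (SRPT order)
    now = flow = energy = 0.0
    i = 0
    while i < len(pending) or active:
        if not active:          # processor idle: jump to the next release
            now = max(now, pending[i][0])
        while i < len(pending) and pending[i][0] <= now:
            heapq.heappush(active, pending[i][1])
            i += 1
        n = len(active)
        speed = speed_fn(n)
        finish_dt = active[0] / speed
        release_dt = pending[i][0] - now if i < len(pending) else float("inf")
        dt = min(finish_dt, release_dt)
        flow += n * dt          # every active job accrues flow time
        energy += (speed ** alpha) * dt
        if finish_dt <= release_dt:
            now += finish_dt
            heapq.heappop(active)           # shortest job completes
        else:
            now = pending[i][0]             # a new job arrives first
            heapq.heapreplace(active, active[0] - speed * dt)
    return flow, energy
```

With this sketch, the objective for a job list `jobs` under AJC is `sum(simulate_srpt(jobs, ajc_speed(3.0), 3.0))`, and the same call with `fixed_speed(s)` gives the value for a fixed speed `s`. Under the assumed AJC rule, the power always equals the number of active jobs, so the energy and the total flow time coincide.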

The highlights of our results in this chapter, within the context of minimizing total flow time plus energy, include:

• Empirically, we are able to confirm the theoretical result that SRPT is a better job selection policy in comparison to SJF.

• As a speed-scaling algorithm, AJC is very effective and performs better than a fixed-speed heuristic.

• Given some prior knowledge about a job sequence, it is possible to design a much simpler fixed-speed heuristic that can perform close to AJC. In other words, without some insight about the job sequence, a fixed-speed heuristic cannot perform better than a speed-scaling algorithm.

• We demonstrate that with multiple processors and speed-scaling, we can achieve significantly better performance over a single processor. Furthermore, it is also interesting to note that we observed a trend where the performance benefit from multiple processors can only be noticed beyond a certain number of processors, regardless of the nature of the job sequence.

The rest of this chapter is organized as follows. In Section 4.2, we present a formal definition of the problem. In Section 4.3, we present the heuristics we designed, implemented and evaluated. Finally, in Section 4.4, we describe the setup for the simulations and present our observations and results.