
3.3 Schedulers for multi-accelerator systems

3.3.4 F. Redaelli et al. (2010)

F. Redaelli et al. [80] propose a multi-FPGA scheduler that exploits the partial and dynamic reconfiguration features of these systems.

Building on previous work on multi-task scheduling for single-FPGA systems, this work provides an efficient design framework to determine the best hardware configuration for workloads on multi-FPGA systems, including the number of FPGAs for each workload. The underlying method of this framework aims to minimise the total execution time of each task by parallelising tasks across multiple co-executing FPGAs, while still taking into account the trade-off with the additional communication overhead. The best configuration in terms of the number of FPGAs is reached at the point where the multi-FPGA communication delay begins to dominate the execution time.
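The trade-off described above can be sketched as a simple cost model: compute time shrinks as devices are added while communication delay grows, and the best FPGA count sits where the sum is minimal. The function names and the linear communication model below are illustrative assumptions, not the actual cost model of Redaelli et al.

```python
# Toy model: work is split evenly across FPGAs, while inter-FPGA
# communication adds a delay per additional device.

def execution_time(work_units, n_fpgas, comm_delay_per_link):
    """Total time: parallel compute share plus inter-FPGA communication."""
    compute = work_units / n_fpgas
    communication = comm_delay_per_link * (n_fpgas - 1)
    return compute + communication

def best_fpga_count(work_units, max_fpgas, comm_delay_per_link):
    """Pick the FPGA count where adding more devices stops paying off."""
    return min(range(1, max_fpgas + 1),
               key=lambda n: execution_time(work_units, n, comm_delay_per_link))

print(best_fpga_count(100, 8, 4))  # -> 5: beyond 5 FPGAs, communication dominates
```

Under these toy numbers, going from 5 to 6 FPGAs saves about 3.3 units of compute but costs 4 more units of communication, so the search stops at 5.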

Their scheduler follows a heuristic-based approach to make decisions and select tasks in a just-in-time manner. When there are multiple possible resources on which to execute a ready task, the heuristic uses the farthest-placement principle together with an anti-fragmentation technique. This aims to provide a better solution space for future placements, as it has been demonstrated that it is easier to place large tasks in the centre of the FPGA [5]. The scheduler also considers the latency of data migration between FPGAs to determine whether to reuse resources. Other runtime techniques, including configuration prefetching and limited reconfiguration, are also applied to optimise the scheduling decisions.
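The farthest-placement idea can be illustrated with a toy one-dimensional model of the reconfigurable fabric: among feasible slots, a new task is placed as far from the centre as possible, keeping the centre free for future large tasks. The column model and all names below are assumptions for illustration, not the actual placement algorithm of [80].

```python
# Anti-fragmentation via farthest placement on a 1-D column model of the
# fabric: prefer the feasible slot whose midpoint is farthest from centre.

def farthest_placement(free_columns, task_width, fabric_width):
    """Return the leftmost column of the feasible slot farthest from the
    fabric centre, or None if the task does not fit anywhere."""
    centre = fabric_width / 2
    candidates = [c for c in free_columns
                  if all(c + i in free_columns for i in range(task_width))]
    if not candidates:
        return None
    # Distance measured from the slot's midpoint to the fabric centre.
    return max(candidates, key=lambda c: abs(c + task_width / 2 - centre))

# Two feasible 3-column slots; the one starting at column 9 is farther
# from the centre of a 12-column fabric, so it is chosen.
print(farthest_placement({2, 3, 4, 9, 10, 11}, 3, 12))  # -> 9
```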

Chapter 4

COLAB: A Heterogeneity-Aware Scheduler for Asymmetric Chip Multi-core Processors

4.1 Introduction

Most processor chips are incorporated into embedded devices, such as smartphones and IoT sensors, which are by nature energy-limited. Therefore, energy efficiency is a crucial consideration in the design of new processor chips. Heterogeneous systems combine processors of different types to provide energy-efficient processing for different types of workloads. Among central processors, single-ISA asymmetric multicore processors (AMPs) are becoming increasingly popular, allowing extra flexibility in the runtime assignment of threads to cores, based on which core is the most appropriate for the workload as well as on the current utilisation of the cores. As a result, efficient scheduling for AMP processors has attracted a lot of attention in the literature [74]. The three main factors that influence the decisions of a general-purpose AMP scheduler are:

• Core sensitivity. Cores of different types are designed for different workloads. For example, in ARM big.LITTLE systems, big cores are designed to serve latency-critical workloads or workloads with Instruction-Level Parallelism (ILP). Running other kinds of workloads on them would not improve performance significantly while consuming more energy. Therefore, it is critical to predict which threads would benefit the most from running on which kind of core.

• Thread criticality. Executing a thread faster does not necessarily translate into improved performance. An application might contain critical threads, the progress of which determines the progress of the whole application, and it is these threads that the scheduler needs to pay special attention to. Therefore, it is essential to identify the critical threads of an application and accelerate them as much as possible, regardless of core sensitivity.

• Fairness. In multiprogrammed environments, scheduling decisions should not only improve the utilisation of the system as a whole, but should also ensure that no application is penalised disproportionately. Achieving fairness in the AMP setting is non-trivial, as allocating equal time slices to each application in a round-robin manner does not imply the same amount of work done for each application. Therefore, it is critical to ensure that each application is able to make progress in a fair way.

The research community has put considerable effort into tackling these problems. Prior research [48, 25, 60, 96, 36] has explored bottleneck and critical-section acceleration, while other work has examined fairness [114, 104, 100, 68, 69] or core sensitivity [21, 66, 6]. More recent studies [64, 63, 85, 99, 61] have improved on previous work by optimising for multiple factors. Such schedulers are tuned for specific kinds of workloads: either a single multi-threaded program or multiple single-threaded programs. Only one previous work, WASH [58], can handle general workloads composed of multiple programs, each one single- or multi-threaded, with potentially unbalanced threads, and with a total number of threads that may be higher than the number of cores. While a significant step forward, WASH only controls core affinity and does so through a fuzzy heuristic. Coarse-grain core-affinity control means that it cannot handle core allocation and thread dispatching holistically to speed up the most critical threads. The fuzzy heuristic means that WASH has only limited control over which threads run where, leaving much of the actual decision making to the underlying Linux CFS scheduler.
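The three factors above can be combined into a single toy ranking for the next free big-core slot. The weights, counters, and scoring formula below are illustrative assumptions, not the actual COLAB or WASH policy: they merely show how speedup, criticality, and fairness can pull a decision in different directions.

```python
# Toy multi-factor score for assigning a big core: higher big/little IPC
# ratio (core sensitivity), more threads blocked on this thread
# (criticality), and a larger fairness deficit all raise the score.

def big_core_score(thread, weights=(1.0, 1.0, 1.0)):
    w_speed, w_crit, w_fair = weights
    speedup = thread["ipc_big"] / thread["ipc_little"]  # core sensitivity
    criticality = thread["waiters"]                     # threads blocked on it
    lag = 1.0 - thread["norm_progress"]                 # fairness deficit
    return w_speed * speedup + w_crit * criticality + w_fair * lag

threads = [
    {"tid": "alpha1", "ipc_big": 2.0, "ipc_little": 0.8,
     "waiters": 1, "norm_progress": 0.7},   # 2.5 + 1 + 0.3 = 3.8
    {"tid": "beta1",  "ipc_big": 1.2, "ipc_little": 1.0,
     "waiters": 2, "norm_progress": 0.5},   # 1.2 + 2 + 0.5 = 3.7
    {"tid": "gamma",  "ipc_big": 1.8, "ipc_little": 1.0,
     "waiters": 0, "norm_progress": 0.9},   # 1.8 + 0 + 0.1 = 1.9
]
print(max(threads, key=big_core_score)["tid"])  # -> alpha1
```

Note how alpha1 narrowly beats beta1 here: beta1 blocks more threads, but alpha1's higher big-core speedup compensates. A scheduler optimising any one factor alone would rank these threads differently, which is exactly why the factors must be considered together.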

[Figure 4.1 appears here: a schematic comparing the multi-factor mixed model (WASH) with the multi-factor coordinated model (COLAB) when scheduling three programs on one big core Pb and one little core Pl. The workload comprises α = (α1, α2), a 2-thread program where thread α1 has high speedup and blocks thread α2; β = (β1, β2), a 2-thread program where thread β1 blocks thread β2; and γ, a single-thread program with high speedup. Under speedup priority, block priority, or both, the mixed model gives no clear guideline and leaves the decision to the OS and VM, whereas the coordinated model yields detailed guidelines.]

Figure 4.1: Motivating example: a multi-threaded multiprogrammed workload on asymmetric multicore processors with one big core Pb and one little core Pl. The mixed model on the left-hand side shows the WASH decision and the coordinated model on the right-hand side shows the proposed COLAB decision. Controlling only core affinity results in suboptimal scheduling decisions.