The RTMC problem deals with the energy optimization problem on cluster-based multi-core processor platforms. The main objective is to find a task partition and a speed assignment. Therefore, the solution space ΩRT MC is defined as follows.
ΩRT MC = {(alloc, assign)1, (alloc, assign)2, ..., (alloc, assign)(∑
Oi|Si|)n} (4.13) The size of solution space is (∑Oi|Si|)n, where |Si| is the number of available P-states on Oi. n is the number of tasks. For instance, one considers a quad-core processor, where each processor quad-core supports 5 P-states. If 10 real-time tasks need to be scheduled, the number of possible solutions are (4 · 5)10≈ 1013. Furthermore, two solutions are said to be in a neighborhood, if they differ in the configuration of one task, i.e.,
∃τ ∈ Γ : assigni(τ) 6= assignj(τ) ∨ alloci(τ) 6= allocj(τ) (4.14) and
∀τ0∈ Γ \ {τ} : assigni(τ0) = assignj(τ0) ∧ alloci(τ0) = allocj(τ0) (4.15)
The neighbor selection process is similar to the case of RTSC, however, there is one more step for the RTMC problem. More specifically, a neighbor of the current solution is generated in three steps: i) select a task, ii) change its processor core allocation and iii) change its speed assignment. Figure 4.2 illustrates the generation process. Note that the other possibility is to first make reassignment and perform reallocation afterward. In this case, the
Input
Figure 4.2: Neighbor generation process for RTMC
reallocation step may be dependent on the result of the reassignment step, if a multi-core processor with heterogeneous cores in terms of power model is considered. For instance, if a processor contains two cores O1 and O2 grouped into two clusters and O1 and O2 support different range of speeds, the reassignment step may choose one speed that is supported by only one of the processor cores, e.g., O2. As a result, the reallocation step may only select O2 as the target. In this dissertation, the focus is on the first option that the core reallocation is made before the speed reassignment. The second possibility, however, will be further discussed in Chapter 9 as part of the future work.
For task selection (the first step) the guidance is defined based on the heuristic information extracted from the current solution. More precisely, each task τi is associated with a penalty value denoted by pen(τi). On the analogy to the RTSC problem, the penalty value of a task indicates the wasted energy by the task. The higher the penalty value, the more possibly the corresponding task is to be selected for a configuration change. pen(τi) is formally computed by means of (4.16), which is composed of three parts (on the right side of the equation).
pen(τi) = λ1· |F(assign(τi)) − F(SCj(τi))| +
λ2· tunbalanced(τi) + λ3 (4.16)
Like the penalty function definition for the RTSC problem, the first part ex-presses the wasted active energy during task execution. Note that, hereby it is assumed that τiis running on Oj. Due to the speed coordination constraint of cluster-based multi-core processor platforms, an additional term is added into the penalty function. Namely, the second part describes the unbalanced task execution in a cluster. More specifically, tunbalanced(τi) denotes the time inside a hyper period, where a processor core O executes τiat assign(τi) and at least one of the other cores O0 in the same cluster is enforced to operate at the same speed. In other words, by disregarding the processor core O, the active task on O0should have run at a lower speed than assign(τi). Obviously, the task with larger tunbalanced(τi) should be more penalized.
Figure 4.3 shows an example of task execution with two processor cores in
𝝉𝟏
Figure 4.3: An example task execution on a dual-core processor Task W (ms) T (ms) SC1 SC2 task specification is given in Table 4.3. The example further assumes that τ1
runs on O1 and τ2 on O2. S11 is assigned to τ1 and S22 to τ2, respectively.
Due to the speed coordination, task τ2 is running at a higher speed than its originally assigned speed during the time interval [0 ms, 10 ms]. In this case tunbalanced(τ1) is equal to 10 ms. The constants λ1 and λ2in (4.16) serve as coefficients to adjust the impact of the first and the second part, respectively.
The constant λ3is a technical parameter in order to prevent the penalty value from being zero. By assuming λ1= λ2= λ3= 1, the task penalty values in the example can be computed as follows.
pen(τ1) = |F(S11) − F(S12)| + 10 + 1 = 11.5 (4.17)
pen(τ2) = |F(S22) − F(S22)| + 0 + 1 = 1 (4.18) Based on the penalty values, the selection probability prob(τi) of each task can be derived.
prob(τi) = pen(τi)
∑
τj∈Γ
pen(τj) (4.19)
After a task τk is selected, in the second step the task is to be reallocated to a new core. Hereby it is quite reasonable to balance processor load and avoid generating invalid solutions, where system schedulability is not en-sured. For this purpose each processor core Oj is associated with a reward
value rew(Oj), which is simply the available utilization that is still free to be used. Formally the reward function rew(Oj) is defined in (4.20).
rew(Oj) =
(UBj−U(Oj), if alloc(τk) 6= Oj
UBj−U(Oj) +U (τk), otherwise (4.20) UBjis the utilization upper bound on the processor core Oj to ensure schedu-lability. For instance, UBjis 1 if the EDF scheduling algorithm is used. UBj is 0.69, if the RM algorithm is used. Moreover, U (Oj) is the current processor utilization on the core Ojand U (τk) is the utilization of the task τk, assuming that τk is selected in the first step. In the example shown in Figure 4.3, if the task τ1is selected and needs to be reallocated, the reward value for each processor core can be computed by
rew(O1) = 1 −W(τ1) · F(S11)
F(S11) · T (τ1) +W(τ1) · F(S11) F(S11) · T (τ1)
= 1 −10 20+10
20
= 1 (4.21)
and
rew(O2) = 1 −W(τ2) · F(S21) F(S22) · T (τ2)
= 1 −30 40
= 0.25 (4.22)
Analogous to task selection, the selection probability prob(Oj) of each core can be derived on the basis of the reward values.
prob(Oj) = rew(Oj)
∑
Oi∈O
rew(Oi) (4.23)
Intuitively, the processor core with more free utilization will be selected more likely.
Finally, in the third step a new speed is to be assigned to the selected task.
Hereby the critical speed is selected with the probability 0.5 and the remain-ing probability is uniformly distributed to the other available speeds. Algo-rithm 3 shows an overview of the proposed heuristic search algoAlgo-rithm. The
initial solution is simply generated by means of the worst fit strategy [DB11]
to partition the tasks and afterward assigning all the tasks with the maximal speed.
Algorithm 3 Heuristic Search Algorithm for RTMC (HSAMC) Input: The solution space ΩRT MC
Output: A solution (alloc, assign)best
1: Generate (alloc, assign)init
2: (alloc, assign)best ← (alloc, assign)init
3: (alloc, assign)curr← (alloc, assign)init
4: i← 1
5: while termination criterion is not met do
6: Select a τ according to the penalty values
7: Reallocate τ according to the reward values
8: Reassign τ with a new speed to obtain (alloc, assign)0
9: Generate a random number r ∈ [0, 1]
10: if r ≤ Praccept((alloc, assign)curr, (alloc, assign)0, i) then
11: (alloc, assign)curr ← (alloc, assign)0
12: if J((alloc, assign)curr) < J((alloc, assign)best) then
13: (alloc, assign)best ← (alloc, assign)curr
14: end if
15: end if
16: i← i + 1
17: end while
The HSAMC algorithm returns the best solution it has ever visited, which results in a better performance than the original version of SA. Similar to the HSASC algorithm, the cost or value of a solution is defined as the energy consumption over one hyper period. If the solution is not able to guarantee system schedulability, its cost is set to infinity. Hereby the cost of a solution is assumed to be obtained in some way. More details are explained in Chapter 6.
Table 4.4 shows a possible sequence of solutions that are visited by the HSAMC algorithm for the example shown in Figure 4.3.
4.4 Chapter Summary
This chapter first gave an review of the RTSC and RTMC problems in terms of complexity. Unfortunately, both problems are proven to be N P-hard in the strong sense. As a result, there is no efficient algorithm available. Sub-sequently, two heuristic search algorithms based on simulated annealing for
Task τ1 τ2 1st solution (O1, S1) (O2, S1) 2nd solution (O1, S1) (O2, S2) 3rd solution (O2, S2) (O2, S2) 4th solution (O1, S2) (O2, S2) 5th solution (O1, S2) (O1, S1) Table 4.4: A series of possible solutions
RTSC and RTMC are proposed, which are the main contribution of this chap-ter.
Chapter 5
Run-Time Behavior Analysis
The previous chapter introduced the heuristic algorithms for energy optimiza-tion problem in the context of hard real-time systems. The guided search ap-proach is iterative and based on the simulated annealing algorithm. Though it may work well in most cases, there is no guarantee that the heuristic algo-rithm will produce a globally optimal solution. There is even no information about the quality of the output solution. In this chapter, an efficient mech-anism is proposed to efficiently analyze and predict the performance of the algorithm. A termination criterion is derived as well. The main part of this chapter is published in [HM13a].