
3.3 Our Approaches

3.3.3 Fairness-Oriented Code Positioning

While WCO focuses on optimizing the single thread with the worst WCET among the co-running threads, FO code positioning aims at optimizing all co-running threads to ensure fairness. Since the WCETs of both threads may vary significantly, “fairness” can have different meanings and implications depending on the optimization objective. In this work, the FO strategies are divided into two schemes according to their “fairness” goals: 1) reducing the WCETs by approximately the same amount, and 2) reducing the WCETs by approximately the same percentage. Accordingly, the two schemes are named Amount-Fairness-Oriented (AFO) code positioning and Percentage-Fairness-Oriented (PFO) code positioning, respectively.

Algorithm 1 WC-Oriented Code Positioning

1: begin
2:   boolean terminate = false;
3:   P1_wcet = WCET_Analysis(P1);
4:   P2_wcet = WCET_Analysis(P2);
5:   Conflict_Op_List = Build_Conflict_Op_List(P1, P2);
6:   repeat
7:     if P1_wcet > P2_wcet then
8:       Positioning(P2, Conflict_Op_List);
9:       P1_wcet = WCET_Analysis(P1);
10:      P2_wcet = WCET_Analysis(P2);
11:      if P1_wcet > P2_wcet then
12:        terminate = true;
13:      else
14:        WC_Oriented_Code_Positioning(P1, P2);
15:      end if
16:    else
17:      Positioning(P1, Conflict_Op_List);
18:      P1_wcet = WCET_Analysis(P1);
19:      P2_wcet = WCET_Analysis(P2);
20:      if P1_wcet < P2_wcet then
21:        terminate = true;
22:      else
23:        WC_Oriented_Code_Positioning(P1, P2);
24:      end if
25:    end if
26:  until terminate == true;
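To make the control flow of Algorithm 1 concrete, the following Python sketch transcribes it under a deliberately simple, hypothetical cost model: each unresolved pair in the conflict list adds `INTER_PENALTY` cycles to both threads, and each positioned instruction adds `INTRA_PENALTY` cycles of new intra-thread misses to its own thread. The names `Program`, `wcet_analysis`, `positioning` and both penalty constants are illustrative assumptions, not the analysis used in this work, and the recursive call at lines 14 and 23 is modeled as another loop iteration bounded by an assumed `max_rounds` guard.

```python
from dataclasses import dataclass, field

# Hypothetical toy cost model (assumed, not from the thesis).
INTER_PENALTY = 10  # cycles per unresolved inter-thread L2 conflict, charged to both threads
INTRA_PENALTY = 3   # cycles of new intra-thread misses per positioned instruction


@dataclass
class Program:
    base: int  # E plus the penalties that positioning does not touch
    positioned: set = field(default_factory=set)  # indices of moved instructions


def wcet_analysis(p1, p2, conflicts):
    """Toy stand-in for WCET_Analysis(P1) and WCET_Analysis(P2)."""
    unresolved = sum(1 for a, b in conflicts
                     if a not in p1.positioned and b not in p2.positioned)
    w1 = p1.base + INTER_PENALTY * unresolved + INTRA_PENALTY * len(p1.positioned)
    w2 = p2.base + INTER_PENALTY * unresolved + INTRA_PENALTY * len(p2.positioned)
    return w1, w2


def positioning(prog, side, conflicts):
    """Move every conflicting instruction of prog (side 0 = P1, side 1 = P2)."""
    prog.positioned.update(pair[side] for pair in conflicts)


def wc_oriented_positioning(p1, p2, conflicts, max_rounds=10):
    w1, w2 = wcet_analysis(p1, p2, conflicts)              # lines 3-4
    for _ in range(max_rounds):                            # repeat ... until (lines 6-26)
        if w1 > w2:                                        # P1 is worse: position P2
            positioning(p2, 1, conflicts)
            w1, w2 = wcet_analysis(p1, p2, conflicts)
            if w1 > w2:                                    # P1 still worse: stop
                break
        else:                                              # P2 is worse (or tied): position P1
            positioning(p1, 0, conflicts)
            w1, w2 = wcet_analysis(p1, p2, conflicts)
            if w1 < w2:                                    # P2 still worse: stop
                break
        # Otherwise the ranking flipped or tied: Algorithm 1 calls itself again
        # (lines 14 and 23); this sketch simply runs another iteration instead.
    return w1, w2


conflicts = [(i, i) for i in range(5)]   # op i of P1 conflicts with op i of P2 in L2
p1, p2 = Program(base=200), Program(base=100)
print(wc_oriented_positioning(p1, p2, conflicts))  # (200, 115)
```

With P1 as the worse thread, only P2's conflicting instructions are moved; in this toy model P1 keeps its original layout and its WCET drops from 250 to 200 cycles, while P2 pays 15 cycles of new intra-thread misses.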

Amount-Fairness-Oriented Scheme

The Amount-Fairness-Oriented (AFO) code positioning algorithm aims at reducing the WCETs of both co-running threads by approximately equal amounts. When the WCO code positioning approach is applied, only the instructions of the thread with the shorter (i.e., “better”) WCET are positioned, so as to reduce the WCET of the other thread as much as possible. In this case, avoiding the inter-thread L2 cache conflicts reduces the WCETs of both threads by the same amount; any difference in the amount of WCET reduction is therefore caused by the different intra-thread L1 and L2 cache misses introduced by WCO code positioning. AFO thus leverages WCO to decrease the inter-thread cache misses, while it restores some of the instructions positioned by WCO to their original locations, through a procedure named De_positioning, to ensure that the intra-thread cache miss penalties affect both threads by approximately the same amount.

The AFO algorithm is shown in Algorithm 2. Its inputs and initialization phase are the same as those of WCO. At line 6, the WCO algorithm is invoked to reduce the inter-thread L2 cache misses. Here, P2 is assumed to be the thread with the larger WCET; therefore, only the instructions of P1 are positioned by WCO. Next, some positioned instructions of P1 are restored to their original positions by the procedure De_positioning at line 8, and the corresponding instructions of P2 are positioned instead at line 9, so that the inter-thread L2 cache conflicts remain avoided. After positioning, the resulting WCETs of both programs are computed at lines 10 and 11. Then the difference between the WCET reductions of the two programs (i.e., ∆W) is calculated at line 12. If this difference is not smaller than the difference from the last iteration (i.e., ∆Original_W), the termination variable is set to true; otherwise, the smaller difference is assigned to ∆Original_W so that later iterations can further narrow the gap between the amounts of WCET reduction of the two threads. The loop repeats until the termination variable becomes true.

Algorithm 2 AF-Oriented Code Positioning

1: begin
2:   boolean terminate = false;
3:   Original_P1_wcet = WCET_Analysis(P1);
4:   Original_P2_wcet = WCET_Analysis(P2);
5:   Conflict_Op_List = Build_Conflict_Op_List(P1, P2);
6:   WC_Oriented_Code_Positioning(P1, P2);
7:   repeat
8:     De_positioning(P1, Conflict_Op_List);
9:     Positioning(P2, Conflict_Op_List);
10:    P1_wcet = WCET_Analysis(P1);
11:    P2_wcet = WCET_Analysis(P2);
12:    ∆W = Calculate_Amount_Variation();
13:    if ∆W >= ∆Original_W then
14:      terminate = true;
15:    else
16:      ∆Original_W = ∆W;
17:    end if
18:  until terminate == true;
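A minimal Python sketch of the AFO loop in Algorithm 2, under a hypothetical toy cost model in which each unresolved conflict pair costs both threads `INTER_PENALTY` cycles and each positioned instruction costs its own thread `INTRA_PENALTY` cycles. To stay short, it starts from the state WCO would reach when P2 has the larger WCET, with every conflicting instruction of P1 positioned. Several details are assumptions rather than part of the algorithm: one conflict pair is de-positioned per iteration, ∆Original_W starts at the ∆W observed right after WCO, and the step that triggers termination is rolled back so the best state found is returned (Algorithm 2 as listed keeps the last step).

```python
# Hypothetical toy cost model (assumed, not from the thesis).
INTER_PENALTY = 10  # cycles per unresolved inter-thread L2 conflict, charged to both threads
INTRA_PENALTY = 3   # cycles of new intra-thread misses per positioned instruction


def wcets(base1, base2, pos1, pos2, conflicts):
    """Toy WCETs of P1 and P2 given the sets of positioned instructions."""
    unresolved = sum(1 for a, b in conflicts if a not in pos1 and b not in pos2)
    w1 = base1 + INTER_PENALTY * unresolved + INTRA_PENALTY * len(pos1)
    w2 = base2 + INTER_PENALTY * unresolved + INTRA_PENALTY * len(pos2)
    return w1, w2


def afo_positioning(base1, base2, conflicts):
    orig1, orig2 = wcets(base1, base2, set(), set(), conflicts)   # lines 3-4
    # Assumed WCO outcome when P2 has the larger WCET (line 6):
    # every conflicting instruction of P1 is positioned.
    pos1, pos2 = {a for a, _ in conflicts}, set()
    w1, w2 = wcets(base1, base2, pos1, pos2, conflicts)
    best = abs((orig1 - w1) - (orig2 - w2))   # assumed initial ΔOriginal_W
    for a, b in conflicts:                     # assumption: one pair per iteration
        pos1.discard(a)                        # line 8: De_positioning in P1
        pos2.add(b)                            # line 9: position P2's op; conflict stays avoided
        w1, w2 = wcets(base1, base2, pos1, pos2, conflicts)        # lines 10-11
        delta_w = abs((orig1 - w1) - (orig2 - w2))                 # line 12
        if delta_w >= best:                    # line 13: no further improvement
            pos1.add(a)                        # roll back the last step
            pos2.discard(b)                    # (not part of Algorithm 2)
            break
        best = delta_w                         # line 16
    return wcets(base1, base2, pos1, pos2, conflicts), best


print(afo_positioning(100, 200, [(i, i) for i in range(6)]))  # ((109, 209), 0)
```

In this toy run the WCET reductions are 42 versus 60 cycles right after WCO and become 51 cycles for both threads after three de-positioning steps; the fourth step widens the gap again and is undone.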

Percentage-Fairness-Oriented Scheme

While AFO targets approximately the same amount of WCET reduction, PFO aims at approximately the same percentage of WCET reduction. The principle of Percentage-Fairness-Oriented code positioning is as follows. In a multi-core processor with a shared L2 cache, the WCET of a thread can be decomposed into the computation time assuming perfect caches, the L1 cache miss penalty, and the L2 cache miss penalty. The L2 cache miss penalty in turn consists of two parts: the intra-thread L2 cache miss penalty and the inter-thread L2 cache miss penalty. The WCET of a thread can thus be calculated by Equation (3.1), where E stands for the computation time without cache misses, L1 is the L1 cache miss penalty, and In_L2 and Out_L2 represent the intra-thread and inter-thread L2 cache miss penalties, respectively.

$$\mathit{WCET} = E + \mathit{L1} + (\mathit{In\_L2} + \mathit{Out\_L2}) \tag{3.1}$$

After code positioning, the inter-thread cache conflicts decrease; however, the intra-thread conflicts in both the L1 and L2 caches may increase. Since the computation time E is the same before and after code positioning, the WCET improvement achieved by code positioning can be expressed as Equation (3.2).

$$\Delta\mathit{WCET} = \Delta\mathit{Out\_L2} + \Delta\mathit{L1} + \Delta\mathit{In\_L2} \tag{3.2}$$
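As a quick numerical check of Equations (3.1) and (3.2), the sketch below uses made-up cycle counts (not measurements from this work) in which positioning removes 200 cycles of inter-thread L2 penalty but adds 30 cycles of L1 penalty and 20 cycles of intra-thread L2 penalty. The class and function names are hypothetical. Each Δ term is taken as the before-minus-after value, so a negative term marks a penalty that grew.

```python
from dataclasses import dataclass


@dataclass
class WcetBreakdown:
    e: int       # computation time with perfect caches
    l1: int      # L1 cache miss penalty
    in_l2: int   # intra-thread L2 cache miss penalty
    out_l2: int  # inter-thread L2 cache miss penalty

    def wcet(self) -> int:
        # Equation (3.1)
        return self.e + self.l1 + (self.in_l2 + self.out_l2)


def wcet_improvement(before: WcetBreakdown, after: WcetBreakdown) -> int:
    # E is unaffected by code positioning, so it cancels out (Equation (3.2)).
    assert before.e == after.e
    return ((before.out_l2 - after.out_l2)
            + (before.l1 - after.l1)
            + (before.in_l2 - after.in_l2))


before = WcetBreakdown(e=1000, l1=200, in_l2=150, out_l2=300)
after = WcetBreakdown(e=1000, l1=230, in_l2=170, out_l2=100)
print(before.wcet(), after.wcet(), wcet_improvement(before, after))  # 1650 1500 150
```

The net improvement of 150 cycles equals the direct difference of the two totals, which is exactly what Equation (3.2) states once E cancels.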

As the goal of PFO is to reduce the WCET of each real-time thread by approximately the same percentage, Equation (3.3) characterizes this scheme for two threads, Thread A and Thread B. In this equation, WCET_A and WCET_B are the original WCETs of Thread A and Thread B, respectively, and ∆WCET_A and ∆WCET_B, derived from Equation (3.2), denote the change in the WCET of each thread.

$$\frac{\Delta\mathit{WCET}_A}{\mathit{WCET}_A} \approx \frac{\Delta\mathit{WCET}_B}{\mathit{WCET}_B} \tag{3.3}$$
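The difference between the two fairness goals can be seen with two hypothetical threads whose original WCETs are 2000 and 500 cycles. The sketch below measures the AFO gap (difference in absolute reduction) and the PFO gap of Equation (3.3) (difference in relative reduction); the function names and numbers are illustrative only, and `Fraction` keeps the percentages exact.

```python
from fractions import Fraction


def amount_gap(orig_a, new_a, orig_b, new_b):
    """AFO view: difference between the absolute WCET reductions."""
    return abs((orig_a - new_a) - (orig_b - new_b))


def percentage_gap(orig_a, new_a, orig_b, new_b):
    """PFO view (Equation (3.3)): difference between the relative WCET reductions."""
    return abs(Fraction(orig_a - new_a, orig_a) - Fraction(orig_b - new_b, orig_b))


# Both threads lose 10%: fair in the PFO sense, unfair in the AFO sense.
print(amount_gap(2000, 1800, 500, 450), percentage_gap(2000, 1800, 500, 450))  # 150 0
# Both threads lose 150 cycles: fair for AFO, but 7.5% versus 30% for PFO.
print(amount_gap(2000, 1850, 500, 350), percentage_gap(2000, 1850, 500, 350))  # 0 9/40
```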

Because the computation time E may vary substantially across real-time threads, it is very hard, if not impossible, to guarantee the same percentage of WCET reduction when E is included. Moreover, since E is insensitive to cache-based optimizations, the PFO scheme focuses on reducing the L1 and L2 cache miss penalties of both threads by the same percentage through cooperative code positioning. We also find that while the reduction of inter-thread cache conflicts benefits both threads, the L1 cache misses and intra-thread L2 cache misses of a thread depend heavily on how many of its instructions are positioned. Specifically, the more instructions are positioned in a thread, the more intra-thread L1 and L2 cache conflicts are likely to occur in that thread.

Therefore, to reduce the WCETs of both threads by approximately the same percentage, the number of instructions positioned in each thread should be inversely proportional to its original WCET, as expressed in Equation (3.4).

$$\frac{\mathit{Instr\_Num}_B}{\mathit{WCET}_A} \approx \frac{\mathit{Instr\_Num}_A}{\mathit{WCET}_B} \tag{3.4}$$
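Equation (3.4) can be read as a rule for splitting a budget of positioned instructions between the two threads: Instr_Num_A × WCET_A ≈ Instr_Num_B × WCET_B. The sketch assumes a hypothetical `total`, the number of conflicting instructions that must be positioned in one thread or the other, and rounds the split to whole instructions; neither detail is specified in the text.

```python
def split_positioning_budget(total, wcet_a, wcet_b):
    """Split `total` instructions so that num_a * wcet_a ≈ num_b * wcet_b (Equation (3.4))."""
    # Inverse proportionality: the thread with the larger WCET gets the smaller share.
    num_a = round(total * wcet_b / (wcet_a + wcet_b))
    return num_a, total - num_a


print(split_positioning_budget(100, 3000, 1000))  # (25, 75)
```

Thread A, whose WCET is three times larger, receives one third as many positioned instructions as Thread B, so that 75/3000 = 25/1000 as Equation (3.4) requires.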

Algorithm 3 shows the PFO code positioning approach. Its inputs are the two programs to be optimized. The termination variable is initialized at line 2, and the next three lines compute the original WCETs of both programs and determine the list of instructions involved in L2 cache conflicts. Inside the loop, the instructions to be positioned in each program are first identified according to the design principle of PFO at lines 7 and 8, and both programs are then positioned at lines 9 and 10. Lines 11 and 12 compute the WCETs of both programs after positioning. At lines 13 and 14, the WCET percentage variance between the two programs is computed from their original and new WCETs, to determine whether the variance after positioning is smaller than the previous variance. If so, the most recently calculated WCET percentage variance ∆P is assigned to ∆Original_P at line 17; otherwise, the termination variable is set to true at line 15. The loop repeats until the termination variable becomes true.
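The steps described above can be sketched in Python as follows. The structure follows the prose only, and several details are assumptions: every conflict pair is resolved by positioning exactly one side, each positioned instruction costs its own thread a hypothetical `INTRA_PENALTY`, ∆Original_P starts as "infinitely large" (represented by `None`), and, since Equation (3.4) applied to the original WCETs would yield the same split in every iteration, lines 7 and 8 are assumed to re-apply it to the WCETs from the previous iteration. The loop returns the best split found.

```python
from fractions import Fraction

# Hypothetical toy cost model (assumed, not from the thesis).
INTER_PENALTY = 10  # cycles per unresolved inter-thread L2 conflict, charged to both threads
INTRA_PENALTY = 3   # cycles of new intra-thread misses per positioned instruction


def pfo_positioning(base_a, base_b, n_conflicts, max_iters=20):
    orig_a = base_a + INTER_PENALTY * n_conflicts    # lines 3-4: no conflict avoided yet
    orig_b = base_b + INTER_PENALTY * n_conflicts
    w_a, w_b = orig_a, orig_b
    best, best_split = None, None                    # ΔOriginal_P, assumed to start "infinite"
    for _ in range(max_iters):
        # Lines 7-8: choose how many conflicting instructions each thread positions,
        # applying Equation (3.4) to the most recent WCETs (assumption).
        num_a = round(n_conflicts * w_b / (w_a + w_b))
        num_b = n_conflicts - num_a
        # Lines 9-12: position both threads and recompute their WCETs; every conflict
        # is now avoided, so only the intra-thread cost of positioning remains.
        w_a = base_a + INTRA_PENALTY * num_a
        w_b = base_b + INTRA_PENALTY * num_b
        # Lines 13-14: percentage variance between the two WCET reductions.
        delta_p = abs(Fraction(orig_a - w_a, orig_a) - Fraction(orig_b - w_b, orig_b))
        if best is not None and delta_p >= best:
            break                                     # line 15: terminate
        best, best_split = delta_p, (num_a, num_b)    # line 17
    return best_split, best


print(pfo_positioning(300, 100, 10))  # ((3, 7), Fraction(67, 400))
```

In this toy run the second iteration reproduces the split (3, 7), ∆P does not decrease, and the loop stops after two iterations.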

In document WCET Optimizations and Architectural Support for Hard Real-Time Systems (Page 62-68)