6.1.4 Implementation Concerns

In Chapter 4, we noted that (i) avoiding the scheduling of partially-eligible MTTs, (ii) employing lost-cause policies, and (iii) using promotion-duration policy (1) are all problematic

Parameters                   L2 Miss Rate              IPC
System Util.  Video Quality  GEDF   Heur.  % Impr.     GEDF  Heur.  % Impr.
50%           [1, 8]         25.97  17.12  34.06       1.36  1.30   -4.37
50%           [1, 6]         27.55  17.12  37.86       1.30  1.42   9.26
50%           [7, 8]         74.52  60.09  19.36       0.24  0.26   10.87
50%           [1, 4]         28.34  16.45  41.94       1.29  1.38   6.62
100%          [1, 8]         25.21  18.22  27.75       1.29  1.29   0.05
100%          [1, 6]         26.15  18.00  31.16       1.27  1.38   8.42
100%          [7, 8]         55.36  17.70  68.04       0.22  0.32   44.44
100%          [1, 4]         30.85  16.94  45.10       1.23  1.35   9.85

Table 6.6: Results for video-encoding MTTs (best results in bold). As in Table 6.2, both the actual performance numbers for the tested heuristic and relative percentage improvements over GEDF are presented for each combination of parameters.

when implementation efficiency is a concern; that is, when we want to keep scheduling overheads low. Thus, we do not use these policies in the heuristic that is implemented within LITMUS^RT, even though these policies were employed in all of the heuristics that were evaluated in Section 6.1.2. The experiments presented in this section were conducted to determine the impact of that decision on cache miss rates and IPC. In these experiments, we took the heuristic that performed best in Section 6.1.2 for each combination of task-set generation parameters (sixteen heuristics in total) and modified it so that three new heuristics were created (48 heuristics in total). We henceforth refer to the set of heuristics that performed best for each combination of task-set generation parameters, before any modifications, as the original heuristics. Table 6.7 presents the cache miss rate and IPC for the new heuristics, compared to both the corresponding original heuristic (presented in the column labeled “H” in Table 6.7) and GEDF. For each combination of task-set generation parameters, the three new heuristics were created from the corresponding original heuristic as follows.

1. Each original heuristic was modified so that it does not avoid scheduling partially-eligible MTTs (note the double-negative), instead treating them as it would any other MTT. The column that presents the results for this heuristic is labeled “P” in Table 6.7.

2. Heuristic (1) was modified so that no lost-cause policy is employed. (This is also equivalent to employing a lost-cause policy, but with an extremely high or infinite lost-cause threshold.) The column that presents the results for this heuristic is labeled “PL” in Table 6.7.

Task Set Parameters          L2 Miss Rate
%S   MTT U         WD   GEDF   H      P      PL     PLD    B
50   [0.01, 0.1]   TC   3.62   1.60   1.31   3.28   3.58   3.58
50   [0.01, 0.1]   Uni  7.14   3.16   2.53   17.65  11.12  11.12
50   [0.1, 0.4]    TC   1.22   0.36   0.36   1.14   1.31   1.31
50   [0.1, 0.4]    Uni  6.70   0.67   0.60   5.79   7.30   7.30
50   [0.5, 0.9]    TC   1.07   0.28   0.28   1.08   1.08   1.08
50   [0.5, 0.9]    Uni  15.38  0.98   0.85   15.13  15.48  15.48
50   [0.01, 0.9]   TC   3.61   0.63   0.61   2.60   3.27   3.27
50   [0.01, 0.9]   Uni  7.92   0.78   0.76   7.91   9.28   9.28
100  [0.01, 0.1]   TC   5.30   1.67   1.44   4.87   4.71   4.71
100  [0.01, 0.1]   Uni  7.22   2.57   1.97   24.86  14.17  14.17
100  [0.1, 0.4]    TC   3.75   1.35   1.14   9.66   4.99   4.99
100  [0.1, 0.4]    Uni  7.02   3.46   3.25   22.75  15.38  15.38
100  [0.5, 0.9]    TC   3.81   2.83   0.76   1.99   2.27   2.27
100  [0.5, 0.9]    Uni  5.03   3.58   3.57   4.51   5.05   5.08
100  [0.01, 0.9]   TC   2.49   0.88   2.70   6.16   3.43   3.43
100  [0.01, 0.9]   Uni  4.30   3.70   3.55   6.07   5.20   5.13

Task Set Parameters          IPC
%S   MTT U         WD   GEDF   H      P      PL     PLD    B
50   [0.01, 0.1]   TC   0.97   1.23   1.27   1.00   1.00   1.00
50   [0.01, 0.1]   Uni  0.80   1.17   1.26   0.62   0.73   0.73
50   [0.1, 0.4]    TC   1.21   1.20   1.41   1.29   1.28   1.28
50   [0.1, 0.4]    Uni  0.93   1.17   1.34   0.98   0.90   0.90
50   [0.5, 0.9]    TC   1.03   1.01   1.58   1.46   1.46   1.46
50   [0.5, 0.9]    Uni  0.77   0.92   1.61   1.18   1.16   1.16
50   [0.01, 0.9]   TC   1.01   1.12   1.46   1.20   1.17   1.17
50   [0.01, 0.9]   Uni  0.97   0.95   1.60   1.09   1.08   1.08
100  [0.01, 0.1]   TC   0.85   1.16   1.22   0.91   0.93   0.93
100  [0.01, 0.1]   Uni  0.76   1.11   1.15   0.46   0.61   0.61
100  [0.1, 0.4]    TC   0.96   1.18   1.20   0.70   0.89   0.89
100  [0.1, 0.4]    Uni  0.89   1.14   1.17   0.51   0.68   0.68
100  [0.5, 0.9]    TC   1.05   1.13   1.25   1.12   1.11   1.11
100  [0.5, 0.9]    Uni  0.99   1.06   1.07   0.99   0.98   0.98
100  [0.01, 0.9]   TC   1.09   1.23   1.14   0.95   1.09   1.09
100  [0.01, 0.9]   Uni  0.99   1.05   1.06   1.00   1.03   1.03

Table 6.7: The cache impact of avoiding policies that are difficult to implement efficiently in LITMUS^RT. Task-set generation parameters are specified identically to Table 6.2. The column labeled “H” presents performance numbers for the heuristic that performed best (in Section 6.1.2) for the combination of task-set generation parameters indicated. The columns that follow then present results for the same heuristic, modified based on the following code: “P” indicates that the heuristic does not avoid scheduling partially-eligible MTTs; “L” indicates that the heuristic does not employ any lost-cause policy; and “D” indicates that promotion-duration policy (2) was used instead of policy (1). Finally, the column labeled “B” presents performance numbers for the “best-performing” heuristic that was implemented within LITMUS^RT.

3. Heuristic (2) was modified so that promotion-duration policy (2) is used instead of policy (1). The column that presents the results for this heuristic is labeled “PLD” in Table 6.7.
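The way the three new heuristics are derived from each original heuristic can be summarized as a small configuration sketch. The class, field, and function names below are hypothetical and are not taken from the LITMUS^RT implementation; the lost-cause policy is also reduced to a simple on/off flag for illustration, although the text distinguishes several such policies. Only the column labels and the cumulative order of the modifications follow the description above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class HeuristicConfig:
    # Hypothetical field names; only their meaning follows the text.
    avoid_partially_eligible: bool   # policy (i)
    use_lost_cause_policy: bool      # policy (ii), simplified to a flag
    promotion_duration_policy: int   # 1 or 2; policy (iii) is the use of 1


def derive_variants(original):
    """Map each Table 6.7 column label to its (illustrative) configuration."""
    p = replace(original, avoid_partially_eligible=False)   # heuristic (1)
    pl = replace(p, use_lost_cause_policy=False)            # heuristic (2)
    pld = replace(pl, promotion_duration_policy=2)          # heuristic (3)
    return {"H": original, "P": p, "PL": pl, "PLD": pld}


# Every original heuristic employed all three policies.
original = HeuristicConfig(True, True, 1)
variants = derive_variants(original)
for label, cfg in variants.items():
    print(label, cfg)

# Sixteen parameter combinations, three new heuristics each.
total_new = 16 * (len(variants) - 1)
print(total_new)
```

Each modification is applied on top of the previous one, which is why the column labels accumulate letters ("P", then "PL", then "PLD").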

Additionally, Table 6.7 presents results for the “best-performing” heuristic that is described in Section 4.5 of Chapter 4 and implemented within LITMUS^RT. This heuristic does not use the policies specified by (i), (ii), or (iii) stated at the beginning of this section. These results are presented in the column labeled “B” in Table 6.7.

Table 6.7 shows that, for each combination of task-set generation parameters, significant performance differences exist between each original heuristic and the new heuristics. First, for each combination of task-set generation parameters, heuristic (1) (column “P”) always results in lower cache miss rates and higher IPC than the corresponding original heuristic (column “H”). This result was quite unexpected, but upon further reflection, we believe that it is because avoiding the scheduling of partially-eligible MTTs does not really achieve its intended goal of reducing pressure on the system in the future. When we schedule a partially-eligible MTT in the current quantum, that MTT must still be scheduled in a future quantum; however, the same is true when we do not schedule the MTT in the current quantum, except that more cores will be needed to fully schedule the MTT. In this case, the additional demand may often result in less scheduling flexibility, especially for MTTs with many tasks. This is because such MTTs may have their scheduling delayed multiple times, which can result in tardy jobs that must immediately be scheduled. Additionally, if an MTT is partially scheduled in the current quantum, then in most cases, all remaining cores were used to schedule it, and it will be the last MTT that is scheduled in the current quantum. As such, when scheduling that MTT does not result in thrashing, the jobs of that MTT that are scheduled in the future will likely benefit from cache reuse, since in the absence of tardy jobs, the remaining jobs will be scheduled in the next quantum.
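The core-demand argument above can be illustrated with a toy calculation. This is a simplified sketch, assuming that each task of the MTT has exactly one pending job and that each job needs one core for one quantum; the function name and its parameters are hypothetical and do not come from the scheduler itself.

```python
def jobs_left_for_future(num_tasks, free_cores, schedule_partially):
    """Jobs of one MTT that remain to be scheduled after the current quantum.

    Illustrative assumption: one pending, one-quantum job per task.
    """
    fully_eligible = free_cores >= num_tasks
    if fully_eligible or schedule_partially:
        scheduled_now = min(num_tasks, free_cores)
    else:
        # A partially-eligible MTT is skipped entirely in this quantum.
        scheduled_now = 0
    return num_tasks - scheduled_now


# An 8-task MTT when only 3 cores remain free in the current quantum.
print(jobs_left_for_future(8, 3, schedule_partially=True))   # 5
print(jobs_left_for_future(8, 3, schedule_partially=False))  # 8
```

In both cases the MTT still needs a future quantum, but deferring it entirely leaves the full demand of eight cores outstanding rather than five.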

Second, for each combination of task-set generation parameters, heuristic (2) (column “PL”) always results in higher cache miss rates and lower IPC than heuristic (1) (column “P”). We believe that employing a lost-cause policy impacted scheduling in the following ways. First, as discussed in Section 6.1.2, at 100% system utilization, phantom tasks cannot be employed, so lost-cause policies (2) and (3), which schedule high-cache-impact MTTs in the current quantum (since thrashing will occur anyway), sometimes result in better performance in future quanta. Second, we believe that lost-cause policy (1) did more than just reduce average tardiness: when thrashing could not be avoided, the use of GEDF allows the jobs that would otherwise become tardy soonest to be scheduled. Since tardy jobs have higher priority than all other jobs under all of our heuristics (in order to ensure timing constraints), such jobs can make cache-aware scheduling considerably more difficult, as they will be scheduled with no regard to their cache impact. Thus, it makes sense to schedule jobs before they become tardy when possible, to ensure that sufficient scheduling flexibility will exist to prevent cache thrashing in future quanta.

Third, switching from promotion-duration policy (1) to policy (2) (in comparing column “PL” to column “PLD”) seems to have either a minor negative impact on miss rates and IPC, or a significant positive impact. We also believe that policy (2) is more natural for an EDF-based policy, especially when considering more than the first 20 quanta of execution, as it promotes a job for its entire execution rather than for a single quantum-sized unit of its computation (the latter may be more natural for Pfair-based policies that schedule quantum-sized subtasks).

Fourth, performance seems to improve across all heuristics (and GEDF) when the task count is correlated with MTT WSS. When WSS is not correlated with the task count, there is a greater potential that MTTs exist with low task counts and high WSSs. Such MTTs are often very difficult to schedule in a way that avoids thrashing. Regardless of the reason, a correlation between task count and MTT WSS is desirable, as we believe such a correlation to be the more realistic scenario, since a larger number of tasks should be capable of referencing a larger region of memory.

Finally, note that, when comparing heuristic (3) for each combination of task-set generation parameters (column “PLD”) to the best-performing heuristic that is implemented in LITMUS^RT (column “B”), the performance differences were almost always negligible. Further, the best-performing heuristic typically outperforms GEDF except in several notable cases where lost-cause policies had the most significant impact, as such policies presented a way to avoid thrashing in scenarios where phantom tasks could not be employed. Since the best-performing heuristic does not employ any lost-cause policy to keep scheduling overheads low, and phantom tasks cannot be employed, the best-performing heuristic has considerable difficulty achieving performance gains in these cases.

Overall, it appears that not employing a lost-cause policy has the greatest negative impact on system performance. Therefore, we conclude that finding a way to support lost-cause policies in the LITMUS^RT implementation in a low-overhead manner should be our most pressing concern related to improving our cache-aware scheduler, as supporting such policies would likely result in significant reductions in cache miss rates, and increases in IPC. However, note that even without support for lost-cause policies, our cache-aware scheduler often performs very well when compared to GEDF, as we will see next.
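One rough way to check which modification matters most is to compare the L2 miss rates of adjacent columns in Table 6.7. The sketch below uses the values from that table; summarizing each step by an unweighted mean over the sixteen parameter combinations is an assumption made here for illustration, not a measure used in the analysis above, and the variable and function names are hypothetical.

```python
# L2 miss rates from Table 6.7, one tuple per parameter combination,
# in column order (H, P, PL, PLD).
MISS_RATES = [
    (1.60, 1.31, 3.28, 3.58), (3.16, 2.53, 17.65, 11.12),
    (0.36, 0.36, 1.14, 1.31), (0.67, 0.60, 5.79, 7.30),
    (0.28, 0.28, 1.08, 1.08), (0.98, 0.85, 15.13, 15.48),
    (0.63, 0.61, 2.60, 3.27), (0.78, 0.76, 7.91, 9.28),
    (1.67, 1.44, 4.87, 4.71), (2.57, 1.97, 24.86, 14.17),
    (1.35, 1.14, 9.66, 4.99), (3.46, 3.25, 22.75, 15.38),
    (2.83, 0.76, 1.99, 2.27), (3.58, 3.57, 4.51, 5.05),
    (0.88, 2.70, 6.16, 3.43), (3.70, 3.55, 6.07, 5.20),
]
LABELS = ("H", "P", "PL", "PLD")


def mean_step_changes(rows):
    """Change in the mean miss rate between each pair of adjacent columns."""
    n = len(rows)
    means = [sum(row[i] for row in rows) / n for i in range(len(LABELS))]
    return {
        f"{LABELS[i]}->{LABELS[i + 1]}": means[i + 1] - means[i]
        for i in range(len(LABELS) - 1)
    }


changes = mean_step_changes(MISS_RATES)
worst_step = max(changes, key=changes.get)  # step with the largest increase
rows_where_pl_worse = sum(1 for _, p, pl, _ in MISS_RATES if pl > p)

for step, delta in changes.items():
    print(f"{step}: {delta:+.2f}")
print("largest increase:", worst_step)
print("rows where PL > P:", rows_where_pl_worse, "of", len(MISS_RATES))
```

Under this summary, the step that removes the lost-cause policy (from “P” to “PL”) produces the largest increase in mean miss rate, which is in line with the conclusion drawn above.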

In document On the design and implementation of a cache-aware soft real-time scheduler for multicore platforms (Page 143-148)