Scheduling Algorithm - Scratchpad Memory Management For Multicore Real-Time Embedded Systems

In this section, we first describe our scheduling algorithm and illustrate it with a working example; we then discuss the design of our scheduler in Section 5.2.1.

Figure 5.1: An example of scheduling 6 jobs on 2 cores. (Timeline of the DMA, c1 and c2 from time 0 to 24; partitions $1,1, $2,1, $1,2 and $2,2; colour code: load, unload, hole.)

We note that in our execution model, each task must execute its load phase on the DMA before its computation phase on a core. After the computation phase, modified data has to be unloaded to main memory. However, having to schedule two DMA operations (load and unload) complicates the schedulability analysis for dynamically scheduled tasks. Thus, we propose to combine the unload of one job with the load of the next job executed out of the same partition. Suppose that jobs Ja, Jb and Jc execute consecutively on the same core (Ja → Jb → Jc). Since consecutive jobs on the same core alternate between its local memory partitions, Jc is the next job after Ja to execute out of the same partition. Then, when Jc is scheduled on the DMA, the DMA executes the unload phase of Ja followed by the load phase of Jc as a single non-preemptive operation. For simplicity, we refer to the combined unload and load phases as the memory phase of the loaded job Jc. Both memory phases and computation phases are executed non-preemptively. Note that the memory phase and the computation phase of a task are not necessarily executed back to back: after a task is loaded, the core might still be busy executing, non-preemptively, another task out of its other partition. Consequently, once a job is loaded into a local memory partition, the partition's content is locked until the job's computation phase finishes.
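To make the partition alternation concrete, the following Python sketch computes, for the jobs executed consecutively on one core, which partition each job uses and how long its combined memory phase is. It is a minimal illustration under assumptions not stated in the text: the core has two partitions that start empty (so the first two jobs have nothing to unload), and the load and unload times are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def memory_phases(core_jobs: List[str], load: Dict[str, int],
                  unload: Dict[str, int]) -> List[Tuple[str, int, int, Optional[str]]]:
    """Return (job, partition, memory-phase length, unloaded job) for the jobs
    executed consecutively on one core, in execution order."""
    phases = []
    for idx, job in enumerate(core_jobs):
        partition = idx % 2  # consecutive jobs alternate between the two partitions
        # The job that last used this partition ran two positions earlier (Ja for Jc).
        # Assumption: both partitions start empty, so the first two jobs unload nothing.
        prev = core_jobs[idx - 2] if idx >= 2 else None
        # Combined memory phase: unload of the previous occupant, then load of this job.
        length = (unload[prev] if prev is not None else 0) + load[job]
        phases.append((job, partition, length, prev))
    return phases


# Hypothetical load/unload times for Ja -> Jb -> Jc on one core.
load = {"Ja": 2, "Jb": 3, "Jc": 2}
unload = {"Ja": 1, "Jb": 1, "Jc": 1}
for phase in memory_phases(["Ja", "Jb", "Jc"], load, unload):
    print(phase)
# ('Ja', 0, 2, None)
# ('Jb', 1, 3, None)
# ('Jc', 0, 3, 'Ja')
```

The memory phase of Jc thus covers the unload of Ja plus the load of Jc, and it targets the same partition Ja occupied.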

Example 1. Figure 5.1 depicts a working example of scheduling 6 jobs, generated by 6 different tasks, on 2 cores, assuming that all jobs are released at the same time. Since the schedule is fixed-priority, the highest-priority job J1 is chosen first, and the scheduler chooses c1 to execute it. The DMA is first instructed to unload the previous task from $1,1 back to main memory and then to load J1 into $1,1. After that, J1 can run on c1 with no memory stalls. While c1 executes J1 out of $1,1, the scheduler chooses J2 at time 3 to execute on c2; the DMA then runs in parallel with J1, unloading $1,2 and loading J2 into it. At time 6, the scheduler chooses the free partition of c1 to execute J3. Similarly, J4 is chosen at time 10 to execute on the free partition of c2. At time 13, all four partitions are loaded. Hence, the memory phase of J5 has to wait until time 15, when the computation phase of J1 finishes and c1 again has a free partition. Thus, at time 15 the scheduler chooses J5 to execute on c1.

Figure 5.2: The cores are chosen based on the minimum sk. (Timeline of the DMA, c1 and c2 showing the memory and computation phases of J1 to J4, with the time pointers s1, s2, f1, f2 and the time t marked.)

Finally, J6 is scheduled at time 18 to execute on c2. Note that even though c2 finishes executing J4 at time 20, J6 cannot start until time 22 because its memory phase, which started at time 18, is still in progress. This delay induces a schedule hole between J4 and J6. We define a schedule hole as an interval during which a core is idle waiting for a task to be loaded.
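A schedule hole can be read directly off the computation phases of one core. The sketch below is hypothetical: the function schedule_holes is illustrative, and it assumes, as in Example 1, that a next job is always pending, so every idle gap between two consecutive computation phases counts as time spent waiting for a load.

```python
from typing import List, Tuple


def schedule_holes(computations: List[Tuple[str, int, int]]) -> List[Tuple[str, str, int, int]]:
    """Return (previous job, next job, hole start, hole end) for one core.

    computations: (job, start, finish) of the computation phases, in execution order.
    Assumption: a next job is always pending, so an idle gap between two
    computation phases is time spent waiting for that job's memory phase.
    """
    holes = []
    for (prev, _, prev_finish), (job, start, _) in zip(computations, computations[1:]):
        if start > prev_finish:
            holes.append((prev, job, prev_finish, start))
    return holes


# Computation phases on c2: J4 finishing at 20 and J6 starting at 22 come from
# Example 1; the other times are hypothetical.
c2 = [("J2", 6, 16), ("J4", 16, 20), ("J6", 22, 24)]
print(schedule_holes(c2))  # [('J4', 'J6', 20, 22)]
```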

As the example shows, the memory phases largely overlap with computation phases, inducing only a few schedule holes. Hiding memory phases in this way gives our system better schedulability than other state-of-the-art global scheduling techniques, as shown in Section 7.3.

We now discuss how our scheduler chooses the core on which to schedule a task. Figure 5.2 shows a schedule of 4 jobs on 2 cores. The time pointers s1 and s2 indicate the start times of the last scheduled jobs on c1 and c2, respectively; similarly, f1 and f2 indicate their finish times. Consider time t, at which each core has a free partition. Our scheduler chooses c1 rather than c2 to schedule J3 because the last job scheduled on c1 started earlier, i.e., s1 < s2. We choose cores based on the start times of their last scheduled jobs, rather than their finish times, to avoid pessimism in the analysis: in particular, this choice allows us to bound the amount of holes between computation phases, as we discuss in Section 5.3.2.
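The following sketch contrasts the two possible core-selection rules on hypothetical time pointers. The function name select_core and its by parameter are illustrative, and ties go to the lowest core index (the text allows any tie-breaking). The code only shows that the two rules can pick different cores, not why the start-based rule yields a tighter analysis, which is the subject of Section 5.3.2.

```python
from typing import List


def select_core(s: List[int], f: List[int], has_free_partition: List[bool],
                by: str = "start") -> int:
    """Index of the core chosen for the next job among cores with a free partition.

    by="start" ranks cores by sk (the rule described above);
    by="finish" ranks them by fk (the alternative the text rejects).
    Ties go to the lowest core index (an assumption).
    """
    key = s if by == "start" else f
    candidates = [k for k, free in enumerate(has_free_partition) if free]
    return min(candidates, key=lambda k: key[k])


# Hypothetical pointers: c1's last job started earlier (s1 < s2) but finishes later (f1 > f2).
s = [2, 5]
f = [14, 11]
free = [True, True]
print("start-based: c%d" % (select_core(s, f, free) + 1))               # start-based: c1
print("finish-based: c%d" % (select_core(s, f, free, by="finish") + 1))  # finish-based: c2
```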

5.2.1 Scheduler Design

Our scheduler maintains a global queue Qr in which ready tasks are ordered by fixed priority. Whenever a task is released, it is inserted into this queue. The dispatcher extracts the highest-priority task from the top of the queue and executes its memory phase on the DMA, provided that the DMA is idle and at least one partition is available; otherwise, the job remains in the global queue. The scheduler is typically implemented as an interrupt service routine (ISR) triggered by certain events. In our system, there are three such events: (1) task release, (2) memory phase completion, and (3) computation phase completion. The DMA-Dispatcher procedure below is triggered at the time t of any of these three events to schedule a new task on the DMA. In addition, after events (2) and (3), if a new task has already been loaded, the core performs a context switch and executes this task.

For example, consider Figure 5.1 again. At time 3, when the memory phase of J1 completes, c1 performs a context switch to execute J1 and, at the same time, J2 is scheduled on the DMA. At time 15, when the computation phase of J1 completes, c1 performs a context switch to execute J3, which has already been loaded into $2,1; at the same time, J5 is scheduled on the DMA because the completion of J1's computation phase frees a partition.
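The queue discipline and the dispatch condition described above can be sketched in Python as follows. This is an illustration under assumptions, not the thesis implementation: the class ReadyQueue, the function on_event and the convention that a lower number means a higher priority are all hypothetical, and the context switch performed by the core after events (2) and (3) is not modeled.

```python
import heapq
from enum import Enum, auto
from typing import List, Optional, Tuple


class Event(Enum):
    TASK_RELEASE = auto()
    MEMORY_PHASE_COMPLETION = auto()
    COMPUTATION_PHASE_COMPLETION = auto()


class ReadyQueue:
    """Global ready queue Qr; a lower number means a higher fixed priority (assumption)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._seq = 0  # preserves release order among equal priorities

    def insert(self, priority: int, name: str) -> None:
        heapq.heappush(self._heap, (priority, self._seq, name))
        self._seq += 1

    def pop_highest(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


def on_event(event: Event, qr: ReadyQueue, dma_idle: bool, free_partitions: int,
             released: Optional[Tuple[int, str]] = None) -> Optional[str]:
    """Hypothetical ISR body: returns the task handed to the DMA, or None."""
    if event is Event.TASK_RELEASE and released is not None:
        qr.insert(*released)
    # Dispatch only if the DMA is idle, a partition is free and a task is ready;
    # otherwise the task stays in Qr until a later event.
    if dma_idle and free_partitions > 0 and len(qr) > 0:
        return qr.pop_highest()
    return None


qr = ReadyQueue()
print(on_event(Event.TASK_RELEASE, qr, dma_idle=False, free_partitions=2, released=(2, "J2")))  # None
print(on_event(Event.TASK_RELEASE, qr, dma_idle=False, free_partitions=2, released=(1, "J1")))  # None
print(on_event(Event.MEMORY_PHASE_COMPLETION, qr, dma_idle=True, free_partitions=1))  # J1
```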

1: procedure DMA-Dispatcher:
3:     i = Select-Task(Qr)
4:     (l, j) = Select-Core({sk})
5:     tdma = t + unloadj + loadi
6:     sl = max(tdma, fl)
7:     fl = sl + xi

For simplicity, we assume that when the DMA-Dispatcher procedure is invoked at time t, (1) the DMA is idle, (2) at least one partition is available, and (3) there is at least one ready task. Otherwise, the procedure exits, since execution is non-preemptive, and is triggered again by a later event. The procedure maintains {sk} and {fk}, two sets of m time pointers indicating, respectively, the start and finish times of the last scheduled job on each core, and tdma, which indicates the end time of a DMA operation. The Select-Task procedure in Line 3 returns the index i of the highest-priority task in Qr. Similarly, the Select-Core procedure in Line 4 returns the index l of the core with the minimum sk among all cores with a free partition, with ties broken arbitrarily, together with the index j of the last job scheduled on the selected partition. This j is needed to add the unload time of the previous job in Line 5. Then sl is set in Line 6 to the maximum of tdma and fl because Ji can start its computation phase only once its memory phase has completed and core cl is idle; finally, Line 7 sets fl to sl plus xi, the computation time of Ji.
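The following Python sketch invokes the DMA-Dispatcher procedure once per job and replays the scenario of Example 1. It is illustrative only and rests on several assumptions: all jobs are released at time 0 and listed in decreasing priority (so Select-Task simply takes the next task), ties in Select-Core go to the lowest core index, partitions start empty, and the load, unload and computation times are hypothetical values chosen so that the event times match those quoted in Example 1. The names Task, Job, reused_job, free_time and dispatch_all are invented for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    name: str
    load: int    # load time (hypothetical)
    unload: int  # unload time of its modified data (hypothetical)
    x: int       # computation time xi (hypothetical)


@dataclass
class Job:
    task: Task
    core: int    # index l of the core it was assigned to
    t_dma: int   # end of its memory phase
    s: int       # start of its computation phase
    f: int       # finish of its computation phase


def reused_job(jobs: List[Job]) -> Optional[Job]:
    """Job j that last occupied the partition the next job on this core will use.
    Partitions alternate, so that is the job scheduled two positions earlier."""
    return jobs[-2] if len(jobs) >= 2 else None


def free_time(jobs: List[Job]) -> int:
    """Time at which the core's next partition is free (0 if not used yet)."""
    j = reused_job(jobs)
    return j.f if j is not None else 0


def dispatch_all(tasks: List[Task], m: int) -> List[Job]:
    """Call DMA-Dispatcher once per job; tasks are in decreasing priority order
    and all are released at time 0 (assumption)."""
    s = [0] * m                                 # {sk}
    f = [0] * m                                 # {fk}
    per_core: List[List[Job]] = [[] for _ in range(m)]
    dma_free = 0                                # end of the last DMA operation
    schedule: List[Job] = []
    for task in tasks:                          # Line 3: Select-Task (head of Qr)
        free_at = [free_time(jobs) for jobs in per_core]
        # First instant at which the DMA is idle and some core has a free partition.
        t = max(dma_free, min(free_at))
        # Line 4: Select-Core, minimum sk among cores with a free partition
        # (l mirrors the pseudocode's core index).
        l = min((k for k in range(m) if free_at[k] <= t), key=lambda k: s[k])
        j = reused_job(per_core[l])
        t_dma = t + (j.task.unload if j is not None else 0) + task.load  # Line 5
        s[l] = max(t_dma, f[l])                 # Line 6
        f[l] = s[l] + task.x                    # Line 7
        dma_free = t_dma
        job = Job(task, l, t_dma, s[l], f[l])
        per_core[l].append(job)
        schedule.append(job)
    return schedule


tasks = [Task("J1", 3, 1, 12), Task("J2", 3, 1, 10), Task("J3", 4, 1, 5),
         Task("J4", 3, 1, 4), Task("J5", 2, 1, 3), Task("J6", 3, 1, 2)]
for job in dispatch_all(tasks, m=2):
    print(job.task.name, "c%d" % (job.core + 1), job.t_dma, job.s, job.f)
# J1 c1 3 3 15
# J2 c2 6 6 16
# J3 c1 10 15 20
# J4 c2 13 16 20
# J5 c1 18 20 23
# J6 c2 22 22 24
```

With these inputs, the memory phases end at times 3, 6, 10, 13, 18 and 22, J5 is placed on c1 and J6 on c2, and c2 is idle from time 20 to 22, which is the hole between J4 and J6 described in Example 1.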
