The initial allocation of resources to host an action execution starts on the Action Scheduler, where the scheduling system determines the most suitable node-implementation pair to perform the action. To make this initial decision, the Action Scheduler relies on Scores: comparable objects gathering all the information that any scheduling policy may need to compare two different action-node-implementation options. This information can relate to the action per se, to hosting the action on the node, or to the execution of one specific implementation on the node.
Upon receiving a new action, the Action Scheduler computes the Score for every possible node-implementation pair. Initially, it determines the priority of the action and estimates
9.2. INITIAL SCHEDULING
the time when the action will become free of dependencies by checking the expected end time of all the static predecessors of the action. After that, the Action Scheduler computes, for each node, the expected delay for transferring the input values missing on that node and determines the expected start time for any implementation, assuming that the node has enough free resources to host it. The last step is therefore to compute the earliest possible start time for each implementation on the node. To contextualize an implementation execution on a node, the Action Scheduler requires knowledge about the availability of the resources of that node, information known by the corresponding Node Scheduler. For this reason, the former asks the latter to complete the Score with an estimation of the end time, energy consumption and economic cost of running the implementation on the node, based on historical data from previous executions.
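The first steps of the Score computation can be sketched as follows. This is only an illustration under stated assumptions: the `Score` fields, the function names and the bandwidth-based transfer model are hypothetical, and the sketch assumes that transfers start once the last static predecessor ends. The fields that the Node Scheduler completes later (end time, energy, cost) are left out.

```python
from dataclasses import dataclass


@dataclass
class Score:
    # Hypothetical field names; times are in milliseconds.
    priority: int
    dependency_free_time: float
    transfer_delay: float
    expected_start: float


def dependency_free_time(predecessor_end_times):
    """Latest expected end time among the static predecessors (0 if none)."""
    return max(predecessor_end_times, default=0.0)


def transfer_delay(input_sizes, present_on_node, bandwidth):
    """Assumed model: size of the inputs missing on the node over a fixed bandwidth."""
    missing = sum(size for name, size in input_sizes.items()
                  if name not in present_on_node)
    return missing / bandwidth


def base_score(priority, predecessor_end_times, input_sizes,
               present_on_node, bandwidth):
    """Score before the Node Scheduler accounts for resource availability."""
    free_at = dependency_free_time(predecessor_end_times)
    delay = transfer_delay(input_sizes, present_on_node, bandwidth)
    # Assumption: the node has enough free resources, so the action can
    # start as soon as dependencies are released and inputs have arrived.
    return Score(priority, free_at, delay, expected_start=free_at + delay)


score = base_score(priority=1,
                   predecessor_end_times=[20.0, 35.0],
                   input_sizes={"a": 100, "b": 50},
                   present_on_node={"a"},
                   bandwidth=10.0)
```

In this example only input `b` is missing on the node, so the transfer delay is computed from its size alone.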
Finally, the Action Scheduler only needs to compare all the obtained Scores to select the best node-implementation pair. By merely changing a one-to-one comparison function, the owner of the infrastructure can define different policies for the initial node-implementation selection without needing to look at the code of the scheduling system. For instance, comparing the expected end time of the options minimizes the execution time of the application; changing the comparison function to consider only the energy footprint of each option makes the system minimize the energy consumption of the execution. Upon taking the decision, the Action Scheduler submits the action to the Node Scheduler of the selected node, indicating the selected implementation, so that the Node Scheduler adds the necessary resource dependencies.
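The pluggable comparison can be illustrated with a minimal sketch. The `Score` fields, the comparator names and `select_best` are hypothetical names chosen for this example, not the actual interface of the scheduling system; the point is only that swapping the one-to-one comparison function changes the policy while the selection code stays the same.

```python
from dataclasses import dataclass
from functools import cmp_to_key


@dataclass(frozen=True)
class Score:
    # Hypothetical fields for one node-implementation option.
    node: str
    implementation: str
    expected_end: float   # ms
    energy: float         # arbitrary energy unit
    cost: float           # arbitrary monetary unit


def _cmp(x, y):
    """Three-way comparison: negative if x < y, zero if equal, positive if x > y."""
    return (x > y) - (x < y)


def compare_by_end_time(a, b):
    """Policy that minimizes execution time."""
    return _cmp(a.expected_end, b.expected_end)


def compare_by_energy(a, b):
    """Policy that minimizes energy consumption."""
    return _cmp(a.energy, b.energy)


def select_best(scores, compare):
    """Pick the option that the given one-to-one comparison ranks first."""
    return min(scores, key=cmp_to_key(compare))


options = [
    Score("node1", "cpu", expected_end=120.0, energy=50.0, cost=1.0),
    Score("node2", "gpu", expected_end=80.0, energy=90.0, cost=2.0),
]
fastest = select_best(options, compare_by_end_time)
greenest = select_best(options, compare_by_energy)
```

With these made-up values, the time-oriented policy picks the option on `node2` and the energy-oriented policy picks the one on `node1`.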
Determining the earliest time when an implementation can start running on a node is not straightforward. It requires keeping track of all the already scheduled executions, finding out when there will be enough resources to host the implementation, and checking that they will remain available throughout the whole execution. To ease the search, the Node Scheduler keeps a register of those moments when some resources are idle. For each of these moments, known as gaps, it records a description of the available resources, the time when they become available, the action that used them immediately before – the origin of the gap – and the earliest time when another action uses them again. Initially, the Node Scheduler has one single gap registered with an unknown origin that contains all the resources of the node – for instance, two CPU cores – from timestamp 0 to the end of the execution.
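A gap register and the search for the earliest start time could look like the sketch below. It is a simplification under stated assumptions: resources are reduced to a number of CPU cores, `Gap` and `earliest_start` are hypothetical names, and the search only tries the ready time of the action and the start times of the registered gaps as candidate start times.

```python
import math
from typing import NamedTuple, Optional


class Gap(NamedTuple):
    cores: int              # resources contained in the gap
    start: float            # time when they become available (ms)
    end: float              # earliest time another action uses them again
    origin: Optional[str]   # action that used them before, None if unknown


def earliest_start(gaps, cores, duration, ready=0.0):
    """Earliest time when `cores` stay idle for `duration`, or None."""
    candidates = sorted({ready} | {g.start for g in gaps if g.start > ready})
    for start in candidates:
        # Only gaps covering the whole execution window can contribute.
        free = sum(g.cores for g in gaps
                   if g.start <= start and g.end >= start + duration)
        if free >= cores:
            return start
    return None


# Initial register of a node with two CPU cores.
initial = [Gap(2, 0, math.inf, None)]

# A register with one short gap and two open-ended ones.
register = [
    Gap(1, 0, 20, None),
    Gap(1, 100, math.inf, "Action1"),
    Gap(1, 120, math.inf, "Action2"),
]
```

For instance, `earliest_start(register, cores=2, duration=100)` has to wait until both open-ended gaps are available, whereas a 10 ms single-core action would fit in the short gap at timestamp 0.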
When the Action Scheduler decides to submit an action to the node, the Node Scheduler checks if there is any combination of gaps that could host its execution. For instance, when the Node Scheduler from the previous example receives the first action, which uses one CPU core for 100 ms and has no dependencies on other actions, it decides to reserve the resources from the gap with two CPU cores. To do so, the Node Scheduler splits the gap into two gaps: one containing the occupied resources from the gap start time until the scheduled start time of the action, and a second one containing the remaining resources with the same start and end time as the original gap. Both gaps maintain the origin of the original gap. In the example, the two-CPU-cores gap becomes two gaps: one with one CPU core from timestamp 0 until the expected start time of the action execution, which is also timestamp 0; and one with one CPU core from timestamp 0 until the end of the execution. Since the first gap has the same start and end time, it has no duration, and the Node Scheduler dismisses it.
Besides reserving the resources for the action execution, the Node Scheduler also needs to release them at the end of the action. For that purpose, it adds a new gap containing the resources released by the action from the end of the action execution until the end of the whole execution. In this case, the action releasing the resources becomes the origin of the gap. Therefore, after scheduling this first action in the example, the Node Scheduler would have two gaps: the one with the unused resources and the one containing the resources released by the action, starting at the end time of the action – timestamp 100 ms – and lasting until the end of the execution.
When the Action Scheduler assigns a second action to the node, the Node Scheduler repeats the process. For instance, it could receive an action exactly like the first one but with a static dependency expected to be released at timestamp 20 ms. In this case, the Node Scheduler would check the available gaps and find that it can fit the action on the unused resources. Therefore, it takes the gap and splits it into two gaps: a first one from timestamp 0 until the start of the action – timestamp 20 ms – containing the resources used by the action, and a second one with the remaining resources. However, since the action uses all the resources within the original gap, the Node Scheduler dismisses the latter. As with the first action, it registers a new gap with the resources released by the second action from timestamp 120 ms until the end of the execution. The origin of this gap is the second action. After scheduling the second action, the Node Scheduler has a list containing three gaps: the unused initial resources – one CPU core from timestamp 0 to timestamp 20 ms –, the resources released by the first action – one CPU core from timestamp 100 ms until the end of the execution – and the resources released by the second action – one CPU core from timestamp 120 ms until the end of the execution.
Actions may not fit in one single gap of the register; in such a case, the Node Scheduler should group several gaps to fulfill the requirements of the action. For instance, in the same example, the Action Scheduler could submit a third action to the same Node Scheduler requiring two CPU cores for 100 ms. In this case, the action should run on the gaps released by the previous actions. Since the action requires both gaps, its execution will not start until the resources of both gaps are available; i.e., timestamp 120 ms. Regarding the gap coming from the first action, the third action requires all its resources; therefore, it only creates one single gap from the gap start, timestamp 100 ms, to the start of the action execution, timestamp 120 ms. For the gap originated by the second action, the third action also requires all the resources; however, since the start times of the gap and the action execution are the same, the Node Scheduler dismisses any possible gap between the second and the third actions. Finally, the Node Scheduler registers the gap corresponding to the resources released by the third action. At the end of this third action scheduling, the register
contains three gaps: the unused initial resources – one CPU core from timestamp 0 to timestamp 20 ms –, the resources from the first action idle until the third action runs – one CPU core from timestamp 100 ms until timestamp 120 ms – and the resources released by the third action – two CPU cores from timestamp 220 ms until the end of the execution.
To ensure that the execution uses the resources as scheduled, the Node Scheduler has to add the necessary resource dependencies. When an action employs the resources from a gap to run, the Node Scheduler adds a resource dependency from the action that originated the gap to the action being scheduled. The third action employs resources from the gaps originated by the other two actions; therefore, the Node Scheduler adds two resource dependencies to the third action: one from the first action and one from the second one. Figure 9.2 depicts the evolution of the gap register and the dependency graph when the Node Scheduler processes these three actions.
[Figure 9.2. Columns: Execution Plan (time axis from 0 to 250), Gap List, Dependency Graph. Rows: Initial, First Action, Second Action, Third Action.]
Gap list after each step:
Initial: < 2 CPU cores, 0, ∞, - >
First Action: < 1 CPU core, 100, ∞, Action1 >, < 1 CPU core, 0, ∞, - >
Second Action: < 1 CPU core, 120, ∞, Action2 >, < 1 CPU core, 100, ∞, Action1 >, < 1 CPU core, 0, 20, - >
Third Action: < 2 CPU cores, 220, ∞, Action3 >, < 1 CPU core, 100, 120, Action1 >, < 1 CPU core, 0, 20, - >
Figure 9.2: Evolution of the gap list within the Node Scheduler and the resource dependencies when scheduling three actions (Action1 and Action2 require one CPU core and Action3 requires two CPU cores) on a node with two CPU cores. Each gap is described as a 4-tuple indicating the resources contained, the start time, the end time and the origin action, respectively.
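The three-action walkthrough can be reproduced with a small simulation. This is a minimal sketch rather than the actual Node Scheduler: the class and method names are hypothetical, resources are reduced to CPU cores, only open-ended gaps are reserved (so that releasing resources until the end of the execution stays valid), and when more gaps are usable than needed the sketch simply takes them in register order.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Gap:
    cores: int
    start: float
    end: float
    origin: Optional[str]


class NodeScheduler:
    def __init__(self, cores):
        # One gap with unknown origin holding every resource of the node.
        self.gaps = [Gap(cores, 0, math.inf, None)]
        self.dependencies = []  # (origin action, dependent action) pairs

    def schedule(self, name, cores, duration, ready=0):
        """Reserve `cores` for `duration` ms no earlier than `ready`."""
        candidates = sorted({ready} | {g.start for g in self.gaps
                                       if g.start > ready})
        for start in candidates:
            usable = [g for g in self.gaps
                      if g.start <= start and math.isinf(g.end)]
            if sum(g.cores for g in usable) >= cores:
                break
        else:
            raise ValueError("no combination of gaps can host the action")
        needed = cores
        for gap in usable:
            if needed == 0:
                break
            taken = min(gap.cores, needed)
            needed -= taken
            self.gaps.remove(gap)
            if gap.start < start:
                # Occupied resources stay idle until the action starts.
                self.gaps.append(Gap(taken, gap.start, start, gap.origin))
            if gap.cores > taken:
                # Remaining resources keep the original start and end.
                self.gaps.append(Gap(gap.cores - taken, gap.start,
                                     gap.end, gap.origin))
            if gap.origin is not None:
                self.dependencies.append((gap.origin, name))
        # Resources released by the action; the action is the gap origin.
        self.gaps.append(Gap(cores, start + duration, math.inf, name))
        return start


node = NodeScheduler(cores=2)
start1 = node.schedule("Action1", cores=1, duration=100)
start2 = node.schedule("Action2", cores=1, duration=100, ready=20)
start3 = node.schedule("Action3", cores=2, duration=100)
```

After the three calls, `node.gaps` holds the same three gaps as the last row of the figure (in a different order) and `node.dependencies` holds the two resource dependencies of the third action.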
The more actions a Node Scheduler processes, the longer its gap list may become, since gaps too short to host an action execution are more likely to appear. To avoid the computational cost of considering these gaps when finding the earliest moment when the resources can host the action execution, the initial scheduling policy does not consider backfilling; the Node Scheduler only keeps in the register those gaps whose end time is not defined.
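Dropping the gaps with a defined end time amounts to a simple filter over the register. The sketch below assumes a hypothetical `Gap` tuple in which an undefined end time is represented as infinity; both the representation and the function name are illustrative choices.

```python
import math
from typing import NamedTuple, Optional


class Gap(NamedTuple):
    cores: int
    start: float
    end: float              # math.inf stands for "end time not defined"
    origin: Optional[str]


def drop_closed_gaps(gaps):
    """Keep only the gaps that last until the end of the execution."""
    return [g for g in gaps if math.isinf(g.end)]


register = [
    Gap(1, 0, 20, None),
    Gap(1, 100, 120, "Action1"),
    Gap(2, 220, math.inf, "Action3"),
]
pruned = drop_closed_gaps(register)
```

Applied to the register obtained after the third action of the example, only the gap released by the third action remains.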