The primary function of the monitoring module is to periodically observe and record the actual resource requirements of each task; both the prediction and scheduling modules can use these records. The prediction module can use the recorded data to further improve the prediction function. The scheduling module can update its resource utilisation profile in two cases: (i) if the maximum resource consumed by a task is higher than the amount given by the predictor module, then the maximum resource requirement of the task can be updated, and with it the utilisation profile; and (ii) if the task has finished, then the utilisation profile that the scheduling module keeps for the corresponding machine in the cluster can also be updated.
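The two update cases can be illustrated with a minimal sketch. It assumes a single resource dimension (the chapter tracks both CPU and RAM), and the names `TaskRecord`, `MachineProfile`, `record_usage` and `task_finished` are hypothetical, introduced only for illustration; this is not the implementation used in the experiments.

```python
from dataclasses import dataclass, field


@dataclass
class TaskRecord:
    """Hypothetical per-task bookkeeping kept by the monitoring module."""
    predicted_peak: float                         # peak returned by the predictor
    observed: list = field(default_factory=list)  # periodic usage samples


class MachineProfile:
    """Hypothetical utilisation profile: task id -> provisioned peak requirement."""

    def __init__(self, capacity: float):
        self.capacity = capacity
        self.provisioned = {}

    def record_usage(self, task_id: str, rec: TaskRecord, usage: float) -> None:
        rec.observed.append(usage)            # history kept for the prediction module
        if usage > rec.predicted_peak:        # case (i): the predictor underestimated
            rec.predicted_peak = usage
        self.provisioned[task_id] = rec.predicted_peak

    def task_finished(self, task_id: str) -> None:
        self.provisioned.pop(task_id, None)   # case (ii): release the task's share

    def utilisation(self) -> float:
        return sum(self.provisioned.values())
```

In this sketch the profile always reflects the largest requirement seen so far for each running task, so an underestimated prediction is corrected as soon as a higher sample is recorded.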
Another function of this module is to decide on the eviction of one or more tasks if the resource requirements of the tasks running on a machine are higher than its maximum capacity. The selected tasks are marked for eviction, their execution is stopped, and they are sent back to the Queue of Tasks in order to reduce the level of utilisation of the machine. Note that any eviction policy has to reach a decision using only knowledge local to each machine. There are several strategies that one could implement to select candidate tasks for eviction. The basic strategy, implemented by the providers of the trace, consists in choosing tasks for eviction based on their priority level. The reason for this is that the system must guarantee a high quality of service for tasks with high priorities. We explore two alternative implementations of the eviction policy that simply rely on different orderings of the candidate tasks. Note that these eviction policies are triggered by resource overload on the machines. It is thus possible for a task to be repeatedly evicted until the machine frees up some resources. Algorithm 9 details the implementation of Line 18 of the simulation Algorithm 10.
5. ONLINE CONSOLIDATION WITH UNCERTAIN TASK SIZES 5.4 Monitoring Module
We study the following three orderings that could be used on Line 2 of Algorithm 9:
• minPrio. The tasks are ranked by increasing priority. Tasks with smaller priorities are favored for eviction.
• minRunningTime. The tasks are ranked by increasing running time. Tasks that have been running for the least amount of time are favored for eviction. The underlying idea is that one should avoid evicting tasks that have been running for a long period of time, since evicting such tasks would negatively impact the average waiting time.
• minNumTask. In this case the candidates are ordered by decreasing actual resource requirements. The idea here is to favor the eviction of tasks requiring a large amount of resource, so that we locally minimise the number of evictions.
Algorithm 9 simply sorts the tasks currently running on the machine and greedily marks tasks for eviction until the machine's capacity is no longer exceeded. All these strategies will be compared in Section 5.5.
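The sort-then-greedily-evict idea behind Algorithm 9 can be sketched as follows. This is an illustrative reading of the description above rather than the algorithm's actual listing: it assumes a single resource dimension, it assumes that a smaller `priority` value means a lower priority, and `RunningTask`, `ORDERINGS` and `select_evictions` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RunningTask:
    name: str
    priority: int        # assumption: smaller value = lower priority
    running_time: float  # time elapsed since the task started on this machine
    usage: float         # current actual resource requirement


# Sort keys for the three orderings; tasks that sort first are evicted first.
ORDERINGS = {
    "minPrio": lambda t: t.priority,             # increasing priority
    "minRunningTime": lambda t: t.running_time,  # increasing running time
    "minNumTask": lambda t: -t.usage,            # decreasing resource requirement
}


def select_evictions(tasks, capacity, ordering):
    """Greedily mark tasks for eviction until the load fits within `capacity`."""
    load = sum(t.usage for t in tasks)
    evicted = []
    for task in sorted(tasks, key=ORDERINGS[ordering]):
        if load <= capacity:
            break                      # the machine is no longer overloaded
        evicted.append(task.name)
        load -= task.usage
    return evicted


if __name__ == "__main__":
    tasks = [
        RunningTask("a", priority=2, running_time=50.0, usage=0.5),
        RunningTask("b", priority=0, running_time=10.0, usage=0.3),
        RunningTask("c", priority=1, running_time=5.0, usage=0.4),
    ]
    for ordering in ORDERINGS:
        print(ordering, select_evictions(tasks, capacity=1.0, ordering=ordering))
```

With these three example tasks and a capacity of 1.0, each ordering evicts a single but different task: `b` under minPrio, `c` under minRunningTime and `a` under minNumTask.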
To conduct experiments in this complex scheduling environment, we developed an event-driven simulation framework. This framework maintains a collection of task-related events ordered along a time line. At any point in the simulation, the collection of events is handled in such a way that it remains ordered by increasing time stamp. The simulation framework used to carry out the experiments is described in pseudo-code in Algorithm 10. The simulation is bootstrapped on Line 1 by initializing the queue of events E with the first task arrival selected from the set of arrival events. We next define the set of events dynamically generated as the simulation unfolds according to Algorithm 10:
• Arrival. This event simulates a task submitted for scheduling. Upon arrival, the scheduler is called to assign the task to a machine and decide on the starting time (Line 5). This step also includes a call to the prediction module, which returns an estimate of the peak values that the task is expected to reach. The scheduler is guaranteed to find a pair $\langle x_t, y_t \rangle$, where $x_t$ is the machine to which the task has been assigned and $y_t$ is the starting time for the task to be processed on that machine. We then add, on Line 6, the Start event associated with the current task at time $y_t$ on the simulated time line, meaning that task $t$ is due to start when the simulation reads that event.
On Lines 7 to 9 we feed the simulated time line E with as many resource Update events, evenly spaced in time, as we could retrieve from the original data contained in the Google trace. These events capture the actual resource consumption of the task at hand. Recall that $d_t$ is the duration of task $t$ and $k_t$ is the number of resource usage records captured by the Google trace.
Lastly, on Line 10 we place the Arrival event associated with the next task coming into the simulation. From a data-loading point of view, this is done lazily: a task arrival is only added to the queue once the current one has been handled.
• Start. This event simulates the task's execution starting on the machine to which it has been assigned (Line 12). From this point on, the task is running on the machine and consuming resources, and it is thus eligible for eviction if the machine becomes overloaded. Moreover, since the task duration is considered known in our setting, we simply add, on Line 13, a Finish event further down the simulated time line E at time $y_t + d_t$.
• Finish. Once the simulation reaches a Finish event, the associated task is considered completed and is thus removed from the machine, freeing any resource that the task was consuming in the process. This is implemented on Line 15.
• Update. The Update event, on Line 17, simulates the variation in the task's requirements on both CPU and RAM resources. As stated previously, in most cases the actual requirements are less than the amount of resource that was provisioned for the task. Nonetheless, it might happen that the actual consumption exceeds the provisioned space, as checked on Line 18. In that case, the machine running the task might find itself in a saturated state, which triggers an eviction of the task itself or of any other task running concurrently on the machine. In addition, since evictions are triggered when a task requires more resources than forecasted, we update the predicted peak requirement whenever it was underestimated. In that way, we continuously learn about the peak requirements of tasks.
For each task marked for eviction, we generate, on Line 20, an Evict event that occurs an arbitrarily small delay α (in milliseconds) later on the simulated time line. Several eviction policies are discussed in Section 5.4.
• Evict. This event simply removes any other events related to the evicted task (Line 23) and emits a new Arrival event (Line 22). This new Arrival event is set to happen an arbitrarily small number of milliseconds (α) later in the simulation.
Using these event definitions, we build a simulated time line in which events occur in the logical order imposed by their times of occurrence.
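To make the interplay of the five event types concrete, the following is a heavily simplified sketch of such an event loop. It is not Algorithm 10. It assumes a single machine and a single resource dimension, a toy scheduler that starts every task immediately, all arrivals loaded up front instead of lazily, Update events placed at $y_t + i \cdot d_t / (k_t + 1)$ for $i = 1, \dots, k_t$ (one possible reading of "evenly spaced"), and a placeholder eviction rule that picks the task with the largest current usage. The value of `ALPHA` and all identifiers are hypothetical.

```python
import heapq
import itertools

ALPHA = 0.001  # assumed value for the arbitrarily small delay


def simulate(tasks, capacity):
    """tasks: dict name -> (arrival time, duration d_t, list of k_t usage samples)."""
    seq = itertools.count()              # tie-breaker so heap entries stay comparable
    queue, log = [], []
    running = {}                         # name -> current usage on the machine
    epoch = {name: 0 for name in tasks}  # bumped on eviction to invalidate old events

    def push(time, kind, name, value=None):
        heapq.heappush(queue, (time, next(seq), kind, name, epoch[name], value))

    for name, (arrival, _duration, _samples) in tasks.items():
        push(arrival, "Arrival", name)

    while queue:                         # events pop in increasing time stamp order
        time, _, kind, name, ep, value = heapq.heappop(queue)
        if ep != epoch[name]:
            continue                     # stale event of a task evicted since then
        log.append((time, kind, name))
        _, duration, samples = tasks[name]
        if kind == "Arrival":
            start = time                 # toy scheduler: start right away
            push(start, "Start", name)
            step = duration / (len(samples) + 1)
            for i, usage in enumerate(samples, start=1):
                push(start + i * step, "Update", name, usage)
        elif kind == "Start":
            running[name] = 0.0
            push(time + duration, "Finish", name)
        elif kind == "Finish":
            running.pop(name, None)
        elif kind == "Update":
            running[name] = value
            if sum(running.values()) > capacity:        # machine is overloaded
                victim = max(running, key=running.get)  # placeholder policy
                push(time + ALPHA, "Evict", victim)
        elif kind == "Evict":
            running.pop(name, None)
            epoch[name] += 1             # drops the task's remaining events
            push(time + ALPHA, "Arrival", name)
    return log
```

The epoch counter is one way to realise "removing any other events related to the evicted task" without searching the heap: outdated entries are simply skipped when popped. A faithful implementation would replace the toy scheduler and the placeholder policy with the scheduling module and one of the orderings of Section 5.4.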