6. Context-Aware Scheduling in Distributed Computing Systems
6.3. Decentralized Scheduling
6.3.3. Scheduling with Cache Lists
So far, consumers in the Tasklet system requested a resource from the central broker, which made the scheduling decision and responded to each resource request individually (see Figure 6.13). Thus, the delay before a task could be executed included at least one round-trip time between the consumer and the broker. The central scheduler has the advantage that resource consumers and providers have a well-known registry in the system. Due to its prominent role, the central scheduler has global knowledge of all entities in the system and knows about the availability and performance of the computing resources. We propose a scheduling architecture that eliminates the need for a resource request to the broker. Instead, the broker asynchronously shares its knowledge about connected providers with the consumers in the form of cached resource lists, or cache lists for short. The management of the resources, which includes the registration and monitoring of resource providers, is still handled by the central broker, which maintains a global view of the resource pool.

[Figure 6.14 (left): a broker sends cache lists of provider addresses (e.g., 134.155.23.190, 134.155.23.194, 134.155.23.178, 134.155.23.192, 134.155.23.14) to consumers and provider/consumer nodes, which send Tasklets directly to providers and receive the results. Figure 6.14 (right): trade-offs for Size of Cache List (short vs. long: low overhead vs. number of options), Composition of Cache Lists (assorted vs. random: variety vs. simplicity), and Update Intervals (high vs. low frequency: up-to-dateness vs. low overhead).]
Figure 6.14.: Scheduling with cache lists. The broker periodically disseminates lists with provider information to all consumers. Instead of sending resource requests to the broker for each Tasklet, consumers cache these lists and locally select a suitable provider for the Tasklet execution. Trade-offs between performance and overhead are shown on the right.
Rather than responding to each individual resource request from a consumer, the broker periodically sends resource lists to the consumers. The consumers cache these resource lists and use them to select suitable providers for Tasklet executions. Instead of sending a resource request to the broker, a consumer looks up a provider in its cache list and sends a Tasklet execution request directly to the selected provider. Figure 6.14 (left) shows decentralized scheduling with cache lists. While this approach avoids per-Tasklet communication between consumers and the broker, it also carries some risks. First, the cached lists can easily become outdated, as providers might leave the system at any time. Second, if the cache lists are too small, consumers might not find an appropriate provider. Third, the distribution of cache lists might introduce significant overhead. As a consequence, several trade-offs between overhead and performance emerge (see Figure 6.14 (right)). In the following, we discuss the parameters of this architecture that must be adjusted carefully to achieve performant yet lightweight decentralized scheduling.
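The difference between the centralized path and scheduling from a cache list can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the Tasklet implementation: the classes `Broker` and `Consumer`, their methods, and the counter `requests_served` are hypothetical, and the broker's own provider choice is a placeholder. Only consumers without a cache list contact the broker; all others decide locally.

```python
import random
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Broker:
    """Hypothetical broker: keeps the global view of all registered providers."""
    providers: Set[str] = field(default_factory=set)
    requests_served: int = 0  # per-Tasklet resource requests the broker had to answer

    def register(self, provider: str) -> None:
        self.providers.add(provider)

    def cache_list(self) -> List[str]:
        # Snapshot of the resource pool that is pushed to consumers asynchronously.
        return sorted(self.providers)

    def request_resource(self) -> str:
        # Centralized path: one consumer-broker round trip per Tasklet.
        # The choice itself is a placeholder (smallest address string).
        self.requests_served += 1
        return min(self.providers)


@dataclass
class Consumer:
    """Hypothetical consumer that schedules Tasklets from its cached list."""
    cache: List[str] = field(default_factory=list)

    def receive_cache_list(self, providers: List[str]) -> None:
        self.cache = list(providers)

    def schedule(self, broker: Broker, rng: random.Random) -> str:
        if self.cache:
            # Local decision: no message to the broker is needed.
            return rng.choice(self.cache)
        # Nothing cached yet: fall back to a resource request to the broker.
        return broker.request_resource()


broker = Broker()
for ip in ["134.155.23.190", "134.155.23.178", "134.155.23.14"]:
    broker.register(ip)

consumer = Consumer()
consumer.receive_cache_list(broker.cache_list())
rng = random.Random(42)
chosen = [consumer.schedule(broker, rng) for _ in range(100)]
print(broker.requests_served)  # 0: all 100 Tasklets were scheduled locally

cold_consumer = Consumer()  # has not received a cache list yet
fallback = cold_consumer.schedule(broker, rng)
print(broker.requests_served)  # 1
```

The broker still answers the occasional fallback request, but the per-Tasklet round trip disappears for every consumer that holds a cache list.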
Size of Cache Lists: Brokers maintain a complete and up-to-date view of their resource pool. To allow for local scheduling decisions, they share this knowledge with the consumers by periodically distributing cache lists. This might result in a large amount of data transfer between the broker and the consumers. To reduce the communication overhead of propagating resource lists, each consumer only receives a subset of the complete list. As each consumer gets a different share of the complete list, the load can be distributed evenly across all providers. However, if the lists become too small, consumers might not be able to find suitable providers and have to fall back to sending a resource request to the broker.
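One simple way to give each consumer a different share of the provider list, and to fall back to the broker when the share is too small, is sketched below. The windowing scheme, the function names `partition_cache_lists` and `select_provider`, and the per-provider `cores` requirement are hypothetical illustrations, not the thesis' method. Consumer i receives a window of `size` providers starting at offset i · size, so each provider appears in roughly the same number of lists.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def partition_cache_lists(providers: List[str], num_consumers: int,
                          size: int) -> List[List[str]]:
    """Give each consumer a different window of `size` providers.

    Consumer i receives the providers starting at offset i * size, wrapping
    around the full list, so every provider appears in roughly the same
    number of cache lists.
    """
    n = len(providers)
    size = min(size, n)
    return [[providers[(i * size + j) % n] for j in range(size)]
            for i in range(num_consumers)]


def select_provider(cache: List[str], cores: Dict[str, int], min_cores: int,
                    broker_pool: List[str]) -> Tuple[Optional[str], bool]:
    """Return (provider, used_broker) for a Tasklet needing `min_cores` cores."""
    for p in cache:
        if cores[p] >= min_cores:
            return p, False  # found locally, no broker contact
    # No suitable provider cached: request a resource from the broker,
    # which searches its complete view of the pool.
    for p in broker_pool:
        if cores[p] >= min_cores:
            return p, True
    return None, True


ips = [f"134.155.23.{h}" for h in (190, 194, 178, 192, 14, 160, 111, 132, 99, 10)]
cores = {ip: 2 for ip in ips}
cores["134.155.23.10"] = 8  # hypothetical: only one provider has 8 cores

lists = partition_cache_lists(ips, num_consumers=4, size=5)
load = Counter(p for lst in lists for p in lst)
print(sorted(set(load.values())))  # [2]: every provider is in exactly two lists

print(select_provider(lists[0], cores, 8, ips))  # ('134.155.23.10', True)
print(select_provider(lists[1], cores, 8, ips))  # ('134.155.23.10', False)
```

With five of ten providers per list, consumer 0 has to contact the broker for the 8-core requirement while consumer 1 finds the provider locally, which mirrors the trade-off between list size and fallback requests.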
Composition of Cache Lists: Brokers might store multiple properties of each resource provider. Providers may vary in their availability, their hardware, and their connectivity. These properties can be used in a context-aware scheduling system. As the broker only forwards a share of the overall provider list to each consumer, the composition of the list might affect the performance of the scheduling decisions. Each Tasklet might have several execution requirements that only some of the providers fulfill. The composition of the cache lists can be managed in two ways: providers can either be picked randomly, or the mixture of providers can be balanced based on their properties. While the first approach reduces the complexity of creating provider lists, the second approach ensures a fair propagation of providers to the consumers, so that consumers are more likely to find a suitable provider for executing their Tasklets.
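The two composition strategies can be contrasted in a few lines. This is a minimal sketch assuming each provider carries a single categorical property; the labels "desktop" and "server" and the function names `random_composition` and `balanced_composition` are hypothetical. The balanced variant draws one provider per category in round-robin fashion, so every list contains each provider type as long as the list is at least as long as the number of categories.

```python
import random
from collections import defaultdict
from typing import Dict, List


def random_composition(providers: List[str], size: int,
                       rng: random.Random) -> List[str]:
    # Simple: uniform sample without replacement, ignoring provider properties.
    return rng.sample(providers, min(size, len(providers)))


def balanced_composition(providers: List[str], category: Dict[str, str],
                         size: int, rng: random.Random) -> List[str]:
    """Round-robin across property groups so each list mixes all provider types."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for p in providers:
        groups[category[p]].append(p)
    for members in groups.values():
        rng.shuffle(members)  # random choice within each category
    ordered = [groups[c] for c in sorted(groups)]
    result: List[str] = []
    # Take one provider from each non-empty group per round until the list is full.
    while len(result) < size and any(ordered):
        for members in ordered:
            if members and len(result) < size:
                result.append(members.pop())
    return result


ips = [f"134.155.23.{h}" for h in (190, 194, 178, 192, 14, 160, 111, 132, 99, 10)]
category = {ip: "desktop" for ip in ips}
category["134.155.23.14"] = "server"  # hypothetical hardware classes
category["134.155.23.10"] = "server"

balanced = balanced_composition(ips, category, 3, random.Random(1))
print(sorted(category[p] for p in balanced))  # ['desktop', 'desktop', 'server']
print(len(random_composition(ips, 3, random.Random(1))))  # 3
```

A random list of three providers may contain no server at all, whereas the balanced list always contains one; the price is the extra grouping work on the broker.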
Update Intervals: Because providers might leave the system at any time, the cached resource lists of the consumers eventually become outdated. How quickly this happens depends on the degree of dynamism in the system. Once a cached provider has left, consumers that select it are unable to reach it. To keep the cache lists up to date, brokers periodically send updates to the consumers. The interval between these updates represents another trade-off between up-to-dateness and the overhead caused by the propagation. Instead of using fixed intervals, the brokers can monitor the degree of dynamism in the system and adapt the time interval between two updates accordingly. While this approach allows for a context-aware adaptation of the update intervals, the monitoring introduces further overhead.
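An adaptive update interval could, for example, be derived from the churn the broker observes. The policy below is a hypothetical sketch, not the thesis' mechanism: it assumes that providers leave independently at a rate of λ departures per provider per second, so that for small λT roughly a fraction λT of the cached entries becomes stale within an interval T. The broker then chooses T = target / λ, clamped to assumed bounds `min_s` and `max_s`.

```python
def next_update_interval(departures: int, pool_size: int, elapsed_s: float,
                         target_stale_fraction: float = 0.05,
                         min_s: float = 1.0, max_s: float = 60.0) -> float:
    """Choose the next update interval from the observed churn (hypothetical policy).

    departures / (pool_size * elapsed_s) estimates the departure rate per
    provider per second; the interval is chosen so that roughly
    `target_stale_fraction` of cached entries go stale before the next update.
    """
    if pool_size <= 0 or elapsed_s <= 0:
        return min_s  # no usable measurement: update as often as allowed
    churn_rate = departures / (pool_size * elapsed_s)
    if churn_rate == 0:
        return max_s  # static system: update as rarely as allowed
    interval = target_stale_fraction / churn_rate
    return max(min_s, min(max_s, interval))


# 100 providers observed for 10 s:
print(round(next_update_interval(1, 100, 10.0), 2))   # 50.0: low churn, rare updates
print(round(next_update_interval(20, 100, 10.0), 2))  # 2.5: high churn, frequent updates
print(next_update_interval(0, 100, 10.0))             # 60.0: clamped to the maximum
```

Counting departures is part of the broker's monitoring, which is exactly the additional overhead that such an adaptive scheme incurs.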