Scheduling has a huge impact on the performance of the any system and is a very challenging problem, especially in the large scale deployment. We focused on different challenges associated with scheduling in different settings in this thesis.
6.1.1 Energy Efficiency and SLAs
Energy consumption in large scale cloud deployments, is not only a significant cost factor for cloud operators, but also have severe negative impact on environment. Service Level Agreements (SLA) is a way to ensure that business critical application and workloads have sufficient resources all the time. Because SLAs are so important to the business, its often the case that operators allow excessive resources to them, which leads to significant wastage of electric power. A virtualized data center mitigates the resource underutilization problem by running multiple of these applications on a single physical host.
Applications hosted via virtual machines(VM) in typical data centers, only partially use the resources allocated to it. Thus a consolidation algorithm can allow dense mapping of VMs to physical machines, but it has to be careful about maintaining SLAs of these applications. Thus a consolidation algorithm, aware of SLAs is needed. We proposed one such algorithm that uses resource usage characteristics to make effective consolidation, scale-up and scale-down decisions. The proposed algorithm relies on similarity model combining resource usage characteristics for CPU, Memory, disk I/O and network bandwidth. We compared performance of our algorithm with other well established algorithms and reported significant energy savings while minimally effecting SLAs.
While servers are primary energy consumers, network infrastructure also consumes significant elec- tric power in large scale deployment. We showed how a VM-network co-scheduling algorithm can achieve better energy efficiency of network infrastructure using knowledge of network flows. The algo- rithm based on network flow information, can put network devices in low power mode.
6.1.2 Network awareness
Powering modern consumer, scientific and enterprise applications often requires large amount of pro- cessing, which is increasingly difficult to achieve on a single machine. Thus most modern applications are build using scale out architecture. Hadoop, various graph processing systems, analytics systems, and other big data systems and scientific systems require a lot of data and messaging between the nodes in the systems. When these applications are run within virtual machines, the network performance be- tween these VMs can degrade significantly. These rise of the East-West traffic compared to North-South traffic in data center, poses challenges for network scalability and application performance. For exam- ple, a VM placement algorithm, which does not take this into account, can place VMs belonging to the same application far apart in network, which not only results in poorer application performance, but also burdens the network infrastructure.
We proposed a network aware VM consolidation method, specifically we proposed VMCluster iden- tification algorithm to identify group of VMs with high network traffic among them that can highly benefit from better placement and VMCluster placement algorithm to place the identified VMClusters in a greedy fashion to localize network traffic. We compared our consolidation method with traditional algorithms on number of different parameters on real world traces. We consistently achieved very high localization of network traffic, leading to better application performance. Stability of VM placement is very important for data center operators. We evaluated our method on number of migrations required to achieve traffic localization and repotted most of the benefits come from a very small number of migra- tions.
6.1.3 Mobile cloud
The always connected mobile device with feature rich applications, opens up an interesting opportu- nity to leverage cloud resources for mobile applications. Most of the new apps are cloud enabled, there are multitude of applications which can benefit from using cloud resources. Instead of each app doing this on its own, the mobile scheduler might be able to use cloud and mobile resources more intelligently. We proposed a learning based cloud-aware mobile scheduler, which identifies application execution that will benefit by leveraging cloud and does a transparent tunneling to applications between mobile and cloud resources. The cloud offloading decision is modeled as a learning problem using Multi-Attribute Decision Making framework. We used not only system utilization metrics but also high level features in offloading decision. Algorithm tried to predict the gain from a cloud offloading and based on this gain scheduler decides to run application locally or on cloud. We implemented our algorithm on Android Mobile operating system and evaluated for a variety of workload under different conditions.
6.1.4 MultiCloud
The cloud marketplace has different providers providing different resources at different cost. Enter- prises increasingly are using resources from multiple clouds. The lack of common standards and APIs means that for using available resources from multiple clouds, we need a framework to hide the differ- ences among clouds. We proposed one such framework, targeting big data applications for multi-cloud environment. We described scheduling challenges in multi-cloud environments and features of such a system.