Chapter Summary - Adaptive Resource Relocation in Virtualized Heterogeneous Clusters

of the resources to a certain job, user, or group. By default, Maui uses EASY backfill to schedule jobs across the nodes. Like any scheduler, Maui requires a resource manager on each compute node. The resource manager keeps Maui informed about the node's availability and various statistics, such as memory utilization and CPU load. Maui is typically used with Torque, an open source resource manager.
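The backfill policy is only named above. The following is a minimal sketch of the textbook EASY idea, not Maui's actual implementation. Only the job at the head of the FCFS queue holds a reservation (its "shadow time"). A later job may start early if it either finishes before that reservation or uses only nodes the head job will not need. The classes `Job` and `Running` and the function `easy_backfill` are hypothetical names. Runtimes are assumed to be user-supplied estimates, and the head job is assumed to request no more nodes than the cluster has.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    nodes: int       # number of nodes requested
    runtime: float   # user-supplied runtime estimate

@dataclass
class Running:
    job: Job
    end: float       # expected completion time

def easy_backfill(now, total_nodes, running, queue):
    """Return the names of queued jobs to start at time `now` (EASY-style backfill sketch)."""
    free = total_nodes - sum(r.job.nodes for r in running)
    started = []
    queue = list(queue)
    # Start jobs in FCFS order for as long as the head of the queue fits.
    while queue and queue[0].nodes <= free:
        job = queue.pop(0)
        free -= job.nodes
        started.append(job.name)
        running = running + [Running(job, now + job.runtime)]
    if not queue:
        return started
    head = queue[0]
    # Shadow time: earliest point at which enough nodes are free for the head job.
    avail, shadow = free, now
    for r in sorted(running, key=lambda r: r.end):
        if avail >= head.nodes:
            break
        avail += r.job.nodes
        shadow = r.end
    extra = avail - head.nodes  # nodes the head job will not need at the shadow time
    # Backfill later jobs only if they cannot delay the head job's reservation.
    for job in queue[1:]:
        if job.nodes > free:
            continue
        if now + job.runtime <= shadow:      # finishes before the reservation starts
            free -= job.nodes
            started.append(job.name)
        elif job.nodes <= extra:             # runs past it, but on spare nodes only
            free -= job.nodes
            extra -= job.nodes
            started.append(job.name)
    return started
```

Consider a 10-node cluster in which job A holds 6 nodes until t = 10, and the queue is B (8 nodes), C (2 nodes, 5 s), D (2 nodes, 20 s) and E (3 nodes). Job B cannot start, so it reserves t = 10. C is backfilled because it finishes before t = 10. D is backfilled onto the 2 nodes B will not need. E no longer fits.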

2.5.2 The N1 Grid Engine

Another popular job scheduler is Oracle N1 Grid Engine [43] (formerly Sun N1 Grid Engine). Like the Maui scheduler, N1 Grid Engine uses both backfill scheduling and priority scheduling.

2.6 Chapter Summary

There has been a great deal of research in cluster scheduling. However, most of it assumes homogeneous compute clusters. With the growing popularity of relatively cheap multi-core solutions, compute farms are increasingly heterogeneous, and researchers have begun to address heterogeneous cluster scheduling. Most of the work in this area relies on offline profiling of applications, with profile estimates based on CPU frequency alone. As we show in later sections, this leads to inaccurate predictions for parallel applications. Heterogeneous cluster scheduling research has yet to find its way into production-level solutions such as Maui and N1 Grid Engine. This is mainly due to the overheads of profiling and to the offline approach. In essence, much work remains before heterogeneous cluster scheduling research can reach mainstream cluster schedulers.
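To make the criticized approach concrete, here is a minimal sketch of a frequency-only estimate, under the assumption that such an estimate simply scales a profiled runtime by the ratio of clock speeds. The function name `freq_scaled_runtime` is hypothetical and is not taken from any of the cited works. The model sees only the clock rate of the target node. It has no term for anything else that might influence a parallel application's runtime.

```python
def freq_scaled_runtime(ref_runtime_s, ref_ghz, target_ghz):
    """Naive estimate (assumption): runtime scales inversely with CPU clock frequency.

    ref_runtime_s: runtime measured offline on the reference node, in seconds
    ref_ghz:       clock frequency of the reference node, in GHz
    target_ghz:    clock frequency of the node being predicted for, in GHz
    """
    return ref_runtime_s * ref_ghz / target_ghz

# A job profiled at 120 s on a 2.0 GHz node is predicted to take 80 s on a 3.0 GHz node.
print(freq_scaled_runtime(120.0, 2.0, 3.0))  # 80.0
```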

Chapter 3

Virtualization in Cluster Computing

Virtualization technology is being widely adopted, mainly because of its potential benefits: server consolidation, isolation, flexibility, security and fault tolerance. It has also generated interest in the high performance computing (HPC) community, mainly for reasons of high availability, fault tolerance, cluster partitioning and balancing conflicting user requirements.

Virtualization allows a cluster to run different operating system images, which lets legacy codes and new functionality coexist. A virtualized cluster can easily be created on the fly when a user requires an application-specific, customized operating system. Similarly, hardware maintenance and upgrades are possible without disrupting services to users or customers. If a hardware failure is detected (e.g. a failing network card or RAM bank), the hosted virtualized operating systems can be seamlessly migrated to healthy hardware. This keeps the downtime of an HPC compute farm to a minimum, which is often a critical requirement.

This chapter discusses the architecture and use of virtualization technology in HPC environments.

3.1 Overview of Virtualization

Virtualization has existed for over 40 years. IBM developed virtualization support for its System/360 mainframe in the late 1960s [58, 12]. Virtualization on the System/360 was used to logically partition mainframe computers into separate virtual machines, and these partitions allowed mainframes to multitask. Mainframes were expensive resources at the time, so they were designed for partitioning as a way to fully leverage the investment. Virtualization was effectively abandoned during the 1980s, when client-server applications and inexpensive x86 servers and desktops led to distributed computing. The broad adoption of Microsoft Windows and the emergence of Linux as a server operating system in the 1990s established x86 servers as the industry standard. The growth in x86 server and desktop deployments led to new IT infrastructure and operational challenges. These challenges include [3]:

• Low Infrastructure Utilization. According to the market research firm International Data Corporation (IDC), typical x86 server deployments achieve an average utilization of only 10% to 15% of total capacity. This is largely because organizations typically run one application per server, to avoid the risk that vulnerabilities in one application affect the availability of another application on the same server.

• Increasing Physical Infrastructure Costs. The operational costs to support growing physical infrastructure have steadily increased. Most computing infrastructure must remain operational at all times, resulting in power consumption, cooling and facilities costs that do not vary with utilization levels.

• Increasing IT Management Costs. As computing environments become more complex, the level of specialized education and experience required for infrastructure management personnel and the associated costs of such personnel have increased. Organizations spend disproportionate time and resources on manual tasks associated with server maintenance, and thus require more personnel to complete these tasks.

• Insufficient Failover and Disaster Protection. Organizations are increasingly affected by the downtime of critical server applications and inaccessibility of critical end user desktops. The threat of security attacks, natural disasters, health pandemics and terrorism has elevated the importance of business continuity planning for both desktops and servers.

• High-Maintenance End-User Desktops. Managing and securing enterprise desktops presents numerous challenges. Controlling a distributed desktop environment and enforcing management, access and security policies without impairing users' ability to work effectively is complex and expensive. Numerous patches and upgrades must be continually applied to desktop environments to eliminate security vulnerabilities.

The desire to reduce operational costs and to achieve high availability has led to a resurgence of virtualization technology, especially on x86 platforms.
