Problem Statement and Research Challenges

1 INTRODUCTION

1.2 Problem Statement and Research Challenges

Wide-area task parallel simulations and executions are predominately performed on loosely coupled computing infrastructures managed by software systems such as grid and metacomputing middleware. The advantages of these platforms are partly derived from the EP-style of computation and also follow from the advantages of the

master/worker paradigm. PDES codes, however, have mainly been relegated to execute on traditional tightly coupled distributed computing infrastructures. Although these HPC systems offer the highest performance, there can be issues with deployment and readily available access.

The allure of harvesting computing cycles afforded by these metacomputing systems and desktop grids in particular for PDES computations is intriguing. The master/worker paradigm as described in the previous section offers capabilities such as reducing the burden on the users to run simulations, user-directed load balancing, system- level fault tolerance, heterogeneous machine support, the ability to share computing resources, and dynamic resource allocation and de-allocation, e.g., to add or remove processors during an execution. Paramount to all these advantages is the ability to utilize idle processor cycles that would otherwise be wasted.

PDES computations, however, are constrained further than typical task parallel applications due to LP and state management, time synchronization and message passing. By utilizing a master/worker driven metacomputing paradigm and addressing the special requirements of PDES codes, additional computational throughput capacity can be attained while retaining all of the associated benefits of a master/worker approach. The cost of this flexibility is overall performance. It is clear that PDES performance under a

master/worker metacomputing environment will never match that of a customized execution on a tightly-coupled HPC system. Such a performance gap exists between master/worker and conventional PDES systems on HPC platforms due to overheads inherent to the master/worker-style of work distribution. Under a master/worker PDES system, the state of each work unit must be stored on a master service for consistency and fault tolerance purposes. If a client with a work unit containing a portion of the

simulation fails, the entire simulation will fail if the master service has no record of the last valid states. Therefore, each work unit lease incurs additional overhead not found in traditional PDES systems for checking out and checking in state variables that are part of the leased work unit to the worker. Moreover, messages cannot be directly sent in a master/worker system. Messages generated must be buffered on a master service temporarily before the proper work unit can download them when necessary. This indirect delivery of messages introduces additional overhead and message latency and degrading overall performance of the simulation. Finally, since the worker pool is heterogeneous all data exchanged must be serialized before transmission over the network. Additionally, this data must be de-serialized on the worker before it can be used. This data packing and unpacking introduces additional overhead typically not found in traditional PDES systems.

Although a performance reduction from conventional PDES systems is expected, a software infrastructure that provides additional computing throughput allows for other advantages such as more resilient and robust executions. Challenges for creating such an infrastructure are described next.

1.2.1 Portability

Most conventional HPC systems feature complete vertical homogeneity from the hardware to the operating system. Thus software on traditional distributed computing platforms seldom have to deal with portability issues. However, in public resource and desktop grid computing infrastructures, a variety of hardware architectures and operating systems must be accounted for in order to take full advantage of the available processing power in the worker pool. For task parallel simulations, this involves compiling the application for the target platform and encoding results in a platform-independent text or binary format. For PDES, the issue becomes more complex as the program code must be compiled for each possible target platform along with simulation state and messages that must be packed in agreed protocols or in a platform-independent fashion.

1.2.2 Node Volatility

Under volunteer and desktop grid computing platforms, it is expected that clients may drop from the worker pool. Measures are directly incorporated to deal with clients that cannot return results on time, fail to receive or send data, or provide incorrect results. However, the failure of one leased work unit is not detrimental to the overall progress of the entire project for typical volunteer projects. On the other hand, a failed partition under PDES can result in complete failure of the application. In traditional PDES, it is uncommon for a simulation to cope with node failure outside of a system that performs periodic checkpointing with a restart mechanism. PDES, under a volatile master/worker system, must consider volatility when partitions upon which other partiions depend are leased to a client that may fail. Controls must be implemented to ensure forward progress

in the presence of failed clients that never return a result or are too slow or may incur errors during updates.

1.2.3 Fault Tolerance

Related to node volatility is the issue of fault tolerance. As described earlier, under master/worker systems, fault tolerance is much easier to accomplish as failed workers are not systematically detrimental to the ongoing execution. However, a failure on the master service end can result in a complete execution shutdown due to a single point of failure. Additional fault tolerance protocols on the master portion must be considered when addressing PDES on these unreliable metacomputing frameworks as not only simulation metadata control information is stored under finite resources, but also simulation state information and messages.

1.2.4 Centralized Bottlenecks

In a typical modern PDES system, there are no centralized bottlenecks as time synchronization can be done asynchronously and in a decentralized fashion with regard to both conservative and optimistic mechanisms. Under a master/worker paradigm, by nature, there exists a master that exhibits some form of a centralized bottleneck. For PDES this is especially problematic, as the amount of data that must be moved is much larger than those found in traditional task parallel applications.

1.2.5 Bandwidth and Latency Concerns

With a large amount of data that is transmitted over the course of a PDES, bandwidth and latency becomes a concern when operating under a desktop grid infrastructure. New protocols and policies must be devised for both conservative and

optimistic synchronization to reduce congestion, preserve useful computation and avail work unit locality on the workers to the distributed simulation.

1.2.6 Load Balancing

Load balancing in traditional PDES is a difficult problem and is not commonly found in most monolithic PDES codes and even many run-time infrastructures. Systems that do have load balancing capabilities often use pre-computation or sequential runtime data first to determine event and computational density in order to partition and allocate resources effectively during a distributed run [79]. Dynamic load balancing schemes are most often specialized for their application domain and may not be portable across all PDES codes [80], while generic dynamic load balancing schemes do not support dynamic resources and heterogeneity among nodes [81, 82]. In contrast to traditional PDES, load balancing in PDES under a master/worker system must be generic enough to support all PDES codes applicable to this paradigm as well as support dynamic resources along with non-homogeneous workers. Load balancing with respect to master/worker systems can be split into two parts. First, the clients themselves are load balanced through a

combination of the master matching available work to idle clients including matching software requirements with the available hardware along with clients themselves

disengaging from the execution if they are no longer available for computation. Secondly, load balancing can be applied on the master side as well, where state and message storage for LPs can be migrated between high and low load servers. Additionally, new resources whether they are master services or workers can be immediately integrated into the running simulation.

In document Master/worker parallel discrete event simulation (Page 47-52)