
2.3 Parallel Discrete Event Simulation (PDES)

2.3.2 Additional Components of PDES

From the above discussion, it is clear that PDES kernels cannot rely solely on the elements described in Section 2.2.1. In fact, while the conservative approach might require more than one event queue (e.g., when implementing the solution in [25] that we discussed before), Time-Warp-based simulation kernels must store all pending events, all processed events (in order to support the re-execution of parts of the simulation trajectory), and simulation states.

Figure 2.7 shows the essential building blocks of a reference architecture supporting the optimistic synchronization protocol. As the figure indicates, an optimistic simulation kernel must support at least the following additional data structures and services.

Input and Output Queues In addition to the (input) event queue, an output queue is needed to keep track of the messages which have been sent during the execution of other events. This information is mandatory, as during the rollback operation antimessages must be sent to undo the effects of inconsistent operations at remote LPs. To simplify the rollback operation, many implementations [67, 129, 22, 32] rely at least on a per-LP output queue, so that when scanning for antimessages to be sent, the check can be performed only on the send time, without checking (on a per-message basis) the LP identification code as well.
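The benefit of a per-LP output queue can be illustrated with a minimal Python sketch. It assumes that each LP appends its sent messages in nondecreasing send-time order, and that `rollback_time` is the timestamp of the last event that remains valid. The names `SentMessage`, `OutputQueue`, `record` and `rollback` are hypothetical and are not taken from any of the cited kernels.

```python
import bisect
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SentMessage:
    send_time: float  # virtual time of the event that generated the message
    recv_time: float  # timestamp of the event at the destination LP
    dest_lp: int
    payload: str = ""


@dataclass
class OutputQueue:
    """Output queue of a single LP, ordered by send time (illustrative layout)."""
    sent: list = field(default_factory=list)

    def record(self, msg: SentMessage) -> None:
        # Assumption: an LP processes events in timestamp order, so send
        # times arrive in nondecreasing order and appending keeps the order.
        self.sent.append(msg)

    def rollback(self, rollback_time: float) -> list:
        """Remove messages sent after rollback_time and return them.

        The returned messages are the ones for which antimessages must be
        sent. Only the send time is inspected: no sender-LP check is needed
        because the queue belongs to exactly one LP.
        """
        send_times = [m.send_time for m in self.sent]
        cut = bisect.bisect_right(send_times, rollback_time)
        to_cancel = self.sent[cut:]
        del self.sent[cut:]
        return to_cancel


queue = OutputQueue()
for t, dest in [(1.0, 2), (3.0, 5), (4.0, 2)]:
    queue.record(SentMessage(send_time=t, recv_time=t + 1.0, dest_lp=dest))
antimessages = queue.rollback(2.0)  # cancels the messages sent at 3.0 and 4.0
```

With a single kernel-wide output queue, the same scan would also have to compare the sender identifier of every entry, which is the per-message check the per-LP layout avoids.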

Messaging Subsystem To hide from the application model the fact that LPs can be hosted either locally or on a remote kernel instance, a messaging subsystem takes care of message routing. In this way, the simulation model can simply rely on a uniform API for event scheduling, and it is the simulation kernel that determines where (and how) each message must be delivered. Additionally, the messaging subsystem can internally handle the output queue, so that the execution of a rollback operation can be decoupled from the antimessage-sending procedure.
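A uniform scheduling API of this kind could look roughly like the following Python sketch. It assumes a static LP-to-kernel map and models the network transport as a plain callable. `MessagingSubsystem`, `schedule_event`, `lp_to_kernel` and `send_remote` are hypothetical names introduced only for illustration.

```python
import heapq
from collections import defaultdict


class MessagingSubsystem:
    """Routes events to local input queues or to remote kernels (sketch)."""

    def __init__(self, kernel_id, lp_to_kernel, send_remote):
        self.kernel_id = kernel_id
        self.lp_to_kernel = lp_to_kernel  # dict: LP id -> id of the hosting kernel
        self.send_remote = send_remote    # callable(kernel_id, message): stand-in transport
        self.input_queues = defaultdict(list)   # per-LP heaps ordered by timestamp
        self.output_queues = defaultdict(list)  # per-LP lists of (send_time, message)
        self._seq = 0  # unique tie-breaker, so payloads are never compared

    def schedule_event(self, sender, dest, send_time, timestamp, payload):
        """Single entry point used by the model, wherever the destination LP lives."""
        self._seq += 1
        message = (timestamp, self._seq, dest, payload)
        # The output queue is maintained here, so rollback code only has to
        # ask this subsystem which messages must be cancelled.
        self.output_queues[sender].append((send_time, message))
        target_kernel = self.lp_to_kernel[dest]
        if target_kernel == self.kernel_id:
            heapq.heappush(self.input_queues[dest], message)
        else:
            self.send_remote(target_kernel, message)
```

The model code calls `schedule_event` in the same way for local and remote destinations; only the kernel-side branch differs.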

State Queue & State Management Subsystem In a simulation kernel implementing the rollback operation by means of state saving & restore, the state queue is the fundamental data structure used to recover a consistent execution state of an LP whenever a causal inconsistency is detected.

The state queue is handled by the state management subsystem, which is responsible for:

1) maintaining a timestamp-ordered list of states, adding new nodes when a new snapshot is required by the system;

2) performing rollback operations (i.e., determining which state has to be restored from the log, or executing reverse events up to the rollback time);

3) performing coasting forward operations (i.e., fictitious reprocessing of intermediate events in between the restored log and the point of the causality violation);

4) performing fossil-collection operations (i.e., memory recovery, by getting rid of all the events and state logs which belong to an already-committed portion of the simulation).
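The four duties listed above can be sketched for a single LP as follows. This is a simplified illustration of the state-saving variant only (no reverse computation). It assumes distinct event timestamps and takes `rollback_time` to be the timestamp of the last event that remains valid. `StateManager` and its methods are hypothetical names.

```python
import bisect
import copy


class StateManager:
    """Toy state queue of one LP, based on state saving (illustrative only)."""

    def __init__(self, initial_state, handler):
        self.handler = handler  # handler(state, payload, silent)
        self.state = initial_state
        # Timestamp-ordered list of (timestamp, snapshot).
        self.snapshots = [(float("-inf"), copy.deepcopy(initial_state))]
        self.processed = []  # timestamp-ordered (timestamp, payload) pairs

    def process(self, timestamp, payload, take_snapshot=False):
        self.handler(self.state, payload, False)
        self.processed.append((timestamp, payload))
        if take_snapshot:  # duty 1: add a node to the state queue
            self.snapshots.append((timestamp, copy.deepcopy(self.state)))

    def rollback(self, rollback_time):
        # Duty 2: restore the latest snapshot not later than rollback_time.
        times = [t for t, _ in self.snapshots]
        idx = bisect.bisect_right(times, rollback_time) - 1
        snap_time, snap = self.snapshots[idx]
        del self.snapshots[idx + 1:]
        self.state = copy.deepcopy(snap)
        # Duty 3: coasting forward, i.e. silently replay the events between
        # the restored snapshot and the rollback time (no messages are sent).
        kept = [(t, p) for t, p in self.processed if t <= rollback_time]
        for t, p in kept:
            if t > snap_time:
                self.handler(self.state, p, True)
        undone = self.processed[len(kept):]
        self.processed = kept
        return undone  # events to be put back among the pending ones

    def fossil_collect(self, gvt):
        # Duty 4: keep the latest snapshot strictly older than GVT and drop
        # everything that precedes it.
        times = [t for t, _ in self.snapshots]
        idx = max(bisect.bisect_left(times, gvt) - 1, 0)
        del self.snapshots[:idx]
        oldest = self.snapshots[0][0]
        self.processed = [(t, p) for t, p in self.processed if t > oldest]


def add_handler(state, payload, silent):
    # Example handler: accumulates payloads; it sends no messages, so the
    # silent flag has no effect here.
    state["total"] += payload
```

For instance, after processing events at times 1.0 (with a snapshot), 2.0, 3.0 and 4.0, a call to `rollback(2.5)` restores the snapshot taken at 1.0 and silently replays only the event at 2.0.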

GVT Subsystem The Global Virtual Time (GVT) subsystem accesses the message queues and the messaging subsystem in order to periodically perform the global reduction aimed at computing the new value of the simulation’s commitment horizon. In addition, this subsystem takes care of termination detection, either by checking whether the new GVT exceeds a given predetermined value, or by verifying some (global) predicate (evaluated over committed state snapshots) which tells whether the conditions for the termination of the model execution are met. Finally, this subsystem is also in charge of performing the so-called fossil-collection procedure, aimed at recovering memory buffers currently keeping obsolete messages and logs, namely those related to the newly-committed portion of the computation.
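As a rough illustration of the reduction and of the two termination checks, consider the Python sketch below. It assumes that a consistent global view is available, i.e. that the next-event timestamps of all LPs and the timestamps of all in-transit messages can be read at once; real distributed GVT algorithms have to obtain this information asynchronously. The function names are hypothetical.

```python
import math


def compute_gvt(next_event_times, in_transit_timestamps):
    """Minimum over the LPs' next-event times and the in-transit messages.

    No event with a timestamp below the returned value can still be
    generated or rolled back, under the consistent-view assumption.
    """
    return min(
        min(next_event_times, default=math.inf),
        min(in_transit_timestamps, default=math.inf),
    )


def should_terminate(gvt, end_time=None, predicate=None, committed_states=()):
    """Termination on a time bound or on a global predicate.

    The predicate receives the committed state snapshots of all the LPs.
    """
    if end_time is not None and gvt > end_time:
        return True
    if predicate is not None and predicate(committed_states):
        return True
    return False


gvt = compute_gvt([12.0, 9.5, 20.0], [7.0, 15.0])  # the in-transit message at 7.0 bounds GVT
```

Ignoring in-transit messages would overestimate GVT, since a message still in flight may cause a rollback to its timestamp once it is delivered.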

Event Scheduler A central point relates to the CPU-scheduling approach used to determine which LP, among the ones hosted by a given simulation-kernel instance, must take control for actual event processing activities. As discussed, the common choice is represented by the Lowest-Timestamp-First (LTF) algorithm, which selects the LP whose pending next event is the one with the minimum timestamp, compared to pending next events of the other LPs hosted by the same kernel. LTF has the advantage of avoiding the generation of causality violations across the LPs hosted by the same kernel instance. This is because these LPs are dispatched in a similar way to what would happen on top of a sequential simulation engine, which imposes a timestamp-ordered sequence of CPU-schedule operations for all the events. Hence, rollbacks can be generated only in relation to events scheduled between LPs hosted by different kernels, which contributes to reducing the number of rollbacks.

Different design/implementation variants for LTF exist, along with the basic (stateless) approach in [96], which exhibits O(n) time complexity and relies on traversing pending next events across the input queues of all the LPs. For example, an O(1) stateful approach has been recently proposed [147], which is based on reflecting variations of the priority (i.e., of the next-event timestamp) of the LPs into the CPU-scheduler state (which is done in constant time) and on determining the LP with the highest priority (again in constant time) by running a query on the current CPU-scheduler state.
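The basic stateless variant can be sketched in a few lines of Python. The sketch assumes that each LP's input queue is a binary heap of `(timestamp, payload)` pairs, so that the pending next event is at index 0. It illustrates only the O(n) scan, not the constant-time scheme of [147], and the names `ltf_pick` and `run` are hypothetical.

```python
import heapq


def ltf_pick(input_queues):
    """Return the id of the LP whose next pending event has the lowest timestamp.

    Stateless O(n) scan over the heads of all the input queues; returns
    None when no LP has pending events.
    """
    best_lp, best_ts = None, None
    for lp_id, queue in input_queues.items():
        if queue and (best_ts is None or queue[0][0] < best_ts):
            best_lp, best_ts = lp_id, queue[0][0]
    return best_lp


def run(input_queues):
    """Dispatch all pending events in LTF order and return the schedule."""
    order = []
    while (lp := ltf_pick(input_queues)) is not None:
        timestamp, _payload = heapq.heappop(input_queues[lp])
        order.append((timestamp, lp))
    return order
```

Since every dispatch picks the global minimum among the locally hosted LPs, the resulting schedule is timestamp-ordered, which is why no causality violation can arise among LPs of the same kernel instance.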

Random-Number Generators As discussed before, both incarnations of optimistic synchronization (i.e., state-saving-based and reverse-computation-based) require that (pseudo) random number generation is carried out in a piece-wise deterministic fashion. To this end, the simulation kernel must provide random number generators which are aware of rollback operations, and can therefore roll back their internal state as well. This can be done either by storing the internal seed along with a state snapshot, or by implementing a generator which is associated with a reverse function, able to undo the effect of a generation on the internal state.
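Both options can be illustrated with a linear congruential generator, whose update step is invertible whenever the multiplier is odd and the modulus is a power of two. The following Python sketch is only an example of the idea: the class name, its methods and the choice of constants are assumptions made for illustration, and `pow(a, -1, m)` requires Python 3.8 or later.

```python
class RollbackAwareLCG:
    """Linear congruential generator supporting snapshot/restore and reverse steps."""

    M = 2 ** 64
    A = 6364136223846793005  # odd multiplier, hence invertible modulo 2**64
    C = 1442695040888963407
    A_INV = pow(A, -1, M)    # modular inverse of A

    def __init__(self, seed):
        self.state = seed % self.M

    def next(self):
        """Advance the generator and return a float in [0, 1)."""
        self.state = (self.A * self.state + self.C) % self.M
        return self.state / self.M

    def reverse(self):
        """Undo the effect of one call to next() on the internal state."""
        self.state = (self.A_INV * (self.state - self.C)) % self.M

    def snapshot(self):
        """Value to be stored along with an LP state snapshot."""
        return self.state

    def restore(self, saved):
        self.state = saved


rng = RollbackAwareLCG(42)
saved = rng.snapshot()
draws = [rng.next() for _ in range(3)]
for _ in range(3):
    rng.reverse()  # reverse-computation style: undo the three draws
replayed = [rng.next() for _ in range(3)]  # same values as draws
rng.restore(saved)  # state-saving style: reload the seed
```

In both cases the events re-executed after a rollback observe exactly the same random sequence as in their first execution, which is what piece-wise determinism requires.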

In document Techniques for Transparent Parallelization of Discrete Event Simulation Models (Page 73-77)