Incremental State Saving (ISS)

3.1 State Saving

3.1.3 Incremental State Saving (ISS)

While the techniques discussed above address the problem of state log/restore by tuning the checkpointing interval to minimize the WCT required for a rollback operation, they do not take into account that a checkpointing operation, even if seldom executed, can be onerous due to the size of the state S_i being saved. Additionally, if S_i is large but only a small portion of it was modified since the last checkpoint, a large amount of time is wasted copying redundant information.

The goal of Incremental State Saving (ISS) is to limit checkpointing overhead by reducing its execution time while, at the same time, limiting the amount of memory used for a single log.

The first ISS approach was published in [10]. It augments each event e ∈ E with the following information:

• The value of modified state variables after the event is processed;

• The value of modified state variables before the event is processed;

• The ST T_gen when the event is generated;

• The ST T_exe when the event must be executed, with T_exe > T_gen.

queue, where (for each LP) a list of modifications of the states is stored. The nodes of this queue are linked to the events that caused the updates. Upon receipt of a straggler message with timestamp T_straggler, the rollback operation is carried out differently from the one presented previously. In particular, all the events of the rolling-back LP i such that T_straggler ≤ T_exe ≤ LVT_i are scanned, and the state variables that were modified are put back in place, taking care that each variable is restored only once (see footnote 4).
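
The restore step described above can be illustrated with a minimal C sketch. It is not the algorithm of [10]: the names (struct update, lp_write, lp_rollback), the fixed-size arrays and the integer state variables are all hypothetical, and we assume that the update list is kept in non-decreasing T_exe order, so that the first entry met for a variable carries its oldest before-value.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_VARS 4    /* hypothetical: number of state variables of one LP */
#define MAX_LOG  64   /* hypothetical: capacity of the per-LP update list */

/* One logged update: the variable an event changed, with old and new value. */
struct update {
    double t_exe;   /* simulation time of the event that made the update */
    int    var;     /* index of the modified state variable */
    int    before;  /* value before the event was processed */
    int    after;   /* value after the event was processed */
};

struct lp {
    int           state[NUM_VARS];
    double        lvt;            /* local virtual time */
    struct update log[MAX_LOG];   /* assumed sorted by non-decreasing t_exe */
    size_t        log_len;
};

/* Process an event at time t that writes `value` into variable `var`,
 * recording the before/after pair. Returns false if nothing was written. */
bool lp_write(struct lp *lp, double t, int var, int value)
{
    if (lp->log_len == MAX_LOG || var < 0 || var >= NUM_VARS)
        return false;
    struct update *u = &lp->log[lp->log_len++];
    u->t_exe  = t;
    u->var    = var;
    u->before = lp->state[var];
    u->after  = value;
    lp->state[var] = value;
    lp->lvt = t;
    return true;
}

/* Undo every update with t_straggler <= t_exe <= lvt. Each variable is
 * restored only once, from the oldest undone update that touched it. */
void lp_rollback(struct lp *lp, double t_straggler)
{
    bool   restored[NUM_VARS] = { false };
    size_t first = 0;

    /* Skip the updates that precede the straggler: they remain valid. */
    while (first < lp->log_len && lp->log[first].t_exe < t_straggler)
        first++;

    for (size_t k = first; k < lp->log_len; k++) {
        const struct update *u = &lp->log[k];
        if (!restored[u->var]) {
            lp->state[u->var] = u->before;
            restored[u->var] = true;
        }
    }
    lp->log_len = first;   /* discard the undone updates */
    lp->lvt = first > 0 ? lp->log[first - 1].t_exe : 0.0;
}
```

In this sketch the cost of a rollback grows with the number of undone updates rather than with the size of the state, which is the property ISS aims for.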

Of course, this technique is not transparent at all: the model programmer must be aware of the concepts of rollback and state saving, and has to access the kernel's data structures to make copies of the state variables before modifying them. This is necessary because the simulation kernel knows neither where the simulation state is stored nor which parts of it are updated by an event.

The work in [141] takes an additional step towards transparency by relieving the application-model writer from directly modifying the simulation kernel's data structures. In particular, it proposes to implement incremental checkpointing in a transparent fashion by relying on the overloading mechanism of Object-Oriented (OO) languages. This choice narrows its applicability to OO programming languages (e.g., C++ or Java). The basic idea rests on two essential points:

• all (user-defined) functions which process events (i.e., the event handlers) must be redefined by means of overloading;

• state variables must be encapsulated within classes defined by the simulation kernel.

Encapsulation allows the simulation kernel to discriminate state variables from other ones which might be used only to store temporary results of the computation. The application must therefore notify the kernel of which variables are state variables by using the special signature State<class T>, which wraps the class T that the system will handle as part of the state. The programmer can manipulate wrapped objects through overloaded methods, which make a copy of the accessed data buffers before the actual update is performed.

In this scheme, the user must still be aware of the notion of rollback and of the fact that the simulation kernel performs incremental checkpoints. Yet, differently from [10], the operation relies on a service exposed by the simulation kernel, which gives a certain degree of freedom in the definition of the simulation state.

Footnote 4: Although this might seem an incorrect algorithm for state restore, we emphasize that the original proposal in [10] was targeted at a specific scenario, namely digital logic simulation, where additional properties guarantee that the final simulation state is restored correctly. We refer to the original paper [10] for a complete discussion of the algorithm, and to Chapter 4 for a more general approach to incremental restore.

The proposal in [179], targeted at x86 architectures, supports transparent incremental state saving by relying on software instrumentation. The simulation model's assembly code is parsed and, whenever an instruction that updates a memory region is found, a call to an ad-hoc module is prepended, which saves a copy of the old memory value before it is updated. Whenever a rollback operation is performed, the chain of memory updates is scanned backwards, in order to realign the content of the simulation state to time T_rollback. This technique relieves the programmer from altering the original simulation code, because the instrumentation phase automatically detects the assembly instructions that may alter memory content. Nevertheless, the approach used in [179] suffers from a performance penalty. Let us consider the following code snippet:

    for (i = 0; i < MAX; i++) {
        state->array[i]++;
    }

A loop like this one might be used to increment some statistics in the model's state. Yet, each iteration of the loop entails two memory updates, one to variable i and one to state variable state->array[i]. The proposed solution would call the ad-hoc module twice per iteration, and would create a node in the state chain for each modification of state->array[i], given that its granularity is word-based. Therefore, the execution delay of a single event increases depending on the memory-access pattern of the simulation model, and even in simple examples like this one the increase could be non-negligible.
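
To make the cost concrete, the following C sketch writes out by hand the calls that an instrumentation pass of this kind would prepend to the loop. It is only an illustration under our own assumptions, not the tool of [179]: track_write, struct undo_node, the fixed-size chain and the address-range check used to recognize writes to the state are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX     8    /* hypothetical loop bound */
#define LOG_CAP 64   /* hypothetical capacity of the update chain */

struct model_state { int array[MAX]; };

/* One node of the update chain: the written word and the value it held. */
struct undo_node { int *addr; int old; };

struct undo_node chain[LOG_CAP];
size_t chain_len  = 0;   /* nodes currently in the chain */
size_t hook_calls = 0;   /* how many times the tracking module ran */

/* Stand-in for the module invoked before every memory-writing instruction.
 * Only writes that fall inside the simulation state produce a node; writes
 * arriving when the chain is full are not recorded in this sketch. */
void track_write(struct model_state *state, int *addr)
{
    uintptr_t a  = (uintptr_t)addr;
    uintptr_t lo = (uintptr_t)state->array;
    uintptr_t hi = (uintptr_t)(state->array + MAX);

    hook_calls++;
    if (a >= lo && a < hi && chain_len < LOG_CAP) {
        chain[chain_len].addr = addr;
        chain[chain_len].old  = *addr;
        chain_len++;
    }
}

/* The loop from the text, with the prepended calls made explicit. */
void handler_instrumented(struct model_state *state)
{
    int i = 0;
    while (i < MAX) {
        track_write(state, &state->array[i]);  /* before state->array[i]++ */
        state->array[i]++;
        track_write(state, &i);                /* before i++ */
        i++;
    }
}

/* Rollback: scan the chain backwards, restoring each old value. */
void undo_all(void)
{
    while (chain_len > 0) {
        chain_len--;
        *chain[chain_len].addr = chain[chain_len].old;
    }
}
```

With these definitions, one execution of the handler invokes the tracking module 2 * MAX times and leaves MAX nodes in the chain, although the updated words are contiguous and could have been saved with a single block copy.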

The work in [125], which will be thoroughly discussed in Chapter 4, tackles this issue in the case of dynamically-scattered simulation states by relying on a dirty bitmap. It relies on assembly-code instrumentation as well, but an ad-hoc memory manager is (transparently) interposed between the simulation model and the underlying system memory manager, so that each memory update is recorded by simply setting to 1 the bit corresponding to the touched memory area. The checkpointing operation is then carried out periodically, saving only the areas that were actually updated since the last checkpoint. This allows the simulation kernel to rely on any of the aforementioned optimizations regarding the checkpointing interval χ, as shown in [168]. The latter approach relies on an integral function for fine-tuning the checkpointing parameters, which also enforces stability of the decision with respect to fluctuations in the execution dynamics.
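
The dirty-bitmap idea can be sketched in a few lines of C. This is a simplified illustration under our own assumptions, not the memory manager of [125]: the chunk size, the single fixed buffer and the names mark_dirty, struct inc_log and checkpoint_incremental are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_SIZE 64   /* hypothetical chunk size, in bytes */
#define NUM_CHUNKS 16   /* hypothetical number of chunks in the LP buffer */

struct lp_memory {
    unsigned char buf[NUM_CHUNKS * CHUNK_SIZE];  /* memory holding the LP state */
    uint8_t       dirty[(NUM_CHUNKS + 7) / 8];   /* one bit per chunk */
};

/* An incremental log: only the chunks found dirty, plus their positions.
 * Sized for the worst case here, to keep the sketch free of allocation. */
struct inc_log {
    size_t        count;
    size_t        index[NUM_CHUNKS];
    unsigned char data[NUM_CHUNKS * CHUNK_SIZE];
};

/* Called before a write to [offset, offset + len): set the bit of every
 * chunk the write touches. Tracking a write costs a few bit operations. */
void mark_dirty(struct lp_memory *m, size_t offset, size_t len)
{
    if (len == 0 || offset >= sizeof m->buf)
        return;
    if (len > sizeof m->buf - offset)
        len = sizeof m->buf - offset;   /* clamp to the buffer */

    size_t first = offset / CHUNK_SIZE;
    size_t last  = (offset + len - 1) / CHUNK_SIZE;
    for (size_t c = first; c <= last; c++)
        m->dirty[c / 8] |= (uint8_t)(1u << (c % 8));
}

/* Periodic checkpoint: copy only the dirty chunks into `out`, then clear
 * the bitmap. Returns the number of chunks saved. */
size_t checkpoint_incremental(struct lp_memory *m, struct inc_log *out)
{
    out->count = 0;
    for (size_t c = 0; c < NUM_CHUNKS; c++) {
        if (m->dirty[c / 8] & (1u << (c % 8))) {
            memcpy(out->data + out->count * CHUNK_SIZE,
                   m->buf + c * CHUNK_SIZE, CHUNK_SIZE);
            out->index[out->count++] = c;
        }
    }
    memset(m->dirty, 0, sizeof m->dirty);
    return out->count;
}
```

Compared with a per-write update chain, repeated writes to the same chunk cost nothing beyond setting an already-set bit, and the copy is deferred to the checkpoint, whose frequency can be tuned independently.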

In Table 3.1 we show a comparative summary of all the state saving techniques presented so far, highlighting the properties provided by each of them.

Table 3.1
Columns: SS Technique | PSS | ASS | Scattered Simulation State | Incrementality | Arbitrary Granularity | Transparency | Instrumentation-Based | Analytic Model | Temporal Measurements | Heuristics | Autonomicity | Stability to Fluctuations
[76] [77] [13] X X
[121] X X X X X
[140] X X X X
[152] X X X X
[45] X X X X X
[133] X X X X
[10] X X X
[145] X X X
[141] X X
[179] X X X X X X X
[125] X X X X X X X X
[134] X X X X
[168] X X X X X X X X X X X X

Advancements by the thesis

As will be discussed in Chapter 4, we will explicitly allow the programmer to rely on dynamically-allocated memory to scatter the simulation state, and to make it grow or shrink depending on the model's actual execution dynamics. This will be done transparently at linking time by relying on specific malloc wrappers, which redirect calls to the standard memory-allocation library to our own memory manager. The memory manager will work at chunk granularity, i.e., upon simulation startup a large buffer will be pre-allocated for each LP, to serve that LP's subsequent memory requests.
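
As a rough illustration of serving requests from a pre-allocated per-LP buffer, consider the following C sketch. It is a deliberately simple bump allocator, not the memory manager presented in Chapter 4: the names lp_pool_init, lp_malloc and lp_pool_destroy, the 16-byte alignment and the absence of a free operation are our own assumptions. How calls are redirected at linking time is not detailed here; with GNU ld, one possible mechanism is the --wrap=malloc option, which resolves references to malloc to a function named __wrap_malloc.

```c
#include <stddef.h>
#include <stdlib.h>

#define LP_ALIGN 16   /* hypothetical alignment of the returned blocks */

struct lp_pool {
    unsigned char *base;   /* start of the buffer pre-allocated for one LP */
    size_t         used;   /* bytes already handed out */
    size_t         size;   /* total size of the buffer */
};

/* Pre-allocate the LP buffer once, at simulation startup.
 * Returns 0 on success, -1 if the buffer could not be obtained. */
int lp_pool_init(struct lp_pool *p, size_t size)
{
    p->base = malloc(size);
    p->used = 0;
    p->size = p->base ? size : 0;
    return p->base ? 0 : -1;
}

/* Serve a request of n bytes from the LP buffer; NULL if it does not fit.
 * A wrapper installed at linking time would end up calling a function of
 * this kind with the pool of the LP that is currently executing. */
void *lp_malloc(struct lp_pool *p, size_t n)
{
    size_t avail = p->size - p->used;
    if (n == 0 || n > avail)
        return NULL;

    /* Round the request up to a multiple of LP_ALIGN. */
    size_t need = (n + LP_ALIGN - 1) & ~(size_t)(LP_ALIGN - 1);
    if (need > avail)
        return NULL;

    void *ptr = p->base + p->used;
    p->used += need;
    return ptr;
}

/* Release the whole buffer when the LP is torn down. */
void lp_pool_destroy(struct lp_pool *p)
{
    free(p->base);
    p->base = NULL;
    p->used = 0;
    p->size = 0;
}
```

Because every block of an LP lives inside one known buffer, the kernel can locate the whole scattered state from the pool descriptor alone, which is what makes transparent logging of dynamically-allocated state possible.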

Furthermore, we will transparently integrate (via static software instrumentation) a memory-update tracking subsystem, which will allow the simulation kernel to know precisely which portions of the simulation state have been updated since the last log operation. To limit memory usage, this information will be kept as small as possible by relying on a dirty bitmap, i.e., a compact data structure in which each bit tells whether a specific chunk has been modified.

By relying on this information, the simulation kernel will be able to transparently execute Incremental State Saving, thus reducing the overhead required for taking a simulation snapshot and additionally minimizing the memory footprint related to state saving.

In document Techniques for Transparent Parallelization of Discrete Event Simulation Models (Page 104-109)