
5.3 World State Updating

5.3.5 Update Process Timing

The state retrieval and action retrieval operations have to be timed so that all nodes update their world state in sync. Suppose the updated state of time $t_i$ has just been calculated and nodes are preparing for the next update to time $t_{i+1}$. After

116 Chapter 5 A Virtual World Storage

[Figure 5.7: timeline over the current time from $t_i$ to $t_{i+2}$ in steps of $l_{max}$, with $t_{i+1} = t_i + 5 \cdot l_{max}$ and $t_{i+2} = t_{i+1} + 5 \cdot l_{max}$. Rows: State Retrieval ($t^s_i$, $t^s_{i+1}$), Action Retrieval ($t^a_i$, $t^a_{i+1}$, $t^a_{i+2}$), Stored State ($t_{i-1}$, $t_i$, $t_{i+1}$), Stored Actions ($]t_i, t_{i+1}]$, $]t_{i+1}, t_{i+2}]$, $]t_{i+2}, t_{i+3}]$).]

Figure 5.7: Timing of retrieve operations in the update process

a node $a$ has updated its state to $t_i$, it has to retrieve the update area $A_{c+d}$ from the storage to obtain the set of objects able to influence $A_c$. However, before it can do this, it has to be sure that the other nodes have also successfully updated their state to time $t_i$. We denote this time $t^s_i$ with $t_i < t^s_i < t_{i+1}$ and explain its calculation later. At $t^s_i$, nodes start to retrieve the state of time $t_i$.

When this retrieve returns, nodes have to retrieve all actions of players in the retrieved update area $A_{c+d}$ between $t_i$ and $t_{i+1}$. Before nodes can request all actions up to and including $t_{i+1}$, they have to make sure these actions have been stored successfully. At the latest, a player could generate a relevant action at $t_{i+1}$. After being generated, this action still needs some time until all of its replicas are stored at the different nodes. Since a player generating actions always stores these actions on the same nodes, the overlay will have established shortcut connections to these nodes. Storing a replica therefore just takes the latency of one message on the link layer. We define $l_{max}$ to be an assumed maximum link layer message latency. All assumptions about whether nodes have completed phases of the update process are based on this latency. Most of the time, this assumption will hold. If it does not and a message is late, a node will report wrong data. Replication corrects for this and ensures that the updating works correctly with high probability.

Figure 5.7 shows the timing of the update process based on the assumed $l_{max}$. To calculate the state at time $t_{i+1}$, a node needs all actions that players generated up to this time. As $l_{max}$ is the maximum duration of store operations, all actions up to time $t_{i+1}$ should be stored at $t^a_{i+1} = t_{i+1} + l_{max}$. At $t^a_{i+1}$, nodes retrieve the actions of players in their update area between time $t_i$ and $t_{i+1}$. We assume $2 \cdot l_{max}$ as a bound for the duration of the retrieve operation, as it consists of one retrieve message to the storing node and one result message back to the retrieving node. Consequently, $2 \cdot l_{max}$ is the timeout for action retrieves. Therefore, at time $t_{i+1} + 3 \cdot l_{max}$, all nodes should have completed their retrieve operations for all actions up to $t_{i+1}$ and can calculate the state in their maintenance area $A_c$ at time $t_{i+1}$. They actually perform that calculation upon retrieving the last missing player actions, so the state updates at the different nodes will happen somewhere between $t_{i+1} + l_{max}$ and $t_{i+1} + 3 \cdot l_{max}$.
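The timeout behaviour of the action retrieval can be illustrated with a minimal sketch. It is not the implementation described here: `retrieve_actions` is a hypothetical stand-in that only simulates a round trip with a given latency, and the player names and latencies are made up for illustration. The sketch waits at most $2 \cdot l_{max}$ and proceeds with whatever has arrived by then.

```python
import asyncio


async def retrieve_actions(player: str, latency: float) -> tuple[str, list[str]]:
    """Hypothetical retrieve: simulates request plus result message."""
    await asyncio.sleep(latency)  # simulated round-trip time
    return player, [f"{player}-action"]


async def collect_actions(latencies: dict[str, float],
                          l_max: float) -> dict[str, list[str]]:
    """Retrieve actions of all players, giving up after 2 * l_max."""
    tasks = [asyncio.create_task(retrieve_actions(player, latency))
             for player, latency in latencies.items()]
    # One retrieve message plus one result message: bound of 2 * l_max.
    done, pending = await asyncio.wait(tasks, timeout=2 * l_max)
    for task in pending:
        task.cancel()  # late answers are ignored in this sketch
    return dict(task.result() for task in done)


if __name__ == "__main__":
    # Assumed values: p1 answers in time, p2 exceeds the timeout.
    actions = asyncio.run(collect_actions({"p1": 0.01, "p2": 1.0}, l_max=0.05))
    print(actions)
```

In this sketch a late answer is simply dropped; how missing data is handled in the actual system (replication, replica votes) is outside its scope.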

At $t^s_{i+1} = t_{i+1} + 3 \cdot l_{max}$, nodes can be sure all other nodes have finished their update to time $t_{i+1}$. Thus, nodes can start retrieving the state at $t_{i+1}$ to prepare for calculating the state at $t_{i+2}$, and the update cycle starts again. The cycle length is $I = 5 \cdot l_{max}$, yielding $t_{i+1} - t_i = 5 \cdot l_{max}$. Consequently, the aforementioned $t^s_i$ can be calculated from $t^s_i = t_i + 3 \cdot l_{max}$.
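The relations between the points in time of one cycle can be written down compactly. The following sketch is only an illustration: the class `UpdateCycle` and its attribute names are hypothetical, and times are assumed to be integers in milliseconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdateCycle:
    """Timing of the update from t_i to t_{i+1} (hypothetical helper)."""
    t_i: int     # time of the last calculated state, in ms
    l_max: int   # assumed maximum link layer message latency, in ms

    @property
    def state_retrieval_start(self) -> int:
        # t^s_i = t_i + 3 * l_max
        return self.t_i + 3 * self.l_max

    @property
    def t_next(self) -> int:
        # t_{i+1} = t_i + I with cycle length I = 5 * l_max
        return self.t_i + 5 * self.l_max

    @property
    def action_retrieval_start(self) -> int:
        # t^a_{i+1} = t_{i+1} + l_max (all actions up to t_{i+1} are stored)
        return self.t_next + self.l_max

    @property
    def action_retrieval_deadline(self) -> int:
        # t^a_{i+1} + 2 * l_max = t_{i+1} + 3 * l_max = t^s_{i+1}
        return self.action_retrieval_start + 2 * self.l_max


if __name__ == "__main__":
    cycle = UpdateCycle(t_i=0, l_max=100)  # l_max = 100 ms is an assumed value
    print(cycle.state_retrieval_start, cycle.t_next,
          cycle.action_retrieval_start, cycle.action_retrieval_deadline)
```

With the assumed $l_{max}$ of 100 ms, the state retrieval starts at 300 ms, the action retrieval runs from 600 ms to 800 ms, and 800 ms is also the start of the next state retrieval, $t^s_{i+1}$.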

The operations State Retrieval and Action Retrieval shown in figure 5.7 alternate at the given times. Retrieving actions starts as soon as actions are guaranteed to be stored, as shown in row Stored Actions. When this retrieval is completed on all nodes and the state is updated as shown in row Stored State, retrieving this updated state starts immediately. This gives the state retrieval operation a maximum time of $3 \cdot l_{max}$ to complete. Since nodes permanently request the same area, they will have shortcut connections to the multicast root nodes of the area in the different zones after the first retrieve operation.

In total, the areacast probably needs two to three messages to reach all nodes in the area, since area sizes in the destination zone should be similar to the area sizes in the source zone. The result of the retrieve will be returned in one hop using a shortcut. Therefore, $3 \cdot l_{max}$ seems to be a sensible timeout for area retrieve operations, as three messages in a row should rarely all take the maximum latency. If timing bounds get too tight and a lot of retrieve operations fail to obtain the necessary replica votes, $l_{max}$ can always be adjusted accordingly. This also increases the cycle length of the update process.

As shown in figure 5.7, nodes are only guaranteed to store the state of time $t_i$ between time $t_i + 3 \cdot l_{max}$ and time $t_{i+1} + l_{max}$. When the first node has completed retrieving all player actions, it will update its state to time $t_{i+1}$. The earliest theoretically possible time for this is right after the start of requesting actions at $t_{i+1} + l_{max}$. After that, requesting an area might yield an inconsistent state containing objects that have already been updated and objects that have not been updated yet. Therefore, players trying to check whether they are seeing the correct state of the world can only do this during the state retrieval phase, when the world state should be consistent.
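The window in which the stored state is expected to be consistent follows directly from these bounds. A minimal sketch, assuming integer times in milliseconds and a hypothetical function name:

```python
def state_is_guaranteed(now: int, t_i: int, l_max: int) -> bool:
    """Return True if all nodes are expected to store the state of t_i at `now`.

    Hypothetical helper; the window is [t_i + 3*l_max, t_{i+1} + l_max]
    with t_{i+1} = t_i + 5*l_max.
    """
    window_start = t_i + 3 * l_max        # t^s_i: all updates to t_i are done
    t_next = t_i + 5 * l_max              # t_{i+1}
    window_end = t_next + l_max           # t^a_{i+1}: first update may follow
    return window_start <= now <= window_end


if __name__ == "__main__":
    # Assumed values: t_i = 0 ms, l_max = 100 ms.
    for now in (299, 300, 600, 601):
        print(now, state_is_guaranteed(now, t_i=0, l_max=100))
```

Under these assumed values the window spans 300 ms to 600 ms, which is the $3 \cdot l_{max}$ available to the state retrieval phase.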

Clock Skew and Drift Compensation

The clocks of the different nodes are only loosely synchronized within the bounds obtainable by NTP. When a node reads the current time to program a timer to start the action or state retrieve at times $t^a_i$ or $t^s_i$, the timer will fire around these times on the local clock. However, this time will deviate slightly from the true global time according to NTP accuracy. It is not a severe problem if the retrieval process starts a little late, as the other nodes should still have completed their update and store operations. However, the retrieval process should not start too early, to give other nodes the chance to complete their operations within the specified bounds.

This can easily be compensated by adding the maximum deviation $t_d$ from the true global time to the programmed time. This way, a node will start operations $t_d$ later on average and $2 \cdot t_d$ later in the worst case, if it runs slower but was compensated for running faster. In this worst case, the node will itself have less time than the specified bounds to complete its operations. However, retrieve operations have a $2 \cdot l_{max}$ and a $3 \cdot l_{max}$ bound, which are less likely to be exceeded as multiple messages need to be late. On the other hand, the action store operation has an $l_{max}$ bound that could be violated more often, as only one store message has to be late. Therefore, we opted to ensure retrieve operations never run too early.
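The effect of this compensation can be checked with a small sketch. It assumes a simple clock model that is not spelled out in the text: the local clock reads true time plus a constant offset with an absolute value of at most $t_d$. The function name and the integer millisecond values are hypothetical.

```python
def true_start_time(target: int, t_d: int, offset: int) -> int:
    """True global time at which a compensated timer fires.

    Assumed model: local clock = true time + offset, with |offset| <= t_d.
    The timer is programmed for target + t_d on the local clock.
    """
    if abs(offset) > t_d:
        raise ValueError("offset exceeds the assumed maximum deviation t_d")
    local_fire_time = target + t_d     # compensated programmed time
    return local_fire_time - offset    # convert local time to true time


if __name__ == "__main__":
    target, t_d = 800, 10              # assumed values in ms
    for offset in (10, 0, -10):        # fast, exact, and slow local clock
        print(offset, true_start_time(target, t_d, offset) - target)
```

A node whose clock is ahead by $t_d$ starts exactly on time, a node with an exact clock starts $t_d$ late, and a node whose clock is behind by $t_d$ starts $2 \cdot t_d$ late. No node starts before the target time.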

If we also assumed a minimum latency for messages, this minimum latency might have been sufficient to compensate for operations starting too early. However, it would be harder to decide on a minimum latency as messages could be really fast on a local network. Actions are also stored on the local node meaning no communication is necessary for the local store. Consequently, we did not use a lower bound for message latency higher than the implicit zero bound.