
5.3 World State Updating

5.3.5 Update Process Timing

The state retrieval and action retrieval operations have to be timed correctly so that all nodes update their world state in step. Suppose the updated state of time t_i has just been calculated and nodes are preparing for the next update to time t_{i+1}. After a node a has updated its state to t_i, it has to retrieve the update area A_{c+d} from the storage to obtain the set of objects able to influence A_c. However, before it can do this, it has to be sure that the other nodes have also successfully updated their state to time t_i. We denote this time ts_i, with t_i < ts_i < t_{i+1}, and explain its calculation later. At ts_i, nodes start to retrieve the state of time t_i.

116 Chapter 5 A Virtual World Storage

[Figure 5.7 is a timeline. The Current Time axis has ticks every l_max from t_i to t_{i+2} + l_max, with t_{i+1} = t_i + 5 ∗ l_max and t_{i+2} = t_{i+1} + 5 ∗ l_max. Rows: State Retrieval (ts_i, ts_{i+1}); Action Retrieval (ta_i, ta_{i+1}, ta_{i+2}); Stored State (t_{i-1}, t_i, t_{i+1}); Stored Actions (]t_i, t_{i+1}], ]t_{i+1}, t_{i+2}], ]t_{i+2}, t_{i+3}]).]

Figure 5.7: Timing of retrieve operations in the update process

When this retrieve returns, nodes have to retrieve all actions of players in the retrieved update area A_{c+d} between t_i and t_{i+1}. Before nodes can request all actions up to and including t_{i+1}, they have to make sure these actions have been stored successfully. At the latest, a player could generate a relevant action at t_{i+1}. After it has been generated, this action still needs some time until all of its replicas are stored at the different nodes. Since a player generating actions always stores these actions on the same nodes, the overlay will have established shortcut connections to these nodes. Storing a replica therefore takes just the latency of one message on the link layer. We define l_max to be an assumed maximum link layer message latency. All assumptions about whether nodes have completed phases of the update process are based on this latency. Most of the time, this assumption will hold. If it does not and a message is late, a node will report wrong data. Replication corrects for this and ensures that the updating works correctly with high probability.

Figure 5.7 shows the timing of the update process based on the assumed l_max. To calculate the state at time t_{i+1}, a node needs all actions that players generated up to this time. As l_max is the maximum duration of store operations, all actions up to time t_{i+1} should be stored at ta_{i+1} = t_{i+1} + l_max. At ta_{i+1}, nodes retrieve the actions of players in their update area between time t_i and t_{i+1}. We assume 2 ∗ l_max as the bound for the duration of the retrieve operation, as it consists of one retrieve message to the storing node and one result message back to the retrieving node. Consequently, 2 ∗ l_max is the timeout for action retrieves. Therefore, at time t_{i+1} + 3 ∗ l_max, all nodes should have completed their retrieve operations for all actions up to t_{i+1} and can calculate the state in their maintenance area A_c at time t_{i+1}. They actually perform that calculation upon retrieving the last missing player actions, so the state updates at the different nodes will happen somewhere between t_{i+1} + l_max and t_{i+1} + 3 ∗ l_max.

At ts_{i+1} = t_{i+1} + 3 ∗ l_max, nodes can be sure that all other nodes have finished their update to time t_{i+1}. Thus, nodes can start retrieving the state at t_{i+1} to prepare for calculating the state at t_{i+2}, and the update cycle starts again. The cycle length is I = 5 ∗ l_max, yielding t_{i+1} − t_i = 5 ∗ l_max. Consequently, the aforementioned ts_i can be calculated from ts_i = t_i + 3 ∗ l_max.
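The milestones derived above can be collected in a few lines of code. The following is a minimal sketch, not the implementation used in this work: the names CycleSchedule and cycle_schedule and the example value of 100 for l_max are assumptions made for illustration. Only the offsets (3 ∗ l_max, l_max, 2 ∗ l_max and 5 ∗ l_max) are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CycleSchedule:
    """Milestones of one update cycle, in the same time unit as l_max."""

    t_i: float              # time of the state that was just calculated
    state_retrieve: float   # ts_i: start retrieving the state of time t_i
    t_next: float           # t_{i+1}: time of the state to calculate next
    action_retrieve: float  # ta_{i+1}: start retrieving actions in ]t_i, t_{i+1}]
    update_deadline: float  # all nodes should hold the state of t_{i+1} by now


def cycle_schedule(t_i: float, l_max: float) -> CycleSchedule:
    """Compute the milestones of the cycle that leads from t_i to t_{i+1}."""
    if l_max <= 0:
        raise ValueError("l_max must be positive")
    t_next = t_i + 5 * l_max                       # cycle length I = 5 * l_max
    state_retrieve = t_i + 3 * l_max               # ts_i = t_i + 3 * l_max
    action_retrieve = t_next + l_max               # ta_{i+1} = t_{i+1} + l_max
    update_deadline = action_retrieve + 2 * l_max  # one request plus one result message
    return CycleSchedule(t_i, state_retrieve, t_next, action_retrieve, update_deadline)


if __name__ == "__main__":
    # l_max = 100 (for example milliseconds) is an illustrative value only.
    print(cycle_schedule(t_i=0.0, l_max=100.0))
```

For t_i = 0 and l_max = 100, the sketch yields ts_i = 300, t_{i+1} = 500, ta_{i+1} = 600 and an update deadline of 800, which equals ts_{i+1} = t_{i+1} + 3 ∗ l_max. The state retrieval phase thus spans 3 ∗ l_max and the action retrieval phase 2 ∗ l_max, which together make up the cycle length of 5 ∗ l_max.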

The operations State Retrieval and Action Retrieval shown in figure 5.7 run alternately at the given times. Retrieving actions starts as soon as actions are guaranteed to be stored, as shown in row Stored Actions. When this retrieval is completed on all nodes and the state is updated, as shown in row Stored State, retrieving this updated state starts immediately. This gives the state retrieval operation a maximum time of 3 ∗ l_max to complete (from ts_i = t_i + 3 ∗ l_max until ta_{i+1} = t_{i+1} + l_max). Since nodes permanently request the same area, they will have shortcut connections to the multicast root nodes of the area in the different zones after the first retrieve operation.

In total, the areacast probably needs two to three messages to reach all nodes in the area, since area sizes in the destination zone should be similar to the area sizes in the source zone. The result of the retrieve will be returned in one hop using a shortcut. Therefore, 3 ∗ l_max seems to be a sensible timeout for area retrieve operations, as three messages in a row should rarely all take the maximum latency. If timing bounds get too tight and a lot of retrieve operations fail to obtain the necessary replica votes, l_max can always be adjusted accordingly. Increasing l_max also increases the cycle length of the update process.
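The text does not specify how l_max would be adjusted. The following sketch shows one possible policy under stated assumptions: the function adjust_l_max, the failure-rate threshold of 5% and the growth factor of 1.25 are all hypothetical and are not taken from this work. The sketch also ignores how nodes would agree on a new value. It only illustrates that raising l_max lengthens the cycle I = 5 ∗ l_max proportionally.

```python
def cycle_length(l_max: float) -> float:
    """Cycle length I = 5 * l_max, as given in the text."""
    return 5 * l_max


def adjust_l_max(
    l_max: float,
    failed_retrieves: int,
    total_retrieves: int,
    max_failure_rate: float = 0.05,  # hypothetical threshold
    growth: float = 1.25,            # hypothetical growth factor
) -> float:
    """Return a larger l_max if too many retrieves missed their replica votes.

    Hypothetical policy: the source only states that l_max can be adjusted.
    """
    if total_retrieves <= 0:
        return l_max  # no observations, keep the current bound
    failure_rate = failed_retrieves / total_retrieves
    if failure_rate > max_failure_rate:
        return l_max * growth
    return l_max


if __name__ == "__main__":
    new_l_max = adjust_l_max(100.0, failed_retrieves=10, total_retrieves=100)
    print(new_l_max, cycle_length(new_l_max))
```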

As shown in figure 5.7, nodes are only guaranteed to store the state of time t_i between time t_i + 3 ∗ l_max and time t_{i+1} + l_max. When the first node has completed retrieving all player actions, it will update its state to time t_{i+1}. The earliest theoretically possible time for this is right after the start of requesting actions at t_{i+1} + l_max. After that, requesting an area might yield an inconsistent state containing objects that have already been updated and objects that have not been updated yet. Therefore, players trying to check whether they are seeing the correct state of the world can only do this during the state retrieval phase, when the world state should be consistent.
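A check of whether a given point in time falls inside this guaranteed window could look as follows. This is a sketch under assumptions: the function names are hypothetical, and treating both window boundaries as inclusive is a choice made here, since the text only says "between".

```python
from typing import Tuple


def consistent_window(t_i: float, l_max: float) -> Tuple[float, float]:
    """Interval in which all nodes are guaranteed to store the state of t_i."""
    start = t_i + 3 * l_max          # ts_i: every node has finished updating to t_i
    end = (t_i + 5 * l_max) + l_max  # ta_{i+1} = t_{i+1} + l_max: updates may begin
    return start, end


def state_is_guaranteed(now: float, t_i: float, l_max: float) -> bool:
    """True if the state of time t_i is guaranteed on all nodes at time `now`.

    Both boundaries are treated as inclusive (an assumption of this sketch).
    """
    start, end = consistent_window(t_i, l_max)
    return start <= now <= end


if __name__ == "__main__":
    # With t_i = 0 and l_max = 100, the window is [300, 600].
    print(consistent_window(0.0, 100.0))
    print(state_is_guaranteed(450.0, 0.0, 100.0))
```

A player who wants to verify its view of the world would, under this sketch, only issue the check while state_is_guaranteed returns True for the current cycle.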

Clock Skew and Drift Compensation

The clocks of the different nodes are only loosely synchronized within the bounds obtainable by NTP. When a node reads the current time to program a timer to start the action or state retrieve at times ta_i or ts_i, the timer will fire around these times on the local clock. However, this time will deviate slightly from the true global time, according to NTP accuracy. It is not a severe problem if the retrieval process starts a little late, as the other nodes should still have completed their update and store operations. However, the retrieval process should not start too early, in order to give other nodes the chance to complete their operations within the specified bounds.

This can easily be compensated by adding the maximum deviation t_d from the true global time to the programmed time. This way, a node will start operations t_d later on average and 2 ∗ t_d later in the worst case, namely if its clock runs slower but was compensated for running faster. In this worst case, the node will itself have less time than the specified bounds to complete its operations. However, retrieve operations have a 2 ∗ l_max and a 3 ∗ l_max bound, which are less likely to be exceeded as multiple messages need to be late. On the other hand, the action store operation has an l_max bound that could be violated more often, as only one store message has to be late. Therefore, we opted to ensure retrieve operations never run too early.
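The effect of this compensation can be made explicit with a small calculation. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the local clock differs from the true global time by an offset in the range [-t_d, +t_d] that stays constant while the timer runs.

```python
from typing import Tuple


def programmed_time(target: float, t_d: float) -> float:
    """Local-clock time to program so the operation never starts before `target`."""
    return target + t_d


def true_start_bounds(target: float, t_d: float) -> Tuple[float, float]:
    """Earliest and latest true global time at which the timer can fire.

    Assumes local = true + offset with offset in [-t_d, +t_d]. The timer fires
    when local == target + t_d, so true = target + t_d - offset.
    """
    programmed = programmed_time(target, t_d)
    earliest = programmed - t_d  # local clock runs fast by t_d: exactly on time
    latest = programmed + t_d    # local clock runs slow by t_d: 2 * t_d late
    return earliest, latest


if __name__ == "__main__":
    # Illustrative values: retrieve planned for global time 600, t_d = 10.
    print(true_start_bounds(600.0, 10.0))
```

With these illustrative values, the operation starts between 600 and 620 in true global time: never early, and at most 2 ∗ t_d late, which matches the worst case described above.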

If we also assumed a minimum latency for messages, this minimum latency might have been sufficient to compensate for operations starting too early. However, it would be harder to decide on a minimum latency, as messages could be really fast on a local network. Actions are also stored on the local node, meaning no communication is necessary for the local store. Consequently, we did not use a lower bound for message latency higher than the implicit zero bound.

In document Cheating Prevention in Peer-to-Peer-based Massively Multiuser Virtual Environments (Page 121-124)