

REMUS Recovery Mechanisms

3. if the information is not present and the node is the root of the tree.

As was described in Section 3.2, the basic mechanism followed by the node unit consists of checking its database for the required information: either the information is present, in which case the request is sent to the child node at which further information will be found, or the [...] is only sent to a child node when this child node is listed in its parent's database as having the required information. This can be used to detect loss of information. If the request is received from the parent node, the node is supposed to have the information; if it does not, it assumes an error has occurred.

Referring again to Section 3.2, when either searching for or updating information, the packet follows its way up the network tree until a node containing the required information is found, at which point the packet starts descending the required data path. If a packet is to be sent back to the node it came from, inconsistencies are present in the database and hence the node assumes an error has occurred. This test is restricted to location change requests. For data updates the test is essential, since a packet sent back through the data path it has just built will delete that path. For data searches it is redundant: if a packet is sent back to the child node it has been through, this child node will signal the error by using the first test above. The detection is simply delayed, hence there is no reason to add to the processing load of the system by introducing an extra test.

If the node is the root of the tree, it is not supposed to have all the information in the network listed in its database. For example, in case two networks are joined together (this mechanism is explained in detail in Section 4.3.3), the new root builds its database slowly according to demand; if a piece of information is never required across the two newly joined networks, then the new root does not need to keep the information. However, when the information is first required, the root adds it to its database, and in order to do so it uses the general mechanism for detection and recovery of lost information. This is so because the root cannot distinguish between lost information and new information being added to its database: it does not have a parent node and therefore cannot perform the first test listed above. We want simple general mechanisms for the detection and recovery of lost information so that all cases are covered and no special solutions are required.
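The three detection tests discussed above can be summarised in a short Python sketch. It is illustrative only: the `Node` and `Kind` types, the field names and the returned labels are assumptions made for this example and are not part of the protocol description, and the database is simplified to a map from a user to the child node holding further information.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class Kind(Enum):
    LOCATION_CHANGE = "location_change"
    CALL = "call"


@dataclass
class Node:
    node_id: int
    parent: Optional[int] = None  # None marks the root of the tree (assumed encoding)
    # user -> id of the child node at which further information is found
    database: Dict[str, int] = field(default_factory=dict)


def detect_inconsistency(node: Node, user: str, kind: Kind,
                         came_from: int) -> Optional[str]:
    """Return a label for the test that fired, or None if no error is detected."""
    next_hop = node.database.get(user)
    if next_hop is None:
        if node.parent is None:
            # Root: lost and new information cannot be told apart,
            # so the general recovery mechanism is used.
            return "missing_at_root"
        if came_from == node.parent:
            # The parent listed this node as holding the information.
            return "missing_but_sent_by_parent"
        return None  # normal case: the request continues up the tree
    if kind is Kind.LOCATION_CHANGE and next_hop == came_from:
        # The packet would be sent back to the node it came from.
        return "sent_back_to_sender"
    return None
```

Note that the last test is applied to location change requests only, mirroring the argument above that for call requests it would merely duplicate the first test.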

4.1.2 Recovery Mechanism

The inconsistency in database information can be found while the node is processing either a location change request or a call request. The procedure followed by the node is different in each of these two cases. We first discuss the recovery procedure for location change requests.

4.1.2.1 Recovery Procedure Initiated by Location Change Requests

In case the original transaction is a location change request, the recovery procedure is the same for both local exchange and higher level nodes. This is because if the error is detected while processing a location change request, it means a new registration has already been made and there is no need to recover the lost information, hence its nature is not important. It is necessary, though, to clear the inconsistencies in the network and to re-build the new data path; this is done by a flood fill mechanism [11, 12] in which the original packet has a flag added to it to identify itself as a flood fill packet and is replicated and sent to all the node's outgoing links. In the following nodes, the flood fill packet is replicated and sent to all their outgoing links, except the one it came from. This flagged packet deletes all the information related to the corrupted piece of data from the nodes it goes through. This procedure is followed until the local exchange where the original location change request had been issued is found. In order to be able to identify the originating local exchange (i.e. the one containing the most up-to-date information), a unique-stamp scheme was introduced.

Figure 4.1: Example of data loss detection and recovery: (a) original data path is corrupted, node 10 has lost information about user A; (b) data inconsistency is detected when a location change request is issued; (c) flood fill mechanism is then initiated to clear network database; (d) finally, location change request is re-issued by local exchange 5 to re-build correct data path. [Figure labels: missing piece of data; inconsistency is detected; user A moves from cell 3 to cell 5.]

As part of the unique-stamp scheme, the database entries at the local exchanges are stamped with a unique number at the time they are created. This unique number is generated by the local exchanges by concatenating their unique identifier to the current value of an internal counter. This process generates unique numbers that can be used to stamp the database entries unambiguously. The location change request carries this unique number on its way up and down the network tree. The stamp is only required if an error is detected, in which case the stamp in the location change request (flagged as a flood fill packet) is compared against the stamp in the local exchange's data. If the two stamps agree, it means the location change request was originated at that local exchange, therefore it contains the correct, most up-to-date piece of data. If the two stamps do not agree, the local exchange has the information deleted from its database. Note that the stamp in the location change request and the one in the local exchange's database (that might need to be compared against one another) have been generated by the same source. Hence there is no need to keep synchronization among node counters or such.
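A minimal Python sketch of stamp generation follows. The class and method names are hypothetical, and the concatenation of identifier and counter is represented here as a tuple, which is an assumption about the encoding rather than something the text specifies.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Iterator, Tuple

# (exchange identifier, counter value): an assumed encoding of the concatenation
Stamp = Tuple[int, int]


@dataclass
class LocalExchange:
    exchange_id: int  # unique identifier of the local exchange
    database: Dict[str, Stamp] = field(default_factory=dict)
    _counter: Iterator[int] = field(default_factory=count, repr=False)

    def register(self, user: str) -> Stamp:
        """Create a database entry for `user` and stamp it at creation time."""
        stamp = (self.exchange_id, next(self._counter))
        self.database[user] = stamp
        return stamp  # carried by the location change request


exchange_3 = LocalExchange(3)
exchange_5 = LocalExchange(5)
stamp_a = exchange_5.register("A")
stamp_b = exchange_5.register("B")
stamp_c = exchange_3.register("A")
```

Because the identifier is part of the stamp, two exchanges can hold the same counter value without collision, which is why the counters need no synchronization.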

Once the local exchange containing the correct piece of information is found, the location change request is re-issued, rebuilding the database path up to the root node. This flood fill mechanism in conjunction with the unique-stamp scheme allows all spurious database paths to be cleared and a new up-to-date one to be re-built. Figure 4.1 exemplifies the procedure discussed above.
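The whole recovery cycle (flood fill, stamp comparison, re-issue) can be sketched in Python as below. This is a sketch under stated assumptions, not the protocol's implementation: the function names, the dictionary-based databases, the tuple stamps and the small tree topology are all hypothetical. The node numbers echo Figure 4.1, but the topology is invented for the example.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Stamp = Tuple[int, int]  # (exchange id, counter value), an assumed encoding


def flood_fill(links: Dict[int, List[int]],
               databases: Dict[int, Dict[str, int]],
               exchange_entries: Dict[int, Dict[str, Stamp]],
               start: int, user: str, stamp: Stamp) -> Optional[int]:
    """Clear entries for `user`; return the exchange whose stamp agrees."""
    origin = None
    queue = deque([(start, None)])  # (node, neighbour the packet came from)
    while queue:
        node, came_from = queue.popleft()
        if node in exchange_entries:
            # Local exchange: keep the entry only if the two stamps agree.
            if exchange_entries[node].get(user) == stamp:
                origin = node
            else:
                exchange_entries[node].pop(user, None)
        else:
            # Higher level node: delete the path entry.
            databases[node].pop(user, None)
        # Replicate on every outgoing link except the one it came from.
        for neighbour in links[node]:
            if neighbour != came_from:
                queue.append((neighbour, node))
    return origin


def reissue(parents: Dict[int, Optional[int]],
            databases: Dict[int, Dict[str, int]],
            origin: int, user: str) -> None:
    """Rebuild the data path from the originating exchange up to the root."""
    child, node = origin, parents[origin]
    while node is not None:
        databases[node][user] = child
        child, node = node, parents[node]


# Hypothetical tree: root 20, nodes 10 and 11, local exchanges 3, 4 and 5.
links = {20: [10, 11], 10: [20, 3, 4], 11: [20, 5], 3: [10], 4: [10], 5: [11]}
parents = {20: None, 10: 20, 11: 20, 3: 10, 4: 10, 5: 11}
databases = {20: {"A": 10}, 10: {}, 11: {"A": 5}}  # node 10 lost user A
exchange_entries = {3: {"A": (3, 7)}, 4: {}, 5: {"A": (5, 2)}}

origin = flood_fill(links, databases, exchange_entries,
                    start=10, user="A", stamp=(5, 2))
reissue(parents, databases, origin, "A")
```

In this example the stale entry at exchange 3 is removed because its stamp differs from the one carried by the packet, exchange 5 is identified as the originator, and the re-issued request leaves a single path from the root through node 11 to exchange 5.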

The recovery mechanism described above and exemplified in Figure 4.1 assumes that all the database information related to the corrupted piece of data is deleted from the network. Ideally, this should be the case because the node that detects the inconsistency cannot evaluate the extent of the damage. However, this can incur a heavy signalling load as flood fill packets are made to propagate throughout the network, compromising the system's scalability (i.e. one flood fill packet is generated for each node in the system, hence the number of flood fill packets created per recovery mechanism is proportional to the size of the network). Section 4.4 discusses possible modifications to the current protocol that could restrict the depth of the spread of flood fill packets, making the recovery mechanism scalable.
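The scaling argument can be checked with a small Python sketch that counts the packets sent during one unrestricted flood fill. The helper names and the choice of a complete tree are assumptions for illustration only. In a tree, replicating on every link except the incoming one sends exactly one packet per link, i.e. one fewer than the number of nodes.

```python
from typing import Dict, List


def full_tree(branching: int, depth: int) -> Dict[int, List[int]]:
    """Build the link table of a complete tree (hypothetical topology)."""
    links: Dict[int, List[int]] = {0: []}
    frontier, next_id = [0], 1
    for _ in range(depth):
        new_frontier = []
        for parent in frontier:
            for _ in range(branching):
                links[parent].append(next_id)
                links[next_id] = [parent]
                new_frontier.append(next_id)
                next_id += 1
        frontier = new_frontier
    return links


def count_flood_packets(links: Dict[int, List[int]], start: int) -> int:
    """Count packets sent when flooding from `start` with no depth limit."""
    sent = 0
    stack = [(start, None)]  # (node, neighbour the packet came from)
    while stack:
        node, came_from = stack.pop()
        for neighbour in links[node]:
            if neighbour != came_from:
                sent += 1
                stack.append((neighbour, node))
    return sent


# Packets per recovery grow linearly with the number of nodes.
sizes_and_packets = [
    (len(tree), count_flood_packets(tree, 0))
    for tree in (full_tree(3, depth) for depth in range(1, 5))
]
```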

4.1.2.2 Recovery Procedure Initiated by Call Requests

In case the original transaction is a call request, a procedure similar to that for location change requests is followed, but without the unique stamp. The unique-stamp scheme is used when information is introduced or modified, and hence is particular to location change requests. The node that detects the inconsistency initiates a flood fill mechanism in which the original call request is flagged as a flood fill packet and replicated at each node, deleting on its way all the information related to the corrupted piece of data. Once any spurious data have been removed from the system, a new data path needs to be built and the call completed. There are three alternatives for the database restoration and call completion procedures: