REMUS Recovery Mechanisms
3. if the information is not present and the node is the root of the tree.
As was described in Section 3.2, the basic mechanism followed by the node unit consists of
checking its database for the required information: either the information is present, in which
case the request is sent to the child node at which further information will be found or the
is only sent to a child node when this child node is listed in its parent's database as having the
required information. This can be used to detect loss of information. If the request is received
from the parent node, the node is supposed to have the information; if it does not, it assumes an
error has occurred.
Referring again to Section 3.2, when either searching for or updating information, the packet
follows its way up the network tree until a node containing the required information is found, at
which point the packet starts descending the required data path. If a packet is to be sent back to
the node it came from, inconsistencies are present in the database and hence the node
assumes an error has occurred. This test is restricted to location change requests. For
data updates the test is essential, since a packet sent back through the data path it has just
built will delete that path; for data searches it is redundant, because if a packet is sent
back to the child node it has been through, this child node will signal the error by using the first
test above. The detection is simply delayed, hence there is no reason to add to the processing load
of the system by introducing an extra test. If the node is the root of the tree, it is not supposed to
have all the information in the network listed in its database. For example, when two networks are joined together (this mechanism is explained in detail in Section 4.3.3), the new root builds
its database slowly according to demand; if a piece of information is never required across the two
newly joined networks, then the new root does not need to keep the information. However, when
the information is first required, the root adds it to its database, and in order to do so it uses the general mechanism for detection and recovery of lost information. This is so because the root
cannot distinguish between lost information and new information being added to its database, as it
does not have a parent node and cannot, therefore, perform the first test listed above. We want
simple general mechanisms for the detection and recovery of lost information so that all cases are
covered and no special solutions are required.
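The detection tests discussed above can be summarised in a short sketch. It is only an illustration under stated assumptions: the names `Node`, `Request`, `check_request`, and the returned labels `"up"`, `"down"`, and `"recover"` are hypothetical and do not come from the REMUS protocol itself, and each database is modelled simply as a mapping from a user to the child node holding further information.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

LOCATION_CHANGE = "location_change"
CALL = "call"


@dataclass
class Node:
    node_id: int
    parent_id: Optional[int] = None  # None marks the root (assumption)
    # user -> id of the child node at which further information is found
    database: Dict[str, int] = field(default_factory=dict)


@dataclass
class Request:
    user: str
    kind: str                 # LOCATION_CHANGE or CALL
    came_from: Optional[int]  # neighbour the packet arrived from, None if local


def check_request(node: Node, req: Request) -> str:
    """Return 'down', 'up', or 'recover' (an inconsistency was detected)."""
    next_child = node.database.get(req.user)
    if next_child is None:
        # First test: a parent only forwards a request to a child that it
        # lists as holding the information, so a miss here signals a loss.
        if req.came_from is not None and req.came_from == node.parent_id:
            return "recover"
        # Root case: the root has no parent and cannot tell lost information
        # from new information, so it falls back on the general mechanism.
        if node.parent_id is None:
            return "recover"
        return "up"
    # Second test, applied to location change requests only: the packet
    # would be sent back to the node it has just come from.
    if req.kind == LOCATION_CHANGE and next_child == req.came_from:
        return "recover"
    return "down"
```

For call requests the second test is deliberately skipped in this sketch, mirroring the argument that the child node will raise the error itself through the first test.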
4.1.2 Recovery Mechanism
The inconsistency in database information can be found while the node is processing either a
location change request or a call request. The procedure followed by the node is different in each
of these two cases. We first discuss the recovery procedure for location change requests.
4.1.2.1 Recovery Procedure Initiated by Location Change Requests
In case the original transaction is a location change request, the recovery procedure is the same
for both local exchange and higher level nodes. This is because if the error is detected while
processing a location change request, it means a new registration has already been made and
there is no need to recover the lost information, hence its nature is not important. It is necessary,
though, to clear the inconsistencies in the network and to re-build the new data path; this
is done by a flood fill mechanism in which the original packet has a flag added to it to identify itself as a flood fill packet and
is replicated and sent to all the node's outgoing links. In the following nodes, the flood fill packet
is replicated and sent to all their outgoing links, except the one it came from. This flagged packet
deletes all the information related to the corrupted piece of data from the nodes it goes through.
This procedure is followed until the local exchange where the original location change request
had been issued is found. In order to be able to identify the originating local exchange (i.e. the
one containing the most up-to-date information), a unique-stamp scheme was introduced.
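The replication rule of the flood fill can be illustrated with the following sketch. It assumes the network is a tree given as an adjacency mapping, that the detecting node also clears its own entry, and it leaves out the stamp comparison performed at the local exchanges; the function name `flood_fill_delete` and the data layout are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, List


def flood_fill_delete(
    links: Dict[int, List[int]],
    databases: Dict[int, Dict[str, Hashable]],
    start: int,
    user: str,
) -> int:
    """Flood a deletion for `user` from `start`; return the packets sent.

    `links` must describe a tree (no cycles), otherwise the flood would
    not terminate in this simplified form.
    """
    databases[start].pop(user, None)  # assumption: detecting node clears too
    packets = 0
    # The detecting node replicates the flagged packet on all its links.
    queue = deque((start, nbr) for nbr in links[start])
    while queue:
        sender, receiver = queue.popleft()
        packets += 1
        databases[receiver].pop(user, None)
        # Following nodes replicate on every link except the incoming one.
        for nbr in links[receiver]:
            if nbr != sender:
                queue.append((receiver, nbr))
    return packets
```

On a tree every link is crossed exactly once, so the number of packets grows with the number of nodes.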
[Figure 4.1, panels (a) to (d); recoverable labels: "missing piece of data", "inconsistency is detected", "user A moves from cell 3 to cell 5"]
Figure 4.1: Example of data loss detection and recovery: (a) original data path is corrupted, node 10 has lost information about user A; (b) data inconsistency is detected when a location change request is issued; (c) flood fill mechanism is then initiated to clear the network database; (d) finally, the location change request is re-issued by local exchange 5 to re-build the correct data path.
As part of the unique-stamp scheme, the database entries at the local exchanges are stamped with
a unique number at the time they are created. This unique number is generated by the local
exchanges by concatenating their unique identifier to the current value of an internal counter.
This process generates unique numbers that can be used to stamp the database entries
unambiguously. The location change request carries this unique number on its way up and down
the network tree. The stamp is only required if an error is detected, in which case the stamp in the
location change request (flagged as a flood fill packet) is compared against the stamp in the local
exchange's data. If the two stamps agree, it means the location change request was originated at
that local exchange, therefore it contains the correct, most up-to-date piece of data. If the two
stamps do not agree, the local exchange has the information deleted from its database. Note that
the stamp in the location change request and the one in the local exchange's database (that might
need to be compared against one another) have been generated by the same source. Hence there is
no need to keep synchronization among node counters or such.
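A minimal sketch of the unique-stamp scheme follows. The class `LocalExchange` and its methods are hypothetical, and the concatenation of identifier and counter is represented here as a tuple `(exchange_id, counter)` purely for readability; the actual encoding is not specified in this section.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# (exchange identifier, counter value): stands in for the concatenation
Stamp = Tuple[int, int]


@dataclass
class LocalExchange:
    exchange_id: int
    counter: int = 0  # internal counter, never shared with other nodes
    entries: Dict[str, Stamp] = field(default_factory=dict)  # user -> stamp

    def register(self, user: str) -> Stamp:
        """Create a stamped entry; the stamp travels with the request."""
        self.counter += 1
        stamp = (self.exchange_id, self.counter)
        self.entries[user] = stamp
        return stamp

    def on_flood_fill(self, user: str, packet_stamp: Stamp) -> bool:
        """Return True if this exchange originated the request.

        Matching stamps mean the local entry is the most up-to-date one
        and is kept; otherwise the entry is deleted.
        """
        if self.entries.get(user) == packet_stamp:
            return True
        self.entries.pop(user, None)
        return False
```

Because a stamp is only ever compared against a stamp produced by the same exchange, a per-exchange counter is enough and the counters of different exchanges never need to agree.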
Once the local exchange containing the correct piece of information is found, the location change
request is re-issued, rebuilding the database path up to the root node. This flood fill mechanism in
conjunction with the unique-stamp scheme allows all spurious database paths to be cleared and a
new up-to-date one to be re-built. Figure 4.1 exemplifies the procedure discussed above.
The recovery mechanism described above and exemplified in Figure 4.1 assumes that all the
database information related to the corrupted piece of data is deleted from the network. Ideally,
this should be the case because the node that detects the inconsistency cannot evaluate the extent
of the damage. However, this can incur a heavy signalling load as flood fill packets are made to
propagate throughout the network, compromising the system's scalability (i.e. one flood fill
packet is generated for each node in the system, hence the number of flood fill packets created per
recovery is proportional to the size of the network). Section 4.4 discusses possible
modifications to the current protocol that could restrict the depth of the spread of flood fill
packets, making the recovery mechanism scalable.
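The growth of the signalling load with network size can be checked with a small counting sketch. It assumes a balanced tree topology and unrestricted flood fill; the helper names `balanced_tree` and `flood_fill_packet_count` are hypothetical. In a tree of N nodes the flood crosses each of the N - 1 links once, whichever node starts it, which is consistent with the proportionality stated above.

```python
from typing import Dict, List, Optional, Tuple


def balanced_tree(branching: int, depth: int) -> Dict[int, List[int]]:
    """Build adjacency lists for a balanced tree rooted at node 0."""
    links: Dict[int, List[int]] = {0: []}
    frontier = [0]
    next_id = 1
    for _level in range(depth):
        new_frontier = []
        for parent in frontier:
            for _child in range(branching):
                links[parent].append(next_id)
                links[next_id] = [parent]
                new_frontier.append(next_id)
                next_id += 1
        frontier = new_frontier
    return links


def flood_fill_packet_count(links: Dict[int, List[int]], start: int) -> int:
    """Count packets sent when every node forwards on all but the incoming link."""
    count = 0
    stack: List[Tuple[Optional[int], int]] = [(None, start)]
    while stack:
        sender, node = stack.pop()
        for nbr in links[node]:
            if nbr != sender:
                count += 1
                stack.append((node, nbr))
    return count
```

Restricting the depth of the spread, as considered in Section 4.4, would correspond to cutting this traversal short.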
4.1.2.2 Recovery Procedure Initiated by Call Requests
In case the original transaction is a call request, a procedure similar to that for location change requests is followed, but without the unique stamp. The unique-stamp scheme is used when
information is introduced or modified, and hence is particular to location change requests. The
node that detects the inconsistency initiates a flood fill mechanism in which the original call
request is flagged as a flood fill packet and replicated at each node, deleting on its way all the
information related to the corrupted piece of data. Once any spurious data have been removed
from the system, a new data path needs to be built and the call completed. There are three
alternatives for the database restoration and call completion procedures: