Gossip-based routing for CMS coordination


5.4 Pub/Sub information dissemination model and strategies

5.4.2 Gossip-based routing for CMS coordination

The decentralized nature of the CMS calls for simplicity and topology independence, both of which gossip-based routing can provide. Hence, I propose a mechanism that allows each broker to dynamically adapt the dissemination of control events. Returning to the CMS workflow, when an Orchestrator publishes a control event on behalf of an AL through the Publish/Subscribe network, this information is routed from the serving broker to the other peer brokers in the network. The gossip-based routing is therefore tuned according to the LG that connects the different ALs.

The routing relies on a gossip-based algorithm that I call the Control Event Dissemination Algorithm (CEDA). It is composed of two main threads, active (4) and passive (12), which run periodically inside the broker. The round dissemination time (Ts) determines the activation interval of both threads. The passive thread waits for an incoming connection from a peer broker (13), receives the control event values of its peer (Sc.Er) and saves the message in a queue, along with a timestamp. The active thread periodically routes (9) to its peer broker (Nb) the control events (Sc.Ec) that have been saved in the queue, but only those within the interval Ts; the remaining events stay in the queue. Brokers do not route an event back to the broker that originally pushed it. Both the active and the passive thread use the update function, which selects the strategy the broker has to apply in order to optimize event dissemination. Next, the peer broker sends the Sc.Ec following a routing strategy (17) that depends on the Ts obtained from the strategyTime (18) procedure. The routing strategy depends on the internal bindings and the LG the broker is supporting; I develop this matter later on. Concerning neighbor discovery, I assume brokers implement one of the existing mechanisms (Pallickara, Gadgil, & Fox, 2005) to discover peer brokers. I also assume that the brokers' clocks have been synchronized using, for example, the Network Time Protocol (NTP).

Input: activation interval (Ts) of routing
Output: routing of control events Sc.Ec
1: initialization
2:   initTimer(Ts)
3:   init active thread & passive thread
4: active thread
5:   while true do
6:     sleep(Ts)
7:     Sc.Ec <- getCurrentControlEvents()
8:     Nb <- getCurrentNeighbord()
9:     recover Sc.Ec from queue and route to Nb
10:    update(Sc.Ec, Sc.Er, Ts)
16: function update(Sc.Ec, Sc.Er, Ts)
17:   Sc.Ec <- strategy(sType, Sc.Ec, Sc.Er, Ts)
18:   Ts <- strategyTime(sType, Sc.Ec, Sc.Er, Ts)

Algorithm 6. Control Event Dissemination algorithm
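As an illustration of the active and passive roles, the following is a minimal, single-process sketch and not the thesis implementation. The class `Broker`, its methods `receive` and `gossip_round`, and the `seen` set used to drop duplicates are hypothetical names introduced here. The sketch also simplifies the algorithm: the caller is assumed to invoke `gossip_round` once every Ts, every queued event is routed in that round, and the update function that selects the strategy is left out.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Broker:
    """Hypothetical broker with a queue of timestamped control events."""
    name: str
    neighbors: list = field(default_factory=list)
    queue: deque = field(default_factory=deque)  # (timestamp, event, sender)
    seen: set = field(default_factory=set)       # assumption: duplicate filter

    def receive(self, event, sender, now):
        """Passive side: store an incoming event along with a timestamp."""
        if event in self.seen:
            return  # already known, nothing to queue
        self.seen.add(event)
        self.queue.append((now, event, sender))

    def gossip_round(self, now):
        """Active side: route queued events to peers, one call per Ts."""
        sent = []
        while self.queue:
            _stamp, event, sender = self.queue.popleft()
            for peer in self.neighbors:
                if peer.name == sender:
                    continue  # never push an event back to its origin
                peer.receive(event, self.name, now)
                sent.append((event, peer.name))
        return sent


# Usage: three brokers in a line, a - b - c.
a, b, c = Broker("a"), Broker("b"), Broker("c")
a.neighbors, b.neighbors, c.neighbors = [b], [a, c], [b]
a.receive("e1", "orchestrator", now=0.0)
first = a.gossip_round(now=1.0)   # a routes e1 to b
second = b.gossip_round(now=2.0)  # b routes e1 to c only, not back to a
```

In this sketch the rule that brokers do not route an event back to the broker that pushed it is the `peer.name == sender` check.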

The routing strategies used to tune the gossip communication driven by the CEDA are the following:

Brokers handle an LG as composite bindings of predecessor and successor subscriptions representing tasks' interests. In the CMS initialization, the Coordination sends to each of the brokers the number of tasks' ALs (a) that participate in the same CMS instance. Given a broker population (n), the highly distributed scenario is set as one broker per limit activity, n = a. Based on the work of (Eugster, Guerraoui, Kermarrec, & Massoulie, 2004), the expected fraction of peer brokers (Yr) that already know the Sc.Ec after r rounds can be expressed as $Y_r \approx \frac{1}{1 + n e^{-a r}}$. The strategy therefore uses this formula to pre-calculate the evolution of Yr over the different rounds, and increases or decreases Ts according to the result. However, if the rate of control events is high, pre-calculating the evolution for each control event can affect the performance of the broker and the network (as they are set to be executed on mobile devices), so I propose to do this pre-calculation over a fixed period of time taken from the RTT delay to the most distant peer broker. This strategy is applicable to Sequence and Parallel split LG.
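The pre-calculation of Yr can be sketched as follows. The function names `expected_fraction` and `rounds_to_reach`, the target fraction and the round limit are assumptions made for illustration; only the expression for Yr, with n and a as defined in the text, comes from the paragraph above. How Ts is then raised or lowered from the result is not specified here and is left out.

```python
import math


def expected_fraction(n, a, r):
    """Yr ~ 1 / (1 + n * exp(-a * r)), with n and a as defined in the text."""
    return 1.0 / (1.0 + n * math.exp(-a * r))


def rounds_to_reach(n, a, target, max_rounds=1000):
    """Hypothetical helper: first round r at which Yr reaches `target`."""
    for r in range(max_rounds + 1):
        if expected_fraction(n, a, r) >= target:
            return r
    return None  # target not reached within max_rounds


# Usage: evolution of Yr over the first rounds for n = 9, a = 1.
evolution = [expected_fraction(9, 1, r) for r in range(6)]
```

A broker could compute such a table once per fixed period rather than per control event, which is the cost saving the paragraph argues for.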

In the case of synchronization LG, brokers are able to monitor the status of the BIPs and check whether they are still active by receiving information from the Publish/Subscribe layer. If a mobile device executing a BIP intentionally or unintentionally disconnects, the broker modifies the conditions to trigger an ev', as the original conditions will never be satisfied. In the coordination model I proposed for the synchronization gate, predecessor and successor bindings are clustered in the same broker; however, when predecessor and successor bindings are distributed across different brokers, a broker can gossip an event (msp.broker.gisai.dit.upm.es, <payload>) including in the payload the new list of BIPs for this LG. This can be done by using the control channel defined in Section 4.2.2, attaching to the payload the new list of predecessor bindings for this specific gate, so that once this event arrives at the broker it can update the runtime status of the subscription's internal binding on the fly.
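The effect of such an update on a synchronization gate can be sketched as below. This is a hedged illustration only: the class `SyncGate`, its methods, and the representation of the payload as a plain list of BIP identifiers are all assumptions, since the payload format is not given in this section.

```python
class SyncGate:
    """Hypothetical gate that fires once all active predecessors reported."""

    def __init__(self, predecessors):
        self.predecessors = set(predecessors)  # predecessor bindings (BIPs)
        self.completed = set()                 # predecessors that reported

    def ready(self):
        """True when every current predecessor has reported."""
        return bool(self.predecessors) and self.predecessors <= self.completed

    def on_predecessor_event(self, bip):
        """Record a report from a predecessor that is still bound."""
        if bip in self.predecessors:
            self.completed.add(bip)
        return self.ready()

    def update_predecessors(self, new_list):
        """Apply the predecessor list carried in a gossiped control event."""
        self.predecessors = set(new_list)
        self.completed &= self.predecessors  # forget removed BIPs
        return self.ready()


# Usage: bip3 disconnects, so the original condition can never be met.
gate = SyncGate(["bip1", "bip2", "bip3"])
r1 = gate.on_predecessor_event("bip1")
r2 = gate.on_predecessor_event("bip2")
r3 = gate.update_predecessors(["bip1", "bip2"])  # gate can now fire
```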

When a broker receives a control event, it adds a timestamp and compares the RTT of the most distant peer with the delay given by the difference between the newest and oldest timestamps of the received event (based on the topic that identifies the event). If the broker calculates that the resulting delay is higher than the RTT, it starts the procedure that implements a new algorithm I propose, called the Round Dissemination Time (Ts) Calculation Algorithm (TCA), for the CEDA thread. The motivation for the TCA is that the Ts value must be accurately tuned in order to maintain the performance of the gossip-based routing. If Ts is too high, the routing can lead to unacceptable delays in the CMS execution; on the other hand, if Ts is too small, the gossip-based routing can overload brokers' resources and degrade the performance of the network by introducing a large amount of unnecessary traffic.
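The trigger condition described above can be sketched as follows. The class `DelayMonitor`, the per-topic list of timestamps and the parameter `max_rtt` (the RTT to the most distant peer, assumed to be measured elsewhere) are hypothetical names for illustration.

```python
from collections import defaultdict


class DelayMonitor:
    """Hypothetical per-topic check that decides when to start the TCA."""

    def __init__(self, max_rtt):
        self.max_rtt = max_rtt           # RTT to the most distant peer
        self.stamps = defaultdict(list)  # topic -> reception timestamps

    def record(self, topic, now):
        """Timestamp a received event; return True if the TCA should run."""
        stamps = self.stamps[topic]
        stamps.append(now)
        delay = max(stamps) - min(stamps)  # newest minus oldest timestamp
        return delay > self.max_rtt


# Usage: the same topic is seen three times, with a 0.2 s RTT bound.
monitor = DelayMonitor(max_rtt=0.2)
checks = [monitor.record("cms/gate1", t) for t in (1.0, 1.1, 1.5)]
```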

To maintain the performance of the gossip routing, only the brokers that received events matching their internal bindings are candidates to execute the TCA; in addition, one TCA thread is expected per Task. In other cases the broker maintains a fixed value of Ts (TsMax) that depends on the specific application scenario of the CMS (e.g. tolerant or not tolerant to long delays, very unreliable networks and unstable environments). Step 2 recovers the number of peer brokers that were contacted in the last gossip round (OBC); this value is different from the total population of peer brokers (TPB).

Input: old Ts
Output: updated Ts
1: procedure calculate time(Ts)
2:   OBC <- getOldbrokers(events)
3:   if (TPB/3 < OBC < TPB)
4:     RvE.old <- recover()
5:     Timediff <- |RvE.old - RvE|
6:     Timediff.P <- Timediff x 100 / Tsmax
7:     if Timediff.P > Threshold.%
8:       set Ts with Tsmin
9:     else
10:      add 10% of Tsmax to Ts
11:  else
12:    set Ts with Tsmax
13:  return Ts

Algorithm 7. Round Dissemination Time (Ts) Calculation algorithm

Step 3 recovers the actual values of Ts that the algorithm instance itself has used in previous gossip rounds for the RvE; if there are no previous rounds, it sets Ts to TsMax. In the first moments of the gossiping process, only a small number of brokers are infective, so, according to (Kephart, 1994), the opportunity to select a susceptible unit is high and reliable. Based on this research result, Step 3 checks whether the OBC lies in the range between TPB/3 and TPB, and based on this the algorithm chooses the Ts value. Step 4 recovers the number of events exchanged in the last round (RvE.old). Steps 5 and 6 calculate the difference (Timediff) between the timestamps of RvE.old and RvE and express it as a percentage (Timediff.P), where Tsmax is taken as 100%. Then, Step 7 verifies whether Timediff.P is larger than the Threshold %, which represents the maximum delay tolerable in the network and is independent of Tsmax. Steps 8 and 10 set Ts according to the result of Step 7. Step 12 assigns the maximum delay to the iteration, as the chances of infecting a peer broker have decreased. Finally, Ts is returned.
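The steps of Algorithm 7 can be written as a short function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, `rve_old` and `rve` are taken to be timestamps in the same unit as Ts, and the values that steps 2 and 4 recover are passed in as arguments. The listing does not say whether Ts is capped after step 10, so no cap is applied here.

```python
def calculate_ts(ts, obc, tpb, rve_old, rve, ts_min, ts_max, threshold_pct):
    """Sketch of the Ts calculation; step numbers refer to Algorithm 7."""
    if tpb / 3 < obc < tpb:                       # step 3
        time_diff = abs(rve_old - rve)            # step 5
        time_diff_pct = time_diff * 100 / ts_max  # step 6
        if time_diff_pct > threshold_pct:         # step 7
            return ts_min                         # step 8
        return ts + 0.10 * ts_max                 # step 10 (no cap assumed)
    return ts_max                                 # step 12


# Usage with illustrative values: 9 peer brokers, 5 contacted last round.
fast = calculate_ts(2.0, obc=5, tpb=9, rve_old=10.0, rve=13.0,
                    ts_min=1.0, ts_max=10.0, threshold_pct=20)
slow = calculate_ts(2.0, obc=5, tpb=9, rve_old=10.0, rve=11.0,
                    ts_min=1.0, ts_max=10.0, threshold_pct=20)
```

With these values the first call sees a 30% difference and drops to Tsmin, while the second sees 10% and lengthens Ts by 10% of Tsmax.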

5.5 Experiments

In this section, I present the experiments that support some of the contributions proposed in this chapter.

The following research questions were formulated to design, plan and guide the experiments.

• Does the extension of the topic-based features in brokers lead to a better use of existing network resources in CMS executions? This research question is directly related to Sub-hypothesis 2 and is answered in Section 5.5.1, where I describe the experimentation efforts to verify the network performance of the solutions for the correlation of events in a topic-based network based on MQTT.

• Does the usage of gossip-based routing for the service coordination improve its execution resilience in unstable wireless environments? This research question is directly related to Sub-hypothesis 2 and is answered in Section 5.5.2, where I measure the performance of the gossip-based routing versus the simple Publish/Subscribe routing in a wireless network scenario, based on ns-3, with a high percentage of failing links.

• Does the adaptation of the gossip-based routing for the service coordination improve the completion time of the service? This research question is directly related to Sub-hypothesis 2 and is answered in Section 5.5.2, where I evaluate the performance offered by tuning the gossip routing according to the statements described in Section 5.4.2.

In document Contribution to integration and coordination mechanisms for mobile distributed services in publish/subscribe networks (Page 105-109)