3.3 Communication between the controller and the testbed

For the controller to communicate with the network, it must not be the controller that sends each node its commands, as in a push system. Instead, the roles are reversed and a pull system is used, in which the nodes ask the controller for commands to execute. Thanks to this pull procedure and the scalable Flask RESTful features, the controller can serve the requests with few resources, as can be seen in the section The WiBed controller.

3.3.1 The management network

The management network is an 802.11 ad-hoc Wi-Fi network used to access the testbed remotely, to send commands to the nodes and to monitor them.

In addition, this network is provided by one of the node’s radios using the Batman-adv routing protocol (see section 1.4.7). The resulting mesh network shares a single collision domain and is automatically configured using IPv6 addresses. Thanks to that auto-configuration, the testbed can be managed and administered easily, even while an experiment is running.

Although this management network does not require any wired link, the network will have at least one wired node acting as the gateway of the testbed. Gateways are the nodes in charge of providing connectivity to the Internet and, mainly, of connecting the testbed with the controller.

One of the hardest problems to solve is the disconnection or lack of connectivity in the testbed due to a bad configuration or the nature of the experiment. Hence, safety measures are needed to avoid the isolation or misconfiguration of nodes. There are three main scenarios where the recovery system should act:

1. The node is performing an experiment and the configuration of the management network has been changed.

In this case, the node will not be able to connect to the Internet or to the controller. The current solution is that, after a predefined time interval, the node unmounts the overlay, finishes the experiment and goes back to the default state.

2. The node is performing an experiment and the controller is not responding to the pull requests.

In this case, the node will not be able to receive any of the researcher’s requests, even though it has an Internet connection. The current solution is to wait until N_pull pull requests have not been correctly processed and then unmount the overlay, finish the experiment and go back to the default state.

3. The node is working in its default state and the controller is not responding to its pull requests, or the node has no connection to the Internet.

In this case, the node is not performing an experiment but, for some reason, its configuration has been modified or it has lost Internet connectivity. The current solution is to wait for a predefined interval of time and then go back to the default state. Consequently, if the node is isolated for a long time, it will go back to the default state repeatedly.

It is important to keep in mind that some experimental scenarios may have special disconnection needs. In those scenarios the recovery system must be reconfigured to allow longer time intervals or more failed pull requests or, in some cases, be stopped completely.
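The three recovery scenarios can be summarised as a small decision function. The following is only a minimal sketch and not the WiBed implementation: the names `RecoveryConfig`, `Action` and `recovery_action`, the default thresholds, and the use of "seconds since the last successful contact with the controller" as the measure of disconnection are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"
    # Unmount the overlay, finish the experiment, return to the default state.
    ABORT_EXPERIMENT = "abort_experiment"
    # Re-apply the default state on a node that is not running an experiment.
    RESET_TO_DEFAULT = "reset_to_default"


@dataclass
class RecoveryConfig:
    # All values are hypothetical; the source does not give the real ones.
    max_offline_seconds: float = 600.0  # the "predefined time interval"
    max_failed_pulls: int = 5           # the N_pull threshold
    enabled: bool = True                # experiments may stop the recovery system


def recovery_action(in_experiment: bool,
                    seconds_since_contact: float,
                    failed_pulls: int,
                    cfg: RecoveryConfig) -> Action:
    """Decide what the node should do, following scenarios 1 to 3."""
    if not cfg.enabled:
        return Action.NONE
    if in_experiment:
        # Scenario 1: connectivity lost for longer than the allowed interval.
        if seconds_since_contact >= cfg.max_offline_seconds:
            return Action.ABORT_EXPERIMENT
        # Scenario 2: the controller failed to process N_pull pull requests.
        if failed_pulls >= cfg.max_failed_pulls:
            return Action.ABORT_EXPERIMENT
        return Action.NONE
    # Scenario 3: default state, but no contact for the predefined interval.
    if seconds_since_contact >= cfg.max_offline_seconds:
        return Action.RESET_TO_DEFAULT
    return Action.NONE
```

An experiment with special disconnection needs would, under these assumptions, pass a `RecoveryConfig` with larger thresholds or with `enabled=False`.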

3.3.2 Controller acknowledgement system

One of the main problems the WiBed team has faced is that, in a pull-based system, the controller may receive a repeated request because of connectivity issues or latency between the node and the server. With that in mind, the proposed solution was an acknowledgement system between node and controller that covers commands, experiments and their results. The experiment part just adds information about the experiment being performed by the node. Focusing on the command part, which is the important one, the controller keeps an auto-incremented list of executed commands and of the nodes involved, in order to preserve the order of execution and to avoid repeating commands that have already finished.

The information currently kept in each node is:

• exp_id: This variable shows the last or current experiment in which the node was involved.

• commandsAck: This variable shows the last command successfully received from the controller and executed in the node.

• resultsAck: This variable shows the last successful result sent to the controller by the node.

The information kept for each node in the controller is:

• commandId: This variable shows the ID of the last command sent to the node.

• executionId: This variable shows the ID of the last command executed in the node whose results have been correctly received.

To summarise, the acknowledgement system relies on keeping two pairs of variables synchronised: commandsAck with commandId, and resultsAck with executionId.
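The two sets of variables can be pictured as two small records, one on each side. This is a sketch under assumptions: the field names are taken from the text, while the class names `NodeState` and `ControllerRecord`, the integer types, the zero defaults and the helper `is_synchronised` are hypothetical and only illustrate the synchronisation condition.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    """State kept in each node (field names from the text, types assumed)."""
    exp_id: int = 0        # last or current experiment of the node
    commandsAck: int = 0   # last command received and executed
    resultsAck: int = 0    # last result successfully sent to the controller


@dataclass
class ControllerRecord:
    """State the controller keeps for one node (types assumed)."""
    commandId: int = 0     # ID of the last command sent to the node
    executionId: int = 0   # ID of the last command whose results arrived


def is_synchronised(node: NodeState, record: ControllerRecord) -> bool:
    """True when both acknowledgement pairs agree, so nothing is pending."""
    return (node.commandsAck == record.commandId
            and node.resultsAck == record.executionId)
```

When `is_synchronised` returns False, either a command has been sent but not yet executed, or a result has been produced but not yet acknowledged.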

The procedure of executing commands follows this sequence:

1. A researcher enters a command in the controller.

2. The controller checks the ID of the last command (commandId) and assigns the new command the ID commandId+1.

3. The node issues a pull request: it sends its status, the commandsAck of the last successfully executed command and a JSON-formatted list with the results pending to be sent to the controller.

4. The controller receives all the information regarding the node’s state and responds with the new command to be executed and the executionId of the last successfully received result.

5. The node receives the new command ID and, if it is greater than its current commandsAck, it executes the command, stores the results and sets commandsAck to the new command’s ID.

6. In the next pull request, once the command has been executed and the results have been saved, the node sends its status, the updated commandsAck and the command’s results (its ID and the outputs).

7. The controller receives the request, synchronises its variables with the commandsAck and the results, and sends back to the node the updated executionId, which is the ID of the new command that has now been successfully completed.
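The sequence above can be simulated in memory to see why a repeated reply does not cause a command to run twice. This is a simplified sketch, not the WiBed code: it models a single node, replaces the HTTP pull request with a direct method call, and the class and method names (`Controller`, `Node`, `add_command`, `handle_pull`, `apply`) as well as the shape of the reply are assumptions. Here `command_ack` plays the role of commandsAck and `execution_id` that of executionId.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Controller:
    commands: Dict[int, str] = field(default_factory=dict)  # ID -> command
    last_command_id: int = 0                                # commandId
    execution_id: int = 0                                   # executionId
    results: Dict[int, str] = field(default_factory=dict)   # ID -> output

    def add_command(self, text: str) -> int:
        # Steps 1-2: the new command gets the last ID plus one.
        self.last_command_id += 1
        self.commands[self.last_command_id] = text
        return self.last_command_id

    def handle_pull(self, command_ack: int, results: Dict[int, str]) -> dict:
        # Steps 4 and 7: store results not seen before, ignore repeated ones.
        for cid in sorted(results):
            if cid > self.execution_id:
                self.results[cid] = results[cid]
                self.execution_id = cid
        # Offer the command that follows the node's last acknowledged one.
        next_id = command_ack + 1
        command: Optional[Tuple[int, str]] = None
        if next_id in self.commands:
            command = (next_id, self.commands[next_id])
        return {"command": command, "executionId": self.execution_id}


@dataclass
class Node:
    run: Callable[[str], str]            # executes a command, returns output
    command_ack: int = 0                 # commandsAck
    pending_results: Dict[int, str] = field(default_factory=dict)

    def pull(self, controller: Controller) -> dict:
        # Steps 3 and 6: send the acknowledgement and the pending results.
        reply = controller.handle_pull(self.command_ack,
                                       dict(self.pending_results))
        self.apply(reply)
        return reply

    def apply(self, reply: dict) -> None:
        # Drop results the controller has acknowledged via executionId.
        for cid in list(self.pending_results):
            if cid <= reply["executionId"]:
                del self.pending_results[cid]
        # Step 5: execute only commands newer than the last acknowledged one.
        if reply["command"] is not None:
            cid, text = reply["command"]
            if cid > self.command_ack:
                self.pending_results[cid] = self.run(text)
                self.command_ack = cid
```

In this sketch, calling `apply` twice with the same reply executes the command only once, because the second time its ID is no longer greater than `command_ack`. Likewise, results resent by the node are ignored by the controller once `execution_id` has reached their ID.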
