Requirements - Reinforcement learning for robot navigation in constrained environments

This section reports the requirements for the project. Five main topics are covered: section 3.3.1 describes the Reinforcement Learning algorithm requirements, with attention to convergence and tuning issues; section 3.3.2 presents the setup requirements; section 3.3.3 deals with test and simulation requirements; section 3.3.4 covers the documentation requirements; and section 3.3.5 lists the non-functional requirements that have not been specified beforehand.


3.3.1 RL requirements

1. Employ value-function-based RL algorithms

Value-function-based RL algorithms will be adopted in this project, since they have proven efficient in robotic navigation and manipulation applications (9)-(17). Furthermore, their implementation is more straightforward and completely model-free, so no prior knowledge of the environment is required. The tested value-function-based RL algorithms have to guarantee convergence to a sub-optimal policy in at most 8 hours, so that they can be fully tested in one working day.

2. Handle large state-spaces efficiently

If the state-space is discretized, a proper discretization level should be found so that the manipulator is still able to reach a specific goal location accurately. If the discretization level is too low, the state-space representation of the environment would be too coarse and would not allow the end-effector to reach precise locations, while a discretization level that is too high would require a Q-table too large to allocate in memory. Thus, the discretization level should be chosen so that the end-effector can reach states in the neighborhood of the goal (at most 5 millimeters away).
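The trade-off between accuracy and Q-table size can be estimated before any training is run. The following is a minimal sketch, not the project's implementation: it assumes the state is the Cartesian position of the end-effector on a uniform grid, and the workspace size, number of actions and function names are all hypothetical.

```python
import math

def q_table_size(extent_mm, cell_mm, n_actions, bytes_per_value=8):
    """Return (number of states, Q-table size in bytes) for a uniform grid.

    extent_mm: workspace size along each axis, in millimeters (assumed values).
    cell_mm:   edge length of one grid cell, in millimeters.
    """
    # Ceiling division keeps the whole workspace covered by cells.
    cells_per_axis = [-(-extent // cell_mm) for extent in extent_mm]
    n_states = math.prod(cells_per_axis)
    return n_states, n_states * n_actions * bytes_per_value

def worst_case_error_mm(cell_mm, n_dims):
    """Largest distance between a cell center and any point inside that cell."""
    return cell_mm * math.sqrt(n_dims) / 2

# Hypothetical 500 mm cube workspace with 6 discrete actions.
for cell in (50, 10, 5):
    states, size = q_table_size((500, 500, 500), cell, n_actions=6)
    error = worst_case_error_mm(cell, 3)
    print(f"{cell} mm cells: {states} states, {size / 1e6:.2f} MB, "
          f"worst-case error {error:.2f} mm")
```

Under these assumptions, the number of states grows with the cube of the resolution, which is why a fine grid quickly becomes impossible to allocate once joint angles or obstacle positions are added to the state.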

On the contrary, if continuous state-spaces are adopted, the Q-table will be replaced by a neural network (see section 2.1.8), which should be appropriately tuned and trained to manage the whole state-space. In particular, the training period of the neural network should not take more than 0.5 seconds so that the manipulator motion remains smooth.

3. Find an optimal trade-off between exploration and exploitation

A trade-off between the exploration and exploitation phases should be found by appropriately tuning ε, to encourage the agent to learn the environment in a smart and goal-oriented way. In particular, the agent should not waste time exploring areas far away from the goal but, at the same time, it should be able to learn a suboptimal policy in less than the available 8 working hours.
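One common way to balance the two phases is an ε-greedy action selection with a decaying exploration probability. The sketch below assumes that strategy, which this section does not confirm; the function names and the decay schedule values are illustrative assumptions only.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    best = max(q_values)
    # Break ties between equally good actions at random.
    return rng.choice([a for a, q in enumerate(q_values) if q == best])

def decayed_epsilon(episode, eps_start=1.0, eps_min=0.05, decay=0.995):
    """Exponentially decaying exploration rate (assumed schedule)."""
    return max(eps_min, eps_start * decay ** episode)

# Early episodes explore almost always; later ones mostly exploit.
for episode in (0, 200, 1000):
    eps = decayed_epsilon(episode)
    action = epsilon_greedy([0.1, 0.9, 0.3], eps)
    print(f"episode {episode}: epsilon={eps:.3f}, action={action}")
```

A faster decay shortens exploration and helps meet the time budget, at the risk of settling on a poorer policy; the floor `eps_min` keeps a small amount of exploration throughout.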

4. Make the algorithm as environment-independent as possible

The algorithm should be able to operate without requiring too much information about the environment in which the robot is placed. If this requirement is satisfied, the algorithm becomes flexible and easily adaptable to new conditions and constraints. To test whether this requirement is satisfied, the acquired knowledge (i.e. Q) can be used in a transfer learning perspective, to learn similar environments by initializing the Q in the same way. In particular, the transfer learning approach has to be at least 50% faster than the standard learning algorithm.
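The transfer learning check can be expressed as a small acceptance test. This is a sketch under stated assumptions: the Q-table is stored as a dictionary of lists, convergence is measured in episodes, and "50% faster" is read as needing at most half of the baseline effort. The function names are hypothetical.

```python
import copy

def warm_start(q_source):
    """Initialize the Q-table of a new environment from a learned one."""
    return copy.deepcopy(q_source)  # the source table must stay untouched

def speedup(baseline_episodes, transfer_episodes):
    """Fractional reduction in episodes needed to converge."""
    return 1 - transfer_episodes / baseline_episodes

def meets_transfer_requirement(baseline_episodes, transfer_episodes,
                               threshold=0.5):
    """True if transfer learning saves at least `threshold` of the effort."""
    return speedup(baseline_episodes, transfer_episodes) >= threshold

# Hypothetical learned table: state -> list of action values.
q_learned = {"s0": [0.2, 0.8], "s1": [0.5, 0.1]}
q_new = warm_start(q_learned)
q_new["s0"][0] = 0.0  # updates in the new environment do not leak back
print(q_learned["s0"][0], meets_transfer_requirement(1000, 450))
```

The same comparison could be made on wall-clock time instead of episodes; which measure applies is not stated in the requirement.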

5. Improve convergence rate by appropriately tuning RL parameters

As mentioned beforehand, RL parameters affect the convergence of the algorithms. Thus, their tuning has to be justified such that convergence to a (sub)optimal policy is always guaranteed.

6. Smart collisions management

A proper interaction between the robot arm and the obstacles should be devised to speed up the learning phase, e.g. resetting the robot to its initial pose when an obstacle is hit. Furthermore, the environment could be over-constrained to assess the efficiency of the algorithm. This requirement is satisfied when the number of collisions is minimized without affecting the learning rate.
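The reset-on-collision example mentioned above could look as follows in a grid-like state-space. This is only a sketch: the obstacle representation (a set of occupied cells), the reward values and the function name are assumptions made for illustration.

```python
def handle_step(next_state, obstacles, goal, start_state,
                collision_reward=-1.0, goal_reward=1.0, step_reward=-0.01):
    """Return (state after the step, reward, episode finished).

    All reward values are assumed for illustration only.
    """
    if next_state in obstacles:
        # Collision: penalize, end the episode and reset to the initial pose.
        return start_state, collision_reward, True
    if next_state == goal:
        return next_state, goal_reward, True
    # Ordinary move: a small cost discourages wandering.
    return next_state, step_reward, False

# Hypothetical 2-D example with one occupied cell.
obstacles = {(1, 1)}
print(handle_step((1, 1), obstacles, goal=(2, 2), start_state=(0, 0)))
print(handle_step((0, 1), obstacles, goal=(2, 2), start_state=(0, 0)))
```

Counting how often the collision branch is taken per episode gives a direct measure of whether collisions decrease as learning progresses.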

CHAPTER 3. ANALYSIS 31

3.3.2 Setup requirements

1. The setup should be manufactured with RaM facilities.

The setup should be designed in such a way that RaM Group laser-cutting or 3D-printing prototyping technology can be used to manufacture it.

2. The necessary components should be low-cost and commercial.

The components required for both the manipulator and the camera should be cheap and easy to find, considering a maximum available budget of €1000.

3. The communication between different hardware components has to be as fast as possible

The communication between the different systems (camera, motors, control architecture with RL algorithm) should be fast and reliable, so that signals can be exchanged as close to real time as possible, avoiding the need for complex synchronization procedures. To satisfy this requirement, it is advisable to make use of already tested open-source libraries.

4. The camera has to localize the markers efficiently

As already stated, markers will be placed at the points of interest to realize visually guided localization and tracking. Thus, markers should be selected such that they are easy to detect and track in real time. The detection algorithm should not take more than 0.5 seconds to detect all the markers present in the scene.
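The 0.5 second limit can be verified by timing the detection call on each frame. The sketch below is independent of any particular vision library: `detect` stands for whatever detection function is eventually chosen, and both it and the helper name are hypothetical.

```python
import time

def detect_within_budget(detect, frame, budget_s=0.5):
    """Run `detect(frame)` and report whether it met the time budget.

    Returns (markers, elapsed seconds, budget respected).
    """
    start = time.perf_counter()
    markers = detect(frame)
    elapsed = time.perf_counter() - start
    return markers, elapsed, elapsed <= budget_s

# Placeholder detector standing in for the real marker detection routine.
def fake_detector(frame):
    return ["marker_0", "marker_1"]

markers, elapsed, ok = detect_within_budget(fake_detector, frame=None)
print(f"{len(markers)} markers in {elapsed * 1000:.3f} ms, within budget: {ok}")
```

Logging the elapsed time over many frames, rather than a single one, would show whether the limit also holds in the worst case.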

3.3.3 Tests requirements

1. Simulation is the first step for valuable tests.

Before performing tests on the real setup, it is important to evaluate the code in simulation to be confident of its outcome. The simulation has to be as representative as possible of the real conditions under which the robot will operate.

2. Employ virtual obstacles to avoid damage to the setup.

To avoid actual damage to the setup during possible collisions, virtual obstacles have to be employed to make the tests effective and safe at the same time.

3. Prioritize tests on better-performing algorithms.

Since RL algorithms require some time to converge, it is necessary to prioritize tests on the more efficient algorithms, so that the full workability of such algorithms can be explored.

3.3.4 Documentation requirements

1. The documentation has to be up-to-date.

The project documentation has to be updated frequently, so that all meaningful developments and results are always highlighted and reviewed in real time. In this way, no information loss should occur.

2. Code documentation has to be well-structured.

In order to make the code user-friendly, the scripts should be clearly commented and organized to improve readability and maintenance.

3.3.5 Non-functional requirements

1. The code should be user-friendly.

The code should be quickly readable and easily understandable from both a user's and a programmer's point of view. It can also be provided with an intuitive Graphical User Interface (GUI) to simplify the interaction with the code itself.

