Algorithms for Learning - A Model for Learning

3.3 Surveillance and Learning, a Generic Model

3.3.2 A Model for Learning

3.3.2.3 Algorithms for Learning

While the surveillance system is running, the operator can send feedback to the multi-agent system. A feedback signals either a missing alert or an erroneous one. When an entity-agent receives a feedback, it performs a tuning according to Algorithm 3.1.

Algorithm 3.1: Entity-agent tuning algorithm.

foreach event-agent used by the considered entity-agent do
    Call eventTuning()
end

A feedback is designed to concern one entity at a given time. The entity-agent can therefore perceive the feedback and relate it to the current situation. As the situation is a combination of several events, the entity-agent then broadcasts the feedback to the concerned event-agents. These agents then perform a tuning following Algorithm 3.2.

After the tuning of the event-agents, the entity-agent checks whether the current Entity Behaviour Value (EBV), computed after the tuning, differs from the previous one, computed before the tuning. If there is no difference, the agent assumes that no tuning has been performed and asks for the tuning of the least constrained event-agent (see Algorithm 3.4).
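The check described above can be sketched in Python. This is only an illustration under stated assumptions: the callables `event_tunings`, `compute_ebv` and `force_tuning`, as well as the `tolerance` used to compare two EBVs, are hypothetical names introduced here and do not come from the original system.

```python
from typing import Callable, Sequence


def entity_tuning(
    event_tunings: Sequence[Callable[[], None]],
    compute_ebv: Callable[[], float],
    force_tuning: Callable[[], None],
    tolerance: float = 1e-9,  # assumption: EBVs closer than this count as equal
) -> bool:
    """Run one feedback cycle; return True if a forced tuning was needed."""
    ebv_before = compute_ebv()

    # Algorithm 3.1: every event-agent used by the entity-agent is asked to tune.
    for tune in event_tunings:
        tune()

    ebv_after = compute_ebv()

    # An unchanged EBV is read as "no agent tuned itself", so the entity-agent
    # falls back to the forced tuning of Algorithm 3.4.
    if abs(ebv_after - ebv_before) <= tolerance:
        force_tuning()
        return True
    return False


if __name__ == "__main__":
    weights = {"event_a": 1.0, "event_b": 2.0}  # hypothetical event weights

    def ebv() -> float:
        return sum(weights.values())

    def refuse() -> None:
        pass  # an event-agent whose parameter-agents all decline to tune

    def forced() -> None:
        weights["event_a"] += 0.1

    print(entity_tuning([refuse, refuse], ebv, forced))
```

Comparing the EBV before and after the tuning keeps the entity-agent independent of the internal decisions of its event-agents: it only observes whether the outcome changed.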

This is one aspect of the AMAS Theory. Each agent aims to reach a satisfaction that helps the system without degrading its own satisfaction. The agents are therefore able to cooperate and identify the least constrained among themselves. Here, cooperation allows the whole system to improve without degrading the agents and their beliefs too much.

Forcing the tuning of at least one event-agent lets the system react every time a feedback is received. This serves two purposes: first, it prevents the agents from doing nothing despite the feedback; second, if the agents never tuned their values, the operator might find it useless to send feedback to a system that does not take it into account.

Algorithm 3.2: Event-agent tuning algorithm.

foreach parameter-agent of the considered event-agent do
    Call parameterTuning()
end

Like the entity-agent, when an event-agent receives a feedback, it broadcasts it to its parameter-agents, which use Algorithm 3.3. When receiving a feedback, the parameter-agent computes its own tuningEvolution (see 3.3.2.2). If the new value of tuningEvolution is compliant with the feedback, the parameter-agent tunes its value.

The compliance of tuningEvolution with the feedback is a simple comparison of the signs of the two variables. If the signs are the same, the agent agrees to tune its value in the direction asked by the feedback. If they differ, the agent would prefer to tune its parameter in the direction opposite to the feedback; in that case, it does not perform the tuning, in order not to disturb the system. This mechanism follows another aspect of the AMAS Theory. The parameter-agents use the tuningEvolution variable to check whether they are in conflict with other agents, namely themselves in past situations. When a conflict is detected, they decide not to perform the tuning in order to keep the satisfaction they have. More than with other agents, they are cooperative with themselves.

Algorithm 3.3: Parameter-agent tuning algorithm.

tuningEvolution ← computeTuningEvolution()
if compliance(tuningEvolution, operatorFeedback) then
    Call pTuning()
end
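As an illustration of the sign check behind Algorithm 3.3, the following Python sketch makes several assumptions that are not stated in the text: the feedback is encoded as a signed number (positive to increase, negative to decrease), a zero on either side is treated as non-compliant, and a fixed hypothetical `step` stands in for the actual pTuning() update.

```python
def compliance(tuning_evolution: float, operator_feedback: float) -> bool:
    """Return True when both values are non-zero and share the same sign."""
    # Assumption: a zero on either side counts as "not compliant".
    return tuning_evolution * operator_feedback > 0


def parameter_tuning(
    value: float,
    tuning_evolution: float,
    operator_feedback: float,
    step: float = 0.1,  # hypothetical fixed step replacing pTuning()
) -> float:
    """Return the parameter value after one feedback, tuned only if compliant."""
    if compliance(tuning_evolution, operator_feedback):
        # The agent agrees with the feedback and moves in the requested direction.
        return value + step if operator_feedback > 0 else value - step
    # Conflict with the agent's own past tunings: the value is left untouched.
    return value


if __name__ == "__main__":
    print(parameter_tuning(0.5, tuning_evolution=0.3, operator_feedback=1.0))
    print(parameter_tuning(0.5, tuning_evolution=-0.3, operator_feedback=1.0))
```

In the second call the signs differ, so the parameter keeps its value; this is the case that can later lead to a forced tuning.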

When none of the parameter-agents involved in a situation performs a tuning, the entity-agent asks the concerned event-agents for a forced tuning. This tuning follows Algorithm 3.4. When none of the agents decides to tune itself, it is because each has decided that the required tuning is not compliant with its knowledge, meaning its usual actions. It would therefore be inappropriate to force the tuning of all the agents: all of them would have to go against their beliefs, which would render those beliefs, and the decision process built on them, useless.

Algorithm 3.4: Entity-agent force tuning algorithm.

foreach event-agent of the considered entity-agent do
    Call computeEventCriticalLevel()
end

Instead, each event-agent computes its critical level. The critical level of an agent is a numerical value representing the constraints on that agent: the higher the value, the more constrained the agent. Function 3.9 is used to compute this value.

$$a_{CL} = |S_a|, \quad \text{with } S_a \text{ the set of situations } S \text{ such that } a \in S \tag{3.9}$$

This function is a simple count of all the situations in which the event a is involved. An event-agent used by several entity-agents to compute their EBV (i.e. one that is used in several situations) is more constrained than an agent used by only one entity-agent. Once the least constrained agents are identified, they are designated to perform a tuning regardless of their own decision. To achieve this, they force the tuning of at least one of their parameters according to Algorithm 3.5.
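The count in function 3.9 and the selection of the least constrained event-agents can be sketched as follows. This is a minimal illustration: representing the situations as a mapping from an entity to the set of events it uses is an assumption, and the entity and event names are purely hypothetical.

```python
from typing import Dict, List, Set


def event_critical_level(event: str, situations: Dict[str, Set[str]]) -> int:
    """Function 3.9: number of situations in which the event is involved."""
    return sum(1 for events in situations.values() if event in events)


def least_constrained_events(
    candidates: List[str], situations: Dict[str, Set[str]]
) -> List[str]:
    """Return the candidates sharing the lowest critical level."""
    if not candidates:
        return []
    levels = {e: event_critical_level(e, situations) for e in candidates}
    lowest = min(levels.values())
    # Ties are kept: several agents can be designated for the forced tuning.
    return [e for e in candidates if levels[e] == lowest]


if __name__ == "__main__":
    # Hypothetical situations: entity name -> events used to compute its EBV.
    situations = {
        "ship_1": {"speeding", "loitering"},
        "ship_2": {"speeding"},
        "ship_3": {"speeding", "ais_off"},
    }
    print(event_critical_level("speeding", situations))
    print(least_constrained_events(["speeding", "loitering"], situations))
```

Here "speeding" is involved in three situations and "loitering" in one, so only "loitering" would be designated for the forced tuning of the first entity.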

This algorithm is similar to the one used to force the tuning of an event: the parameters of the concerned agent compute their critical level, and the least constrained one is designated to tune its value regardless of its tuningEvolution value. The critical level of a parameter has the same nature as the critical level of an event: a numerical value expressing the constraint on the agent. However, its computation is different, as illustrated by function 3.10.

$$p_{CL} = \mathit{gapTo0}^2 - \mathit{gapToBound}^2, \quad \mathit{gapTo0}, \mathit{gapToBound} \in [0, 1] \tag{3.10}$$

In this function, gapTo0 represents the distance between tuningEvolution and 0. The smaller this value, the closer the agent is to changing its tuning direction, either because it has received a lot of feedback asking it to do so, or because it has not yet performed enough tuning in a given direction. The variable gapToBound indicates the distance between the current value of the parameter and the boundary of the search space designated by the feedback: the lower boundary if the feedback asks to decrease the value, and the upper boundary otherwise. The closer the value of the parameter is to the boundary, the more the agent is constrained in its tuning, because the residual search space shrinks. These two values are normalized between 0 and 1.

The result of this function is a numerical value between −1 and 1 expressing the constraint on the concerned parameter-agent. The smaller this value, the less constrained the agent. Once the least constrained parameter-agents are identified, the tuning of the parameter value is performed by the related Adaptive Value Tracker, according to the feedback.
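Function 3.10 is simple enough to be written directly. The sketch below assumes that both gaps have already been normalized to [0, 1], as stated above; the function name, the range check and the sample parameter names are additions made for this illustration only.

```python
def parameter_critical_level(gap_to_0: float, gap_to_bound: float) -> float:
    """Function 3.10: gapTo0 squared minus gapToBound squared, in [-1, 1]."""
    for name, gap in (("gap_to_0", gap_to_0), ("gap_to_bound", gap_to_bound)):
        # Assumption: callers normalize the gaps beforehand; reject anything else.
        if not 0.0 <= gap <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {gap}")
    return gap_to_0 ** 2 - gap_to_bound ** 2


if __name__ == "__main__":
    # Hypothetical parameters described as (gapTo0, gapToBound) pairs.
    parameters = {"p1": (0.0, 1.0), "p2": (0.5, 0.5), "p3": (1.0, 0.0)}
    levels = {
        name: parameter_critical_level(g0, gb)
        for name, (g0, gb) in parameters.items()
    }
    # The smallest critical level designates the least constrained parameter.
    print(min(levels, key=levels.get), levels)
```

With these sample values, the parameter whose tuningEvolution is at 0 and whose value is far from the boundary obtains the lowest level, so it would be the one designated for the forced tuning.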

The algorithms presented in this section are based on a reinforcement learning approach and tune the weight, that is the importance, of the events happening to the entities in a monitored area. The operators involved in the surveillance process can send a feedback about a situation concerning an entity. Following this feedback, the concerned agents modify their values in order to comply with it. This is a learning process that aims at reaching accurate values, meaning values that no longer provoke feedback from the operators.

In document Self-adaptive multi-agent systems for aided decision-making : an application to maritime surveillance (Page 81-84)