
3.2 Argumentation Frameworks for Reinforcement Learning

3.2.4 From Extensions to Heuristics

Above, we have introduced SCAF, VSCAF and AF− and proved some properties of the preferred and grounded extensions of AF−. Recall that the purpose of computing these extensions is to obtain the heuristics, namely the recommended action for each agent. Here we discuss how to obtain heuristics from the extensions.

When the grounded extension is used, there is one and only one grounded extension for AF− (see Section 2.1.2). If the grounded extension is non-empty, each agent Agent_i simply needs to check whether any argument in the grounded extension belongs to Agent_i (for the definition of 'belong', see Section 3.2.2). If there is an argument A belonging to Agent_i, then, since Theorem 1 proves that each agent can have at most one recommended action, we can simply recommend the action supported by A, i.e. con(A) (see Section 3.2.2), to Agent_i. The same method also applies when preferred extensions are used and AF− has only one preferred extension.
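As a minimal sketch of this single-extension case, assume each argument is represented by a hypothetical `Argument` record that stores the index of the agent it belongs to (`owner`) and the action it supports (`action`, i.e. con(A)). The lookup then reduces to a scan of the extension; the extension contents below are invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Argument:
    owner: int    # index of the agent this argument belongs to (hypothetical field)
    action: str   # con(A): the action the argument supports (hypothetical field)


def recommend_from_unique_extension(extension: Iterable[Argument], i: int) -> Optional[str]:
    """Return con(A) for an argument A in the extension that belongs to Agent_i, else None.

    Theorem 1 guarantees that all of Agent_i's arguments in the extension
    support the same action, so the first match is sufficient.
    """
    for arg in extension:
        if arg.owner == i:
            return arg.action
    return None


# Hypothetical grounded extension with arguments for agents 1 and 2 only.
grounded = {Argument(1, "MarkKeeper(2)"), Argument(2, "MarkKeeper(3)")}
print(recommend_from_unique_extension(grounded, 1))  # MarkKeeper(2)
print(recommend_from_unique_extension(grounded, 3))  # None
```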

When preferred extensions are used and there is more than one preferred extension, we need to break ties between them and select a single preferred extension to give recommendations. As illustrated above in Examples 7 and 8, the recommended actions given by the same preferred extension are 'compatible' with each other, i.e. the same action is not recommended to different agents and the same agent does not receive different recommended actions; however, actions recommended by different preferred extensions can be 'conflicting'. Recall that when AF− has multiple preferred extensions, these preferred extensions are 'equally good' (Section 2.1.2). Based on this understanding, we randomly select a preferred extension and let all cooperative agents use this preferred extension to obtain their own recommended actions. To this end, we select an agent, called the captain agent, to perform the random selection; after it selects a preferred extension, it tells the other agents which arguments are contained in the selected preferred extension.
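The two 'compatibility' conditions above can be written as a check over (agent, action) pairs. The sketch below is illustrative only; `is_compatible` is a hypothetical helper, not part of the framework defined in this chapter.

```python
from typing import Dict, Iterable, Tuple

Recommendation = Tuple[int, str]  # (agent index, recommended action)


def is_compatible(recs: Iterable[Recommendation]) -> bool:
    """True iff no action goes to two different agents and
    no agent receives two different actions."""
    action_of_agent: Dict[int, str] = {}
    agent_of_action: Dict[str, int] = {}
    for agent, action in recs:
        # setdefault stores the first value seen and returns the stored one
        if action_of_agent.setdefault(agent, action) != action:
            return False  # same agent, different actions
        if agent_of_action.setdefault(action, agent) != agent:
            return False  # same action, different agents
    return True


print(is_compatible([(1, "MarkKeeper(2)"), (2, "MarkKeeper(3)")]))  # True
print(is_compatible([(1, "MarkKeeper(3)"), (2, "MarkKeeper(3)")]))  # False
```

Recommendations read from a single preferred extension should always pass this check; mixing recommendations taken from different extensions may fail it.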

Note that when there are multiple preferred extensions, all agents must use the same preferred extension; otherwise, more than one agent may be recommended the same action. For example, consider Example 8 in Section 3.2.3: if taker T2 uses preferred extension P1 to generate heuristics while T1 uses extension P2, both takers will perform MarkKeeper(3), which is undesirable.
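To make this failure concrete, the sketch below lets each taker read its action from an extension of its own choosing. The per-extension recommendations are hypothetical: only the two MarkKeeper(3) entries mirror the situation described for Example 8, and the remaining entries are invented for illustration.

```python
from typing import Dict

# Hypothetical recommendations per preferred extension (agent -> action).
P1: Dict[str, str] = {"T1": "MarkKeeper(2)", "T2": "MarkKeeper(3)"}
P2: Dict[str, str] = {"T1": "MarkKeeper(3)", "T2": "MarkKeeper(2)"}


def mixed_recommendations(choice: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Each agent reads its own action from the extension it picked."""
    return {agent: ext[agent] for agent, ext in choice.items()}


def has_duplicate_action(recs: Dict[str, str]) -> bool:
    """True if two agents were recommended the same action."""
    actions = list(recs.values())
    return len(actions) != len(set(actions))


same = mixed_recommendations({"T1": P1, "T2": P1})
mixed = mixed_recommendations({"T1": P2, "T2": P1})
print(same, has_duplicate_action(same))    # {'T1': 'MarkKeeper(2)', 'T2': 'MarkKeeper(3)'} False
print(mixed, has_duplicate_action(mixed))  # {'T1': 'MarkKeeper(3)', 'T2': 'MarkKeeper(3)'} True
```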

So in these cases, communication between agents is essential. However, some 'tricks' can be used to reduce the communication burden. For example, if the captain agent can afford to compute the recommended action for each agent, it can directly tell the other agents their recommended actions, without communicating the whole extension. If the captain agent cannot afford this, the agents can use a universal, predefined scheme to index all candidate arguments, so that the captain agent only needs to send the other agents the indices of the arguments in the selected preferred extension.
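A minimal sketch of the index-based trick, assuming all agents share a hypothetical, identically ordered list `CANDIDATE_ARGS` of candidate arguments agreed before learning starts. The argument names and both helper functions are placeholders, not part of the original method.

```python
from typing import FrozenSet, List, Sequence

# Hypothetical universal index: every agent holds the same ordered list
# of all candidate arguments.
CANDIDATE_ARGS: List[str] = ["A0", "A1", "A2", "A3", "A4"]


def encode_extension(extension: FrozenSet[str], candidates: Sequence[str]) -> List[int]:
    """Captain side: replace each argument by its position in the shared list."""
    position = {arg: k for k, arg in enumerate(candidates)}
    return sorted(position[arg] for arg in extension)


def decode_extension(indices: Sequence[int], candidates: Sequence[str]) -> FrozenSet[str]:
    """Receiver side: rebuild the selected extension from the broadcast indices."""
    return frozenset(candidates[k] for k in indices)


selected = frozenset({"A1", "A4"})
msg = encode_extension(selected, CANDIDATE_ARGS)
print(msg)  # [1, 4]
print(decode_extension(msg, CANDIDATE_ARGS) == selected)  # True
```

The message then consists of small integers only, and each receiving agent can recover the full selected extension and read off its own recommendation locally.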

The method described above is summarised as a function getRecActFromExt, whose pseudocode is given in Algorithm 6. This function has two arguments: a set E containing all extensions of the required type (preferred or grounded), and the agent index i. The purpose of this function is to obtain the recommended action for Agent_i. If the agent does not have any recommended action, the function returns null.

Now we walk through this function. If there is only one extension in E (lines 2 to 8), the function checks every argument in this extension to see whether any argument belongs to Agent_i. If one or more arguments belong to Agent_i, the function returns the action supported by these arguments (line 5; Theorem 1 guarantees that these arguments support the same action); otherwise, it returns null, indicating that there is no recommended action for this agent (line 8). If there are multiple extensions in E (lines 10 to 22) and Agent_i is not the captain agent, the function simply waits for the captain to send the recommended action for Agent_i, and then returns this action (line 11). Otherwise, the function builds a table to store all agents' recommended actions (line 13), randomly selects an extension S from E (line 14), and updates the table by recording, for each argument in S, the action it supports under its owner's index (line 16). Finally, the function informs all other agents of their recommended actions (lines 18 to 20), and returns Agent_i's recommended action (line 21).

Algorithm 6 Function for obtaining recommended action from extensions.

 1: function getRecActFromExt(ExtensionSet E, AgentIndex i)
 2:   if there is only one extension S in E then
 3:     for argument arg in S do
 4:       if arg belongs to Agent_i then
 5:         return con(arg)
 6:       end if
 7:     end for
 8:     return null
 9:   else
10:     if Agent_i is not the captain then
11:       receive the recommended action a from the captain agent, and return a
12:     else
13:       initialise table actTable, whose keys are all agents' indices and whose entries are all null
14:       randomly select an extension S from E
15:       for argument arg in S do
16:         find the owner Agent_j of arg, and set actTable(j) := con(arg)
17:       end for
18:       for all agent indices j ≠ i do
19:         inform Agent_j of its recommended action actTable(j)
20:       end for
21:       return actTable(i)
22:     end if
23:   end if
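Algorithm 6 can be sketched in Python as below. This is a sketch under stated assumptions, not the thesis's implementation: arguments are hypothetical `Argument` records with `owner` and `action` fields, the 'belongs to' relation is reduced to comparing `owner` with the agent index, and message passing is abstracted into caller-supplied `send` and `receive` callbacks (hypothetical names). Comments map each step to the pseudocode line numbers.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, Optional, Sequence


@dataclass(frozen=True)
class Argument:
    owner: int   # index j of the agent the argument belongs to (hypothetical field)
    action: str  # con(arg): the action the argument supports (hypothetical field)


Extension = FrozenSet[Argument]


def get_rec_act_from_ext(
    E: Sequence[Extension],
    i: int,
    captain: int,
    agents: Iterable[int],
    send: Callable[[int, Optional[str]], None],
    receive: Callable[[], Optional[str]],
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Return Agent_i's recommended action, or None if it has none."""
    if len(E) == 1:                      # lines 2-8: unique extension
        for arg in E[0]:
            if arg.owner == i:
                return arg.action        # unique by Theorem 1
        return None
    if i != captain:                     # lines 10-11: wait for the captain
        return receive()
    act_table: Dict[int, Optional[str]] = {j: None for j in agents}  # line 13
    S = (rng or random.Random()).choice(E)                           # line 14
    for arg in S:                        # lines 15-17: record con(arg) for its owner
        act_table[arg.owner] = arg.action
    for j, action in act_table.items():  # lines 18-20: inform every other agent
        if j != i:
            send(j, action)
    return act_table[i]                  # line 21


# Usage: two hypothetical preferred extensions; agent 1 is the captain.
P1 = frozenset({Argument(1, "MarkKeeper(2)"), Argument(2, "MarkKeeper(3)")})
P2 = frozenset({Argument(1, "MarkKeeper(3)"), Argument(2, "MarkKeeper(2)")})
outbox: Dict[int, Optional[str]] = {}


def post(j: int, action: Optional[str]) -> None:
    outbox[j] = action  # stands in for a real message to Agent_j


captain_act = get_rec_act_from_ext([P1, P2], 1, captain=1, agents=[1, 2, 3],
                                   send=post, receive=lambda: None,
                                   rng=random.Random(0))
agent2_act = get_rec_act_from_ext([P1, P2], 2, captain=1, agents=[1, 2, 3],
                                  send=post, receive=lambda: outbox[2])
print(captain_act, agent2_act, outbox[3])
```

Here the captain picks one of the two extensions and fills the outbox; agent 2 then reads its action from that outbox, so both recommendations come from the same extension and never coincide, while agent 3, which owns no argument, receives None (null).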

In document Argumentation accelerated reinforcement learning (Page 76-78)