Explanations for Decision-Theoretic Systems

3.2 Literature Review

3.2.2 Explanations for Decision-Theoretic Systems

While there has been considerable work on explanations for intelligent systems, such as expert and rule-based systems, there has been much less work on explanations for probabilistic and decision-theoretic systems. The main reason for this discrepancy is the difference in the processes through which they arrive at their conclusions. Probabilistic and decision-theoretic systems apply the well-known axioms of probability to perform inference and theorems from utility theory to compute a policy. Since these systems are based on a principled approach, experts do not need to examine the reasoning trace to determine whether the inference or policy computation process is correct. The trace would essentially refer to concepts such as Bayes’ theorem, the principle of maximum expected utility, or other axioms of probability and utility theory. These techniques are well understood, and if they are assumed to have been coded correctly, they will yield the correct result. This is in clear contrast with expert and rule-based systems, where experts need to examine which rules have been triggered by the current state and whether the sequence of their instantiation and execution is correct. This explains the large number of projects that generate explanations for expert and rule-based systems. For probabilistic and decision-theoretic systems, on the other hand, the requirement is to highlight the portions of the input that lead to a particular result rather than to explain the principles behind these techniques. In the past, decision-theoretic approaches did not scale to real-world problems, which explains the lack of literature on explanations for MDPs and POMDPs. Decision-theoretic systems are now being proposed for relatively large real-world problems [14, 67, 73], which motivates the need for explanations in them.
The move to larger real-world problems has also led to the proposal of approximate techniques for inference and policy generation to address scalability concerns. When the technique is approximate, it becomes even more important to provide an explanation that convinces the expert that the approximation has not resulted in an incorrect solution.

The explanation techniques of the previous section provide a good starting point for explaining MDPs, but do not serve the purpose completely. None of the systems discussed are stochastic, as MDPs are. In most cases, the policy is already known to the designer, unlike in MDPs, in which the policy is numerically computed. Also, none of them need to focus on a sequence of inter-related decisions, which is more complicated to explain than a single isolated decision. The complex numerical computations involved in finding optimal policies also make it difficult for users to gain insight into the decision-making process. Druzdzel [30] rejected the notion that explanations for probabilistic systems cannot be generated due to these issues. In his work, causality was identified as a key component that could provide users with insight into the reasoning process. Since decision-theoretic systems (such as MDPs) are normative and not descriptive, it is even more important to provide users with explanations.

A key issue in the explanation of stochastic systems is the presentation of probability in different forms, such as numerical, verbal or graphical. A difficulty with verbal explanations is that different people interpret terms such as “possibly”, “probably”, “likely” or “usually” differently [29].

Explanations in Bayesian Networks

Lacave et al. [56] provide a survey of different techniques in this area, and define three different types of explanations that are generated for Bayesian networks. The first type is the explanation of evidence, in which the system explains what values of observed variables led to a certain conclusion. This technique is known as abduction. The second type is the explanation of the model, in which the static knowledge encoded in the Bayesian network is presented or displayed to the user. The third type is the explanation of reasoning, in which the user is told how the inference process unfolds. This can be done by showing the reasoning process that led to certain results (possibly including intermediate results), by showing the reasoning process that led to the rejection of a hypothesis (possibly including intermediate results), or by informing the user about hypothetical situations in which the result would have changed if certain variables in the current state were different.

Chajewska and Helpburn [18] presented an approach for explanations in probabilistic systems by representing causality using Bayesian Networks and exploiting different links. The approach presented in this chapter is similar in that it also uses templates to generate explanations and analyzes the effects of the optimal action. However, the analysis is not restricted to a single relevant variable, and the long-term effects of the optimal action (beyond one time step) are also considered.

Explanations in Influence Diagrams

Lacave et al. [57, 58] present different approaches to explain graphical models, including Bayesian networks and influence diagrams. Their explanations are geared to users with a background in decision analysis, and they present utilities of different actions graphically and numerically. This work can be used to assist experts, and the authors mention that they have used it to construct and debug models to help medical doctors in diagnosis and decision-making. Similar techniques may be used to debug and validate the reward function for an MDP, but not the transition function. Furthermore, there is still a need to consider the effect of sequential decision-making.

Explanations in MDPs

There has been very little research on the explanation of policies generated using MDPs, with the exception of two other streams of work: that of Elizalde et al. [33, 32, 34], which was formulated at the same time as the work in this chapter, and that of Dodson et al. [25, 26], which was published after the work in this chapter.

In the work of Elizalde et al. [33, 32, 34], an explanation comprises three components: an optimal action, a relevant variable, and an account of why the optimal action is the best in terms of the relevant variable. They identify the relevant variable by determining which variable affects the utility function the most. Two heuristics are provided to determine a relevant variable. In the first heuristic, they keep the rest of the state fixed and change only the value of the variable under consideration. They then measure the difference between the maximum and minimum utilities of the states that agree on all variables except the one being considered. This process is repeated for all variables, and the variable with the largest difference between the maximum and minimum utility is considered relevant. In the second heuristic, they examine the optimal action for different states, such that only the value of the variable under consideration is changed and the other values are kept fixed. They consider a variable more relevant if the optimal action changes more frequently when the value of that variable is changed while the values of the other variables are kept fixed. Such explanations belong to the category of reconstructive explanations since they do not mirror the reasoning process. They provide users with intuition regarding the optimal action by discussing the relevant variable and its possible impact, but may not be useful for debugging purposes. It is also conceivable that multiple relevant variables may need to be combined to construct a more meaningful and intuitive explanation. It would be interesting to explore the long-term effects of the optimal action, rather than focusing only on the change in utility while identifying a relevant variable.
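The two heuristics described above can be illustrated with a minimal sketch. It is one possible reading of the description in this section, not the implementation of Elizalde et al.: it assumes a small factored state space represented as tuples, a precomputed utility table and policy table, and scores computed relative to a single current state. All names (utility_range, policy_changes, most_relevant) and the two-variable example with its values are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

State = Tuple[Hashable, ...]


def neighbours(state: State, var_index: int, domain: Sequence[Hashable]) -> List[State]:
    """States that agree with `state` on every variable except `var_index`."""
    return [state[:var_index] + (v,) + state[var_index + 1:] for v in domain]


def utility_range(state: State, var_index: int,
                  domains: Sequence[Sequence[Hashable]],
                  utility: Dict[State, float]) -> float:
    """First heuristic: max minus min utility when only one variable varies."""
    values = [utility[s] for s in neighbours(state, var_index, domains[var_index])]
    return max(values) - min(values)


def policy_changes(state: State, var_index: int,
                   domains: Sequence[Sequence[Hashable]],
                   policy: Dict[State, str]) -> int:
    """Second heuristic: number of values of one variable for which the
    optimal action differs from the action in the current state."""
    current_action = policy[state]
    return sum(policy[s] != current_action
               for s in neighbours(state, var_index, domains[var_index]))


def most_relevant(state: State, domains: Sequence[Sequence[Hashable]],
                  score: Callable[[State, int], float]) -> int:
    """Index of the variable with the highest score (first one wins ties)."""
    return max(range(len(domains)), key=lambda i: score(state, i))


# Hypothetical example: variable 0 is a battery level, variable 1 a distance.
domains = [("low", "high"), ("near", "far")]
utility = {("low", "near"): 2.0, ("low", "far"): 1.0,
           ("high", "near"): 10.0, ("high", "far"): 9.0}
policy = {("low", "near"): "recharge", ("low", "far"): "recharge",
          ("high", "near"): "move", ("high", "far"): "move"}
state = ("low", "near")

by_utility = most_relevant(state, domains,
                           lambda s, i: utility_range(s, i, domains, utility))
by_policy = most_relevant(state, domains,
                          lambda s, i: policy_changes(s, i, domains, policy))
print(by_utility, by_policy)
```

In this toy example both heuristics select variable 0, since changing it alone shifts the utility the most and also changes the optimal action. The two heuristics need not agree in general, and neither accounts for effects beyond the utility of the immediately neighbouring states.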

Subsequent to the publication of the work in this chapter [52], Dodson et al. [25, 26] also described an approach based on natural language argumentation for explanations of MDP policies. Coincidentally, they also use the domain of academic course advising [41], similar to the evaluation domain used in this chapter. The technique described by Dodson et al. [25, 26] is focused more on helping end-users, who may not be very knowledgeable about the concept of an MDP or comfortable with the use of probabilities, to understand the explanations. Thus, the goal is not necessarily to provide the minimum information required to prove the optimality of the policy, but to convince an average user to trust the policy. In line with this goal, they also focus on presenting the explanations through a natural language interface. They also conduct a user study to compare their explanations with those presented in this thesis and demonstrate that users are more accepting of explanations provided in their natural language interface. The focus of the work in this chapter, by contrast, is to help experts understand the model and determine whether any corrections to the transition function are necessary.

McGuinness et al. [64] identify several templates to present explanations in task processing systems based on predefined workflows. The approach in this chapter also uses templates, but predefined workflows cannot be used due to the probabilistic nature of MDPs.

In document Policy Explanation and Model Refinement in Decision-Theoretic Planning (Page 41-44)