In this chapter I considered three types of queries that an agent acting in an uncertain sequential decision-making problem can ask. For reward and transition queries, which query parameters of the agent’s uncertainty over MDP models directly, I showed that Bayes updates for query responses can be performed efficiently when the agent’s uncertainty is representable by independent Dirichlet distributions. Then I discussed several straightforward approximations that can be applied to reduce EVOI computations to repeated sequential optimal planning computations, which in turn can be performed by standard optimal planning methods, producing the Expected Myopic Gain (EMG) algorithm for selecting transition and reward queries. I then presented empirical results that demonstrated EMG’s near-optimality when its assumptions were met, and that it continued to be effective even when the assumptions were violated.

Then I introduced action queries, and developed a new action query selection method, called Expected Myopic Gain-based Action Query Selection (EMG-AQS), which implements EVOI-based action query selection by performing Bayes updates using Bayesian Inverse Reinforcement Learning, and takes advantage of the structure present when the agent has only reward uncertainty to simplify optimal planning computations. Even so, the optimal planning computations required by EMG-AQS are expensive, and I presented an empirical comparison between EMG-AQS and an existing action query selection method, Active Sampling (AS), which has the computational advantage that it does not need to perform expensive optimal planning computations to select a query. Although EMG-AQS is Bayes-optimal for a single query, I tested the effects of two conditions under which EMG-AQS is not Bayes-optimal: performance over a sequence of queries, and performance given a fixed amount of time allotted to select queries. Under the former condition, I found that in most cases EMG-AQS outperformed AS over a sequence of queries due to its superior discretion between querying to improve short-term versus long-term reward collection. Under the latter condition, I found that when little computation time was available, AS outperformed EMG-AQS, but once enough time was available, EMG-AQS outperformed AS. In addition, I devised the Hybrid algorithm, which performed better than either method alone in settings with limited computation time.

CHAPTER 4

Wishful Query Selection: Selecting from the k-Response Query Set

For typical settings in which the agent can query its user, the agent is restricted to asking a query from some specified set, such as the set of action-queries in the context of sequential decision-making considered in Chapter 3. Intuitively, the impact of asking queries in terms of improving the agent’s performance partially depends on the usefulness of the set of queries it has access to. A natural question to ask, then, is how to determine when the agent’s query set is expressive enough, i.e., how can the agent’s designer know whether or not it would be useful to consider adding more queries to the agent’s query set? One way to formally pose this question is as a query selection problem, where the agent can ask any k-response query (a query with k possible responses, with k fixed to some integer ≥ 2): can the agent consider only those queries contained in some subset of the set of all k-response queries without missing the most valuable k-response query? If so, that subset should be considered “expressive enough.”

To this end, in this chapter I study the question of what the agent should ask its user in order to maximize EVOI when the only restriction on what the agent can ask is that a query can have only k possible responses (where k ≥ 2). I study this question in an abstract setting in which a decision-making agent must query before making its “decision,” and I consider two cases: (1) the agent may ask only a single query (myopic query selection); and (2) the agent may ask n queries (nonmyopic query selection).

I show that, in myopic settings, where the goal is to select a query so as to maximize EVOI without considering any future queries that could be asked, the set of k-response decision queries, each of which asks for the best decision out of some subset, is sufficiently general in that there is no benefit in considering any k-response queries beyond k-response decision queries. This result dovetails with recent work by Viappiani and Boutilier (2010), who contribute efficient approximate algorithms for k-response decision query selection that exploit the submodularity of a lower bound for decision query EVOI.
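To make the quantity being maximized concrete, the following is a minimal sketch of EVOI for a decision query in a small discrete setting. It is an illustration under stated assumptions, not the formulation or the algorithms developed in this chapter or by Viappiani and Boutilier: the agent's uncertainty is assumed to be a prior over a finite set of hypotheses, `utility[h][d]` is a hypothetical table giving the value of decision `d` under hypothesis `h`, the user is assumed to answer noiselessly with the best decision in the queried subset (ties go to the earliest listed decision), and the best query is found by brute-force enumeration. All function and variable names are invented for this example.

```python
from itertools import combinations


def expected_utilities(weights, utility):
    """Expected utility of each decision under (possibly unnormalized) hypothesis weights."""
    n_decisions = len(utility[0])
    return [sum(w * row[d] for w, row in zip(weights, utility))
            for d in range(n_decisions)]


def decision_query_evoi(prior, utility, subset):
    """EVOI of asking "which decision in `subset` is best?".

    Assumption: the response is noiseless, so a hypothesis h produces the
    response that maximizes utility[h][d] over d in `subset`.
    """
    # Value of deciding immediately, without asking anything.
    prior_value = max(expected_utilities(prior, utility))

    # Sum over responses r of P(r) * max_d E[U(d) | r]. Using the joint
    # weights P(h, r) directly avoids normalizing by P(r) and multiplying back.
    posterior_value = 0.0
    for response in subset:
        joint = [p if max(subset, key=lambda d: row[d]) == response else 0.0
                 for p, row in zip(prior, utility)]
        if sum(joint) > 0.0:
            posterior_value += max(expected_utilities(joint, utility))
    return posterior_value - prior_value


def best_decision_query(prior, utility, k):
    """Brute-force search over all k-response decision queries (illustration only)."""
    n_decisions = len(utility[0])
    return max(combinations(range(n_decisions), k),
               key=lambda subset: decision_query_evoi(prior, utility, subset))


# Two equally likely hypotheses and three decisions; decision 2 is a safe
# compromise, while decisions 0 and 1 are each ideal under one hypothesis.
prior = [0.5, 0.5]
utility = [[1.0, 0.0, 0.6],
           [0.0, 1.0, 0.6]]
evoi_01 = decision_query_evoi(prior, utility, (0, 1))
evoi_2 = decision_query_evoi(prior, utility, (2,))
```

In this toy instance, asking the user to choose between decisions 0 and 1 reveals which hypothesis holds, so the agent can then pick the ideal decision instead of the compromise. A "query" over a single decision has only one possible response and therefore carries no information. Exhaustive enumeration is used here only for clarity and does not scale to large decision sets.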

In addition, I consider a nonmyopic setting where the goal is to select a depth-n k-response query tree instead of a single query. Here I show that the set of depth-n trees constructed from k-response decision queries is not sufficiently general, in that there are cases where more valuable depth-n k-response query trees exist outside the set of depth-n k-response decision query trees. However, I also show that the set of depth-n trees constructed from k-response decision-set queries is sufficiently general. Finally, I show the computational result that depth-n k-response query tree selection can be reduced to k^n-response decision query selection, where the algorithms contributed by Viappiani and Boutilier (2010) directly apply.

The results developed in this chapter provide guidance for how queries can be designed to be as informative as possible. In particular, they provide theoretical justification for using only decision queries in the myopic setting, and only decision-set queries in the nonmyopic setting, when these ideal types of queries can be asked. Moreover, as I will show in Chapter 5, the results of this chapter for the myopic setting provide a solution to the wishful query selection problem proposed in Chapter 1, and hence play a crucial role in implementing the Wishful Query Projection (WQP) approach for query selection developed in this dissertation.