7.2 Future Work

7.2.3 EVOI-Sufficiency

In Chapter 4 I studied how the agent can narrow the space of queries it should consider from the general k-response query set to a subset, when the goal is to select a query on the basis of EVOI only. First I considered a myopic setting where the agent can ask only a single k-response query, and proved that the agent can restrict attention to k-response decision queries at no EVOI-loss, i.e., the set of k-response decision queries is EVOI-sufficient. Then I considered a nonmyopic setting where the agent can dynamically select a sequence of n k-response queries to ask, or equivalently a depth-n k-response query tree, and proved that the agent cannot consider only those query trees composed entirely of k-response decision queries without risking EVOI-loss. I also proved that the agent can consider only those query trees composed entirely of k-response decision-set queries without risking EVOI-loss. However, a variety of open questions remain on the topic of EVOI-sufficiency, and I discuss some of them below.

7.2.3.1 EVOI-Necessity

A natural question to ask in the myopic setting is whether the agent needs to consider all k-response decision queries to assuredly find the k-response query with maximum EVOI, provided no additional assumptions are made about the structure of the decision value function or the agent’s uncertainty ψ. If the answer to this question turned out to be no, then the agent could consider a different k-response query set than decision queries, which might be easier to search and/or easier to convey to humans. However, I conjecture that the answer to this question is yes, and that it can be proven by showing that for any arbitrary k-response decision query d, there exists a decision set, model space, and form of uncertainty where d has strictly higher EVOI than any other k-response query.
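To make the myopic search concrete, the following is a minimal sketch rather than the formulation of Chapter 4. It assumes a finite model space, the standard definition of EVOI as the expected gain in the value of the best decision after observing the response, and a noise-free decision query whose response names whichever of its k listed decisions has the highest value under the true model (ties go to the first listed). The function names and the toy value table V are hypothetical.

```python
from itertools import combinations

def expected_best_value(weights, V):
    """Return max over decisions u of sum_w weights[w] * V[u][w]."""
    return max(sum(wt * v for wt, v in zip(weights, row)) for row in V)

def evoi(psi, V, response_probs):
    """EVOI of a query, where response_probs[w][j] = P(response j | model w)."""
    k = len(response_probs[0])
    posterior_total = 0.0
    for j in range(k):
        # Unnormalized posterior psi(w) * P(j | w); summing the per-response
        # maxima gives the expected value of the best post-response decision.
        joint = [p * response_probs[w][j] for w, p in enumerate(psi)]
        posterior_total += expected_best_value(joint, V)
    return posterior_total - expected_best_value(psi, V)

def decision_query(V, decisions):
    """Assumed response model: response j means decisions[j] is best under w."""
    probs = []
    for w in range(len(V[0])):
        best = max(range(len(decisions)), key=lambda j: V[decisions[j]][w])
        probs.append([1.0 if j == best else 0.0 for j in range(len(decisions))])
    return probs

def best_decision_query(psi, V, k):
    """Exhaustively search all k-response decision queries for maximum EVOI."""
    return max(combinations(range(len(V)), k),
               key=lambda d: evoi(psi, V, decision_query(V, d)))

# Toy problem: V[u][w] is the value of decision u under model w.
V = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
psi = [0.5, 0.3, 0.2]
best = best_decision_query(psi, V, k=2)
print(best, evoi(psi, V, decision_query(V, best)))
```

The exhaustive loop in `best_decision_query` is exactly the search whose necessity the conjecture above concerns: if every decision query can be uniquely optimal for some problem, none can be pruned in advance without further assumptions.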

Another question is whether depth-n k-response decision-set query trees are necessary in the same sense just described, but for the nonmyopic setting. I conjecture that the answer to this question is yes as well, and that a similar proof technique to the one described above could be used to prove the claim.

7.2.3.2 EVOI-Sufficiency in Structured Settings

One or more of the agent’s decision set, model space, or form of uncertainty may have potentially exploitable structure. As an example, the model space Ω could be over a space of weight vectors, and the decision value function for a given ω ∈ Ω and decision u could take the form of the weighted combination (prescribed by ω) of features of u. In such a case, are there other EVOI-sufficient k-response query sets besides the ones proven to be EVOI-sufficient in their respective settings in Chapter 4? What about for other types of structure?
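As an illustration of why such structure might be exploitable, the linear example above can be written out as follows. The notation V(u; ω) for the decision value function and φ(u) for the feature vector of decision u is assumed here for the sketch and is not taken from Chapter 4.

```latex
V(u;\omega) = \omega^{\top}\phi(u)
\quad\Longrightarrow\quad
\mathbb{E}_{\omega\sim\psi}\!\left[V(u;\omega)\right]
  = \bar{\omega}_{\psi}^{\top}\phi(u),
\qquad
\bar{\omega}_{\psi} = \mathbb{E}_{\omega\sim\psi}[\omega].
```

By linearity of expectation, the expected value of every decision depends on ψ only through its mean weight vector, and the same holds for each posterior induced by a query response. A query therefore matters only through how its responses shift that mean, which is one reason to suspect that smaller EVOI-sufficient query sets could exist under this structure.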

7.2.3.3 EVOI-Sufficiency for Selecting Query Sets

Consider a setting where the agent must commit in advance to a set of m k-response queries from which it will later select, given that it has limited knowledge of the uncertain decision problem it will face. For concreteness, assume that the agent begins with a prior ξ over a set Ψ of possible priors ψ, each supported by model space Ω, so that the agent explicitly represents its uncertainty over the uncertain decision problem it will face. Also, assume that upon selecting a query set Q, the agent immediately observes the uncertain decision problem and can ask a single query from the query set Q it selected before it must make its decision (i.e., after choosing its query set the agent enters the myopic query selection setting). Which size-m subset of k-response queries should the agent choose as its query set, with the goal of maximizing the expected EVOI of the query it will ask? Solving this query set selection problem could be useful in situations where implementing human-agent interfaces for asking and answering queries is expensive and needs to be limited, or as a way to save computation in query selection by reducing the set of queries the agent considers.

Questions of EVOI-sufficiency extend to such a setting. Namely, can the agent restrict attention to subsets of k-response decision queries without risk of expected EVOI-loss?

The answer would certainly be yes if m ≥ |Dk|, since that would allow the agent to select any superset of Dk as its query set, which in turn would guarantee that the agent will be able to select the EVOI-optimal k-response query for any of the possible uncertain decision problems it turns out to face. Similarly, the answer would be yes if m ≥ |Ψ|, since the agent could select the set of decision queries consisting of an EVOI-optimal decision query for each ψ ∈ Ψ. For the general case, however, the answer is unclear. If the answer is no, the natural question to ask next would be whether the agent can restrict attention to subsets of k-response decision-set queries without risk of expected EVOI-loss.
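The query set selection problem can be sketched by brute force on a toy instance. The sketch below is illustrative only and rests on several assumptions: a finite model space, a finite Ψ, noise-free decision queries whose response names the best of the k listed decisions (ties go to the first listed), and candidate query sets drawn only from decision queries. The objective it maximizes is the ξ-weighted average, over ψ ∈ Ψ, of the largest EVOI achievable by a query in Q. All function names and numbers are hypothetical.

```python
from itertools import combinations

def evoi_decision_query(psi, V, decisions):
    """EVOI under prior psi of the noise-free decision query over `decisions`."""
    n_models = len(psi)

    def best_value(models):
        # Best achievable (unnormalized) expected value restricted to `models`.
        return max(sum(psi[w] * row[w] for w in models) for row in V)

    # Each response corresponds to the cell of models under which that
    # listed decision is best.
    cells = {}
    for w in range(n_models):
        j = max(range(len(decisions)), key=lambda i: V[decisions[i]][w])
        cells.setdefault(j, []).append(w)
    return sum(best_value(c) for c in cells.values()) - best_value(range(n_models))

def best_query_set(xi, priors, V, k, m):
    """Pick m decision queries maximizing the expected EVOI of the query asked."""
    queries = list(combinations(range(len(V)), k))

    def expected_evoi(Q):
        # After psi is revealed, the agent asks the best query available in Q.
        return sum(x * max(evoi_decision_query(psi, V, q) for q in Q)
                   for x, psi in zip(xi, priors))

    best = max(combinations(queries, m), key=expected_evoi)
    return best, expected_evoi(best)

# Toy instance: three decisions, three models, two equally likely priors.
V = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
priors = [[0.5, 0.5, 0.0], [0.5, 0.0, 0.5]]
xi = [0.5, 0.5]

set1, score1 = best_query_set(xi, priors, V, k=2, m=1)
set2, score2 = best_query_set(xi, priors, V, k=2, m=2)
print(set1, score1)
print(set2, score2)
```

In this instance no single decision query is EVOI-optimal for both priors, so the expected EVOI with m = 1 falls short of what m = 2 = |Ψ| achieves, consistent with the m ≥ |Ψ| argument above. The sketch says nothing about whether non-decision queries could do better when m < |Ψ|, which is the open question.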
