
Algorithm 3 Application of ρ CMAC from Tabular Function Approximation

10. Conclusions

This article describes the implementation and results from learning Keepaway with Sarsa, a standard TD method, and three different function approximators. We introduce the transfer via inter-task mapping method for speeding up reinforcement learning and give empirical evidence in the Keepaway domain of its usefulness. Rather than utilizing abstract knowledge, this transfer method is able to leverage the weights from function approximators specifying action-value functions, a very task-specific form of knowledge. We first give formulations of how to define transfer functionals for the different function approximators, or re-use learned weights via Q-value Reuse, from a single pair of inter-task mappings. We proceed to show that agents using all three function approximation methods can learn to reach a target performance faster in the target task. Additionally, we show that the total training time can be reduced using TVITM when compared to simply learning the final task without transfer.
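To make the idea of re-using a learned action-value function through a pair of inter-task mappings concrete, the following is a minimal Python sketch. It assumes one plausible reading of Q-value Reuse: the target-task value is the frozen source-task estimate, queried through a state mapping and an action mapping, plus a new term learned only in the target task. The additive form, the names `make_q_value_reuse`, `map_state`, `map_action`, and `toy_q_source`, and all numeric values are illustrative assumptions and are not taken from the article's implementation.

```python
from typing import Callable, Dict, Hashable, Tuple

State = Tuple[float, ...]
Action = Hashable


def make_q_value_reuse(
    q_source: Callable[[State, Action], float],
    map_state: Callable[[State], State],
    map_action: Callable[[Action], Action],
):
    """Return (q_target, update) for a target-task learner that reuses q_source.

    Hypothetical helper: the additive combination is an assumption.
    """
    q_new: Dict[Tuple[State, Action], float] = {}  # learned only in the target task

    def q_target(state: State, action: Action) -> float:
        # Frozen source estimate, queried through the inter-task mappings,
        # plus a correction term that starts at zero.
        transferred = q_source(map_state(state), map_action(action))
        return transferred + q_new.get((state, action), 0.0)

    def update(state: State, action: Action, delta: float) -> None:
        # Only the new term changes; the source function is left untouched.
        q_new[(state, action)] = q_new.get((state, action), 0.0) + delta

    return q_target, update


# Toy source task: two state variables, actions 0 and 1 (values are made up).
def toy_q_source(state: State, action: Action) -> float:
    return state[0] - state[1] + (1.0 if action == 1 else 0.0)


# Toy mappings: drop the extra target state variable; fold action 2 onto action 1.
q_target, update = make_q_value_reuse(
    toy_q_source,
    map_state=lambda s: s[:2],
    map_action=lambda a: min(a, 1),
)

before = q_target((3.0, 1.0, 9.0), 2)  # only the transferred estimate so far
update((3.0, 1.0, 9.0), 2, 0.5)        # e.g. a TD error scaled by the learning rate
after = q_target((3.0, 1.0, 9.0), 2)   # transferred estimate plus learned correction
```

In this sketch the source function is never modified, so the same learned source values could in principle be reused by several target-task learners; whether that matches the behavior of the transfer functionals discussed above depends on the function approximator and is not shown here.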

We give further evidence that TVITM is useful for speeding up learning by utilizing the 5 vs. 4 Keepaway task, which suggests that this method will scale up to even more complex problems. We have shown that the TVITM method is robust to some changes in the transition function, such as when the effectiveness of actuators differs between the two tasks. This flexibility may prove critical when transferring behavior between agents situated in the real world, where environmental conditions may cause sensors and actuators to have different behaviors at different times.

We introduce a novel variant of Knight Joust, a gridworld task, and demonstrate that transfer between it and Keepaway is effective despite substantial qualitative differences between the two tasks. We also show how transfer efficacy is reduced when the source task and target task are less related, such as when using 3 vs. 2 Flat Reward or 3 vs. 2 Giveaway as source tasks.

When considered as a whole, the experiments presented in this article establish that TVITM can be used successfully for transferring action-value functions between tasks and reducing training time. This article therefore constitutes a first step towards a fully general and autonomous transfer method within the RL framework.

Acknowledgments

We would like to thank Gregory Kuhlmann for his help with the Keepaway experiments described in this article, Cynthia Matuszek and Shimon Whiteson for useful discussions, and the anonymous reviewers for their detailed and constructive comments. This research was supported in part by NSF CAREER award IIS-0237699, NSF award EIA-0303609, and DARPA grant HR0011-04-1-0035.

