We have presented a framework for transfer in reinforcement learning based on the idea that related tasks share common features and that transfer can take place through functions defined over those related features. The framework attempts to capture the notion of tasks that are related but distinct, and it provides some insight into when transfer can be usefully applied to a problem sequence and when it cannot.
Most prior work on transfer relies on mappings between pairs of tasks, and therefore implicitly defines transfer as a relationship between problems. This work provides a contrasting viewpoint by relying on a stronger notion of an agent: that there is something common across a series of tasks faced by the same agent, and that that commonality derives from features shared because they are a property of an agent.
We have empirically demonstrated that this framework can be successfully applied to signif- icantly improve transfer using both knowledge transfer and skill transfer. It provides a practical approach to building agents that are capable of improving their own problem-solving capabilities through experience over multiple problems.
Acknowledgments
We would like to thank Ron Parr and our three anonymous reviewers for their tireless efforts to improve this work. We also thank Gillian Hayes, Colin Barringer, Sarah Osentoski, ¨Ozg¨ur S¸ims¸ek, Michael Littman, Aron Culotta, Ashvin Shah, Chris Vigorito, Kim Ferguson, Andrew Stout, Khashayar Rohanimanesh, Pippin Wolfe and Gene Novark for their comments and assis- tance. Andrew Barto and George Konidaris were supported in part by the National Science Foun- dation under Grant No. CCF-0432143, and Andrew Barto was supported in part by a subcontract from Rutgers University, Computer Science Department, under award number HR0011-04-1-0050 from DARPA. George Konidaris was supported in part by the AFOSR under grant AOARD-104135
and the Singapore Ministry of Education under a grant to the Singapore-MIT International Design Center. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation, DARPA, the AFOSR, or the Singapore Ministry of Education.
References
P.E. Agre and D. Chapman. Pengi: An implementation of a theory of activity. In Proceedings of
the Sixth National Conference on Artificial Intelligence, pages 268–272, 1987.
B. Banerjee and P. Stone. General game learning using knowledge transfer. In Proceedings of the
20th International Joint Conference on Artificial Intelligence, pages 672–677, 2007.
A.G. Barto and S. Mahadevan. Recent advances in hierarchical reinforcement learning. Discrete
Event Dynamic Systems, 13:41–77, 2003. Special Issue on Reinforcement Learning.
D.S. Bernstein. Reusing old policies to accelerate learning on new MDPs. Technical Report UM- CS-1999-026, Department of Computer Science, University of Massachusetts at Amherst, April 1999.
J. Boyan and A.W. Moore. Learning evaluation functions to improve optimization by local search.
Journal of Machine Learning Research, 1:77–112, 2000.
T. Croonenborghs, K. Driessens, and M. Bruynooghe. Learning relational options for inductive transfer in relational reinforcement learning. In Proceedings of the Seventeenth International
Conference on Inductive Logic Programming, pages 88–97, 2007.
T.G. Dietterich. Hierarchical reinforcement learning with the MAXQ value function decomposition.
Journal of Artificial Intelligence Research, 13:227–303, 2000.
B.L. Digney. Learning hierarchical control structures for multiple tasks and changing environments. In R. Pfeifer, B. Blumberg, J. Meyer, and S.W. Wilson, editors, From Animals to Animats 5:
Proceedings of the Fifth International Conference on Simulation of Adaptive Behavior, Zurich,
Switzerland, August 1998. MIT Press.
M. Dorigo and M. Colombetti. Robot Shaping: An Experiment in Behavior Engineering. MIT Press/Bradford Books, 1998.
K. Ferguson and S. Mahadevan. Proto-transfer learning in Markov Decision Processes using spectral methods. In Proceedings of the ICML Workshop on Structural Knowledge Transfer for Machine
Learning, Pittsburgh, June 2006.
K. Ferguson and S. Mahadevan. Proto-transfer learning in Markov Decision Processes using spectral methods. Technical Report TR-08-23, University of Massachusetts Amherst, 2008.
F. Fern´andez and M. Veloso. Probabilistic policy reuse in a reinforcement learning agent. In Pro-
ceedings of the 5th International Joint Conference on Autonomous Agents and Multiagent Sys- tems, pages 720–727, 2006.
E. Ferrante, A. Lazaric, and M. Restelli. Transfer of task representation in reinforcement learning using policy-based proto-value functions (short paper). In Proceedings of the Seventh Interna-
tional Conference on Autonomous Agents and Multiagent Systems, pages 1329–1332, 2008.
L. Frommberger. Learning to behave in space: A qualitative spatial representation for robot navi- gation with reinforcement learning. International Journal on Artificial Intelligence Tools, 17(3): 465–482, 2008.
A. Guazzelli, F.J. Corbacho, M. Bota, and M.A. Arbib. Affordances, motivations, and the world graph theory. Adaptive Behavior, 6(3/4):433–471, 1998.
B. Hengst. Discovering hierarchy in reinforcement learning with HEXQ. In Proceedings of the
Nineteenth International Conference on Machine Learning, pages 243–250, 2002.
A. Jonsson and A.G. Barto. Automated state abstraction for options using the U-Tree algorithm. In
Advances in Neural Information Processing Systems 13, pages 1054–1060, 2001.
A. Jonsson and A.G. Barto. A causal approach to hierarchical decomposition of factored MDPs. In
Proceedings of the Twenty Second International Conference on Machine Learning, 2005.
G.D. Konidaris and G.M. Hayes. Estimating future reward in reinforcement learning animats us- ing associative learning. In From Animals to Animats 8: Proceedings of the 8th International
Conference on the Simulation of Adaptive Behavior, pages 297–304, July 2004.
A. Laud and G. DeJong. The influence of reward on the speed of reinforcement learning: an analysis of shaping. In Proceedings of the Twentieth International Conference on Machine Learning, pages 440–447, 2003.
A. Lazaric, M. Restelli, and A. Bonarini. Transfer of samples in batch reinforcement learning. In
Proceedings of the Twenty-Fifth International Conference on Machine Learning, pages 544–551,
2008.
S. Mannor, I. Menache, A. Hoze, and U. Klein. Dynamic abstraction in reinforcement learning via clustering. In Proceedings of the Twenty First International Conference on Machine Learning, pages 560–567, 2004.
M.J. Matari´c. Reinforcement learning in the multi-robot domain. Autonomous Robots, 4(1):73–83, 1997.
A. McGovern and A.G. Barto. Automatic discovery of subgoals in reinforcement learning using diverse density. In Proceedings of the Eighteenth International Conference on Machine Learning, pages 361–368, 2001.
N. Mehta, S. Ray, P. Tadepalli, and T. Dietterich. Automatic discovery and transfer of MAXQ hierarchies. In Proceedings of the Twenty Fifth International Conference on Machine Learning, 2008.
A.W. Moore and C.G. Atkeson. Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning, 13(1):103–130, 1993.
A.Y. Ng, D. Harada, and S. Russell. Policy invariance under reward transformations: theory and application to reward shaping. In Proceedings of the 16th International Conference on Machine
Learning, pages 278–287, 1999.
R. Parr and S. Russell. Reinforcement learning with hierarchies of machines. In Advances in Neural
Information Processing Systems 10, pages 1043–1049, 1997.
T.J. Perkins and D. Precup. Using options for knowledge transfer in reinforcement learning. Tech- nical Report UM-CS-1999-034, Department of Computer Science, University of Massachusetts, Amherst, 1999.
M. Pickett and A.G. Barto. Policyblocks: An algorithm for creating useful macro-actions in re- inforcement learning. In Proceedings of the Nineteenth International Conference of Machine
Learning, pages 506–513, 2002.
D. Precup, R.S. Sutton, and S. Singh. Eligibility traces for off-policy policy evaluation. In Proceed-
ings of the Seventeenth International Conference on Machine Learning, pages 759–766, 2000.
M.L. Puterman. Markov Decision Processes. Wiley, 1994.
J. Randløv and P. Alstrøm. Learning to drive a bicycle using reinforcement learning and shaping. In
Proceedings of the 15th International Conference on Machine Learning, pages 463–471, 1998.
B. Ravindran and A.G. Barto. Relativized options: Choosing the right transformation. In Proceed-
ings of the Twentieth International Conference on Machine Learning, pages 608–615, 2003a.
B. Ravindran and A.G. Barto. SMDP homomorphisms: An algebraic approach to abstraction in semi markov decision processes. In Proceedings of the Eighteenth International Joint Conference
on Artificial Intelligence, pages 1011–1016, 2003b.
O. Selfridge, R. S. Sutton, and A. G. Barto. Training and tracking in robotics. In Proceedings of the
Ninth International Joint Conference on Artificial Intelligence, pages 670–672, 1985.
¨
O. S¸ims¸ek and A. G. Barto. Using relative novelty to identify useful temporal abstractions in re- inforcement learning. In Proceedings of the Twenty-First International Conference on Machine
Learning, pages 751–758, 2004.
¨
O. S¸ims¸ek, A. P. Wolfe, and A. G. Barto. Identifying useful subgoals in reinforcement learning by local graph partitioning. In Proceedings of the Twenty-Second International Conference on
Machine Learning, 2005.
S. Singh, A.G. Barto, and N. Chentanez. Intrinsically motivated reinforcement learning. In Pro-
ceedings of the 18th Annual Conference on Neural Information Processing Systems, 2004.
B. F. Skinner. The Behavior of Organisms: An Experimental Analysis. Appleton-Century-Crofts, New York, 1938.
M. Snel and S. Whiteson. Multi-task evolutionary shaping without pre-specified representations. In
P. Stone, R.S. Sutton, and G. Kuhlmann. Reinforcement learning for robocup soccer keepaway.
Adaptive Behavior, 13(3):165–188, 2005.
R.S. Sutton and A.G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998.
R.S. Sutton, D. Precup, and S.P. Singh. Intra-option learning about temporally abstract actions. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 556–564, 1998.
R.S. Sutton, D. Precup, and S.P. Singh. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1-2):181–211, 1999.
M.E. Taylor and P. Stone. Value functions for RL-based behavior transfer: a comparative study. In
Proceedings of the Twentieth National Conference on Artificial Intelligence, 2005.
M.E. Taylor, P. Stone, and Y. Liu. Transfer learning via inter-task mappings for temporal difference learning. Journal of Machine Learning Research, 8:2125–2167, 2007.
M.E. Taylor, G. Kuhlmann, and P. Stone. Autonomous transfer for reinforcement learning. In Pro-
ceedings of the Seventh International Conference on Autonomous Agents and Multiagent Systems,
2008.
S. Thrun and A. Schwartz. Finding structure in reinforcement learning. In Advances in Neural
Information Processing Systems, volume 7, pages 385–392. The MIT Press, 1995.
L. Torrey, J. Shavlik, T. Walker, and R. Maclin. Skill acquisition via transfer learning and advice taking. In Proceedings of the Seventeenth European Conference on Machine Learning, pages 425–436, 2006.
J. Weng, J. McClelland, A. Pentland, O. Sporns, I. Stockman, M. Sur, and E. Thelen. Autonomous mental development by robots and animals. Science, 291(5504):599–600, 2000.
E. Wiewiora. Potential-based shaping and Q-value initialization are equivalent. Journal of Artificial
Intelligence Research, 19:205–208, 2003.
E. Wiewiora, G. Cottrell, and C. Elkan. Principled methods for advising reinforcement learning agents. In Proceedings of the Twentieth International Conference on Machine Learning, pages 792–799, 2003.
A. Wilson, A. Fern, S. Ray, and P. Tadepalli. Multi-task reinforcement learning: a hierarchical bayesian approach. In Proceedings of the 24th International Conference on Machine Learning, pages 1015–1022, 2007.
W. Zhang and T.G. Dietterich. A reinforcement learning approach to job-shop scheduling. In
Proceedings of the Fourteenth International Conference on Artificial Intelligence, pages 1114–