Performance for Various Numbers of Tasks

4.6 Additional Experiments

4.6.3 Performance for Various Numbers of Tasks

Although it was shown in Section 4.4.2 that the learned dictionaries become more stable as the system learns more tasks, it is not currently guaranteed that this will improve the performance of zero-shot transfer. An additional set of experiments was conducted on the Simple-Mass domain to evaluate the effect of the number of tasks on zero-shot performance. Our results, shown in Figure 4.9, reveal that zero-shot performance does indeed improve as the dictionaries are trained over more tasks. This improvement is most stable and rapid in an MTL setting, since the optimization over all dictionaries and task policies is run to convergence, but TaDeLL also shows clear

CHAPTER 4. PREDICTING POLICIES 77 involves only the learned coupled dictionaries, it can be concluded that the quality of these dictionaries for zero-shot transfer improves as the system learns more tasks.

5 10 15 20 25 30 35 Tasks -4.5 -4.0 -3.5 -3.0 -2.5 -2.0

Reward TaDeLLTaDeLL Zero-Shot

Figure 4.9: Zero-shot performance as a function of the number of spring mass tasks used to train the dictionary. As more tasks are used, the performance of zero-shot transfer improves.

4.7 Discussion

This chapter demonstrated how incorporating high-level task descriptors into lifelong learning both improves learning performance and also enables zero-shot transfer to new tasks. The mechanism of using a coupled dictionary to connect the task descriptors with the learned models is relatively straightforward, yet highly effective in practice. Critically, it provides a fast and simple mechanism to predict the model or policy for a new task via zero-shot learning, given only its high-level task de- scriptor. While the focus was on RL tasks here, this approach is general and can handle multiple learning paradigms, including classification and regression. Experi- ments demonstrate that our approach outperforms the state-of-the-art and requires

CHAPTER 4. PREDICTING POLICIES 78 substantially less computational time than competing methods.

The ability to rapidly bootstrap policies for new tasks is critical to the develop- ment of lifelong learning systems that will be deployed for extended periods in real environments and tasked with handling a variety of learning problems. High-level descriptions provide an effective way to relate different models. The description could be provided by another agent, or as in the case when a person reads an instruction manual, the description can come from information in the environment. Enabling lifelong learning systems to similarly take advantage of these high-level descriptions provides an effective step toward their practical effectiveness.

The two major limitations of TaDeLL are the restriction to linear policies and the additional need for task descriptions. In certain instances, a mechanism like a descrip-

tion is necessary for handling multiple tasks. Consider a situation when state and

action spaces are identical between tasks but the goal is different. The agent must rely on a description, known objective, or reward function to disambiguate tasks. How- ever, in domains with high-dimensional state representations, the uniqueness of the state space can often act as a description to orient the agent. The next chapter con- siders a learner in such a high-dimensional state space. To handle high-dimensional state spaces, and additionally to allow for more expressive models, I shift the focus to deep learning and investigate transfer and lifelong learning issues related to neural networks. This serves to address the shortcoming of the linear policies, while the need for task descriptions is left as an open problem.

Chapter 5 Fine-Tuning as Transfer in Deep

Learning

One of the biggest limitations of TaDeLL is the reliance on linear policies. For many problems, the relationship between the feature space and the desired output is highly non-linear. Chapter 6 presents an approach for learning multiple tasks in an online fashion using deep learning. To better frame that work, I use this chapter to present an overview of transfer learning in deep neural networks. Transfer is a main component of lifelong machine learning, and the issues related to transfer also appear in lifelong machine learning. By understanding the former, we can better understand the latter. In this chapter, I focus on properties of deep networks that are specific to transfer learning, and present issues that arise when conducting transfer with deep models.

An easy and effective way to transfer knowledge in deep networks is fine-tuning. Fine- tuning involves pre-training a network on one task and then retraining the network or fine-tuning it for a different task. Fine-tuning was one of the early successes of

CHAPTER 5. FINE-TUNING AS TRANSFER IN DEEP LEARNING 80 deep learning – it was shown high-level features learned from training a network on ImageNet (Deng et al., 2009) allowed for breakthrough results in other domains where only smaller image datasets were available (Girshick et al., 2014; Hoffman et al., 2014).

One hypothesis for the success of fine-tuning is that the increased training on related tasks produces internal representations in the network that generalize to a broader set of problems and are therefore able to reduce over-fitting (Razavian et al., 2014; Caruana, 1997). Considering this phenomenon more closely, one might ask, “Can the initial training process be over-specialized and actually hurt transfer performance?” This was shown to be the case. Yosinski et al. (2014) investigated fixing layers and showed that earlier layers appear to be more general, while later layers are more specific and require retraining. In this chapter, I will explore other properties of fine- tuning. Specifically, I show that a directly copied network that performs poorly on a new task can improve performance on that new task with fine-tuning. I demonstrate that the amount of training a network receives affects how well a network performs as an initialization, and I investigate retention or how well performance on the initial task is preserved.

The problem I will use to investigate issues related to transfer in deep learning is the problem of intersection handling by autonomous vehicles. I show that a network trained on one type of intersection generally is not able to generalize to other intersections. However, a network that is pre-trained on one intersection and fine- tuned on another intersection performs better on the new task compared to training in isolation. This network also retains knowledge of the prior task, even though some forgetting occurs. Finally, I show that the benefits of fine-tuning hold when

CHAPTER 5. FINE-TUNING AS TRANSFER IN DEEP LEARNING 81

Figure 5.1: We analyze knowledge transfer between different types of intersections. The knowledge to handle an intersection is represented as a Deep Q-Network (DQN). We investigate a) directly copying a network to a new intersection b) fine-tuning a previously trained network on a new intersection, c) whether fine tuning destroys old intersection knowledge in reverse transfer.

Autonomous Driving (AD) has the potential to reduce accidents caused by driver fatigue and distraction and will enable more active lifestyles for the elderly and dis- abled (Blanco et al., 2016). While AD technology has made important strides over the last couple of years, current technology is still not ready for large scale roll-out (Aoude et al., 2011). Urban environments are particularly difficult for AD, due to the unpredictable nature of pedestrians and vehicles in city traffic.

Rule-based methods provide a predictable method to handle intersections. However, rule-based intersection handling approaches do not scale well due to the difficulty of designing hand-crafted rules that remain valid as the diversity and complexity of possible scenes increase. Recently it has been shown that deep reinforcement learning can improve over rule-based techniques (Isele and Cosgun, 2017a; Isele et al., 2018b), but before such techniques can be utilized, it is important to understand how well these techniques are able to generalize to different scenarios and real systems.

We explore the ability of a reinforcement learning agent to generalize, focusing specifically on how the knowledge for one type of intersection, represented as a Deep Q- Network (DQN), translates to other types of intersections. We treat both different

CHAPTER 5. FINE-TUNING AS TRANSFER IN DEEP LEARNING 82 geometries and goal behaviors of intersections as different tasks for a lifelong learning

agent. First I look at direct copy: how well a network trained for Task A performs

on Task B. Second, I analyze how a network initialized on Task A and fine-tuned

on Task B compares to a randomly initialized network exclusively trained on Task

B. Third, I investigate reverse transfer: if a network pre-trained for Task A and

fine-tuned to Task B, preserves knowledge for Task A. Finally, I present early results of using a network trained in simulation to initialize learning on real data.

5.1 Background on Autonomous Driving and

Transfer in Deep Learning

Intersections require attending to many different agents in multiple different direc- tions, with highly varied and often unpredictable behaviors. Making appropriate decisions can be challenging even for human drivers: 20% of all accidents occur at intersections (National Highway Traffic Safety Administration, 2014). Robots have the potential to be safer, with some research suggesting they might already be safer (Blanco et al., 2016; Weiland and Crow, 2018). However, in practice, intersections are considered one of the most difficult problems for autonomous vehicles (Fuerstenberg, 2005; Aoude et al., 2011). According to a presentation by Pravin Varaiya at the 2018 Berkeley Deep Drive consortium (Pravin, 2018), 23 out of 26 reported autonomous driving accidents in California occurred at intersections.

Researchers have recently been investigating using machine learning techniques to control autonomous vehicles (Cosgun et al., 2017). Imitation learning strategies have investigated learning from a human driver (Bojarski et al., 2016). Markov Deci-

CHAPTER 5. FINE-TUNING AS TRANSFER IN DEEP LEARNING 83 sion Processes (MDP) have been used offline to address the problem of intersection handling (Brechtel et al., 2014; Song et al., 2016). Online planners based on par- tially observable Monte Carlo Planning (POMCP) have been applied to intersection problems when an accurate generative model is available (Bouton et al., 2017). Addi- tionally, machine learning techniques have been used to optimize comfort in a space where solutions are constrained to safe trajectories (Shalev-Shwartz et al., 2016).

Large amounts of data often improve the performance of machine learning techniques. In the absence of huge datasets, training on multiple related tasks gives similar performance gains (Caruana, 1997). A large breadth of research has investigated transferring knowledge from one system to another in machine learning in general (Pan and Yang, 2010), and in reinforcement learning specifically (Taylor and Stone, 2009).

Large training times and high sample complexity make transfer methods particularly appealing in deep networks (Razavian et al., 2014; Yosinski et al., 2014). Recent work in deep reinforcement learning has looked at combining networks from different

tasks to share information (Rusu et al., 2016; Yin and Pan, 2017). Researchers

have looked at using options (Sutton and Barto, 1998) in Deep RL to expand an agent’s capabilities (Jaderberg et al., 2016; Tessler et al., 2016; Kulkarni et al., 2016), and efforts have been made to enable a unified framework for learning multiple tasks through changes in architecture design (Srivastava et al., 2013) and modified objective functions (Kirkpatrick et al., 2016) to address the problem of catastrophic forgetting1

(Goodfellow et al., 2013).

In our scenario, there is the added difficulty of transferring to the real vehicle. It is a well-known problem that policies trained in simulation rarely work on real robots

CHAPTER 5. FINE-TUNING AS TRANSFER IN DEEP LEARNING 84 (Barrett et al., 2010). Recent work has investigated grounding imperfect simulators to a robot’s behavior (Hanna and Stone, 2017) and there is evidence that transferring from simulation to real robots can be addressed by targeting the system’s ability to generalize (Tobin et al., 2017). Given the variety of intersections, the importance of learning a general model, and the difficulty of training on the real vehicle, the prospects of multi-task learning are examined for the problem of intersection handling.

In document Lifelong Reinforcement Learning On Mobile Robots (Page 97-105)