CHAPTER 5 : Few-Shot Image Classification through Coupled Embedding Spaces
6.8 Additional Experiments
6.8.3 Performance for Various Numbers of Tasks
Although we have shown in Section 6.5.2 that the learned dictionaries become more stable as the system learns more tasks, we cannot currently guarantee that this will improve the performance of zero-shot transfer. To evaluate the effect of the number of tasks on zero-shot performance, we conducted an additional set of experiments on both the Simple-Mass domain and the robot end
effector prediction domain. Our results, shown in Figure 31, reveal that zero-shot performance does indeed improve as the dictionaries are trained over more tasks. This improvement is most stable and rapid in an MTL setting since the optimization over all dictionaries and task policies is run to convergence, but TaDeLL also shows definite improvement in zero-shot performance asTmax increases. Since zero-shot transfer involves only the learned coupled dictionaries, we can conclude that the quality of these dictionaries for zero-shot transfer improves as the system learns more tasks.
5 10 15 20 25 30 35 Tasks -4.5 -4.0 -3.5 -3.0 -2.5 -2.0
Reward TaDeLLTaDeLL Zero-Shot
(a) Spring-Mass RL 25 50 75 100 125 150 175 200 Tasks 0.10 0.15 0.20 0.25 0.30 0.35 Loss TaDeLL Zero-Shot TaDeLL
(b) Robot End Effector Prediction
Figure 31: Zero-shot performance as a function of the number of tasks used to train the dictionary. As more tasks are used, the performance of zero-shot transfer improves.
6.9. Conclusions
We demonstrated that incorporating high-level task descriptors into lifelong learning both improves learning performance and also enables zero-shot transfer to new tasks. The mechanism of using a coupled dictionary to connect the task descriptors with the learned models is relatively straightforward, yet highly effective in practice. Most critically, it provides a fast and simple mechanism to predict the model or policy for a new task via zero-shot learning, given only its high-level task descriptor. This approach is general and can handle multiple learning paradigms, including classification, regression, and RL tasks. Experiments demonstrate that our approach outperforms state of the art and requires substantially less computational time than competing methods.
This ability to rapidly bootstrap models (or policies) for new tasks is critical to the development of lifelong learning systems that will be deployed for extended periods in real environments and tasked
with handling a variety of tasks. High-level descriptions provide an effective way for humans to communicate and to instruct each other. The description need not come from another agent; humans often read instructions and then complete a novel task quite effectively. Enabling lifelong learning systems to take advantage of these high-level descriptions provides an effective step toward their practical effectiveness. As shown in our experiments with warm-start learning from the zero-shot predicted policy, these task descriptors can also be combined with training data on the new task in a hybrid approach. Also, while our framework is designed to work for tasks that are drawn from a single domain, an exciting potential direction for future is to extend this work for cross-domain tasks, e.g., balancing tasks of bicycle and spring-mass systems together.
Despite TaDeLL’s strong performance, defining what constitutes an effective task descriptor for a group of related tasks remains an open question. In our framework, task descriptors are given, typically as fundamental descriptions of the system. The representation we use for the task descriptors, a feature vector, is also relatively simple. One interesting direction for future work is to develop methods for integrating more complex task descriptors into MTL or lifelong learning. These more sophisticated mechanisms could include natural language descriptions, step-by-step instructions, or logical relationships. Such an advance would likely involve moving beyond the linear framework used in TaDeLL but would constitute an important step toward enabling more practical use of high-level task descriptors in lifelong learning.
In the next chapter, we focus on addressing the challenge ofcatastrophic forgettingin the continual learning setting. Catastrophic forgetting is a phenomenon in machine learning when a model forgets the previously learned tasks when new tasks are learned. In this chapter, tackling catastrophic forgetting is not challenging. The reason is that as more tasks are learned, a better dictionary is learned. To avoid catastrophic forgetting, we can storeΓ(t)andα(t)and update the estimate for
the sparse vector and subsequently the optimal parameter for each learned task using Eq. (6.2). This is possible because the optimal parameters are task-specific. When nonlinear models such as deep neural networks are used as base models, tackling catastrophic forgetting is more challenging because optimal parameters for all the tasks are captures through the weights of the network. These
parameters are shared across the tasks, which causes interference. In the next chapter, we will focus on addressing this challenge by coupling the tasks by mapping the tasks into a task-invariant embedding space. Our goal will be to train a model such that the distributions of a number of tasks become similar in a shared embedding space to learn sequential tasks without forgetting.