
1.2 Continual Learning

1.2.6 Applications

While applications are not the focus of this dissertation, a short review of the many domains in which continual learning could have an impact helps to motivate its practical interest. Applications that process streams of data (e.g. applications running on smartphones) or any other kind of real-time, ephemeral signal that is impractical to store and re-process would benefit the most from the integration of CL features. A non-comprehensive and unordered list of applications in which continual learning may be beneficial, or has already been applied, can be found below:

• Computer Vision: given the high dimensionality and high velocity of visual information, computer vision tasks are among the more suitable domains to demonstrate the importance of continual learning and to benefit from it in practice. Object detection, recognition and segmentation [Shmelkov et al., 2017] are simple examples of horizontal applications which are in great need of more efficiency and scalability, often dealing with limited hardware resources (e.g. smart cameras) and with the necessity to customize and

Chapter 1. Introduction 22

adapt (possibly offline) their behaviors over time (e.g. for surveillance purposes or for providing better, customer-centered specialized services).

• Natural Language Processing and Speech Recognition: after a period of early enthusiasm and subsequent disappointment during the second AI winter, conversational agents (or chatbots) [Lee, 2017] and virtual assistants are slowly regaining ground in the AI applications landscape. Their latest incarnations in Siri, Alexa, Google Now, Cortana, etc. are today showing a rapidly growing application scope and success [Omale, 2019], mostly due to recent improvements in speech recognition and natural language processing. Continual learning may substantially improve human-to-machine interaction through efficient on-device personalization/adaptation. This may not only reduce the computational burden on the server side (and improve the adaptation speed) but, given the highly personal nature of the information processed by virtual assistants, it may also allow the raw data to never leave the device.

• Robotics: the robotics community has always been intrigued by the idea of endowing embodied machines with lifelong and open-ended learning of new skills and new knowledge, and many scenarios would benefit greatly from recent CL advances. Robotics applications in unconstrained environments, indeed, often deal with unpredictable situations and have always posed questions out of reach for previous machine learning techniques. Classic continual learning settings include room navigation, e.g., using a HERO-2000 mobile robot with a radar sensor [Thrun, 1996] to perform several room mapping and navigation tasks. Action models in Explanation-Based Neural Network (EBNN) learning explain (in terms of previous experiences) and analyze observations to transfer task-independent (navigation) knowledge via predicting collisions and the prediction certainty. In the most recent literature, estimation and tracking in [Wong, 2016], odometry estimation, and mask or pixel-wise segmentation in [Pinto and Gupta, 2016] have also been tackled, especially through self-supervision. However, most of these works were not conceived within the motivating principles of CL. RL Intelligent Adaptive Curiosity (RL-IAC) constitutes one of the few examples of the direct application of CL in a robotics setting, for visual saliency learning [Craye et al., 2018]. However, the proposed algorithm does not employ deep architectures.

• Internet-of-Things and Edge Computing: embedded devices with highly constrained hardware resources that operate offline (for privacy or operational reasons) may benefit greatly from the introduction of more efficient learning algorithms that operate on real-time data without the need to store them. The domestic robot example introduced in the previous section already gave some pragmatic motivation for continual learning in this area. However, there are many vertical applications we could mention, such as transportation-mode detection [Carpineti et al., 2018] and activity recognition [Ravi et al., 2005] on smartphones using strong (and highly private) sensor signals.

• Machine Learning Production Systems: machine learning production systems are becoming more and more common in every organization. Being able to quickly train and deploy new prediction models over time becomes essential to provide up-to-date and always-improving services. TensorFlow Extended [Baylor et al., 2017] is an example of such a system, supporting the Google machine learning infrastructure. Recommendation and anomaly prediction systems are just two examples of applications which currently benefit from a sophisticated prediction-model management system. Continual Learning, in this scenario, may substantially reduce the computational burden incurred by such systems in re-training models from scratch every time (and possibly for every user) at a massive scale, with a direct impact on resource occupation, energy consumption and, ultimately, financial resources.

2 A Comprehensive Framework for Continual Learning

“Without the Lifelong Learning capability, AI systems will probably never be truly intelligent: learning machine or agent to continually learn and accumulate knowledge, and to become more and more knowledgeable and better and better at learning.”

– Bing Liu, 2014

In this chapter, we will try to define continual learning more formally within a comprehensive framework, with additional constraints and desiderata that lay the formal foundations for the original proposals of the following chapters. Let us start with a simple question: what is continual learning? Drawing inspiration from the famous definition of Machine Learning by Michalski et al. [2013], we could try to summarize continual learning, operationally, in a single sentence as in the following definition.

Definition 1. Continual Learning. A computer program is said to learn continually from experience if, given a sequence of ephemeral partial experiences $E_i$, a target function $h^*$ and a performance measure $P$, its performance in approximating $h^*$, as measured by $P$, improves with the number of processed partial experiences $E_i$.

The focus is on the ephemeral nature of the data, which cannot be processed multiple times, and on the basic notion that, taken in isolation, each experience carries only part of the information needed to approximate the target function $h^*$, the objective of the learning process. These natural but key constraints, as we have argued in the previous chapter, lead to profound theoretical and practical implications worth considering in the development of truly intelligent artificial systems.
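Definition 1 can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from this dissertation: the target function `h_star`, the nearest-class-mean learner `RunningMeanClassifier`, and the choice of accuracy as the performance measure $P$. Each partial experience covers a single class and is discarded after one pass; the learner keeps only running statistics.

```python
import random


def h_star(x):
    """Hypothetical target function: three classes split at x = 1 and x = 2."""
    return 0 if x < 1.0 else (1 if x < 2.0 else 2)


class RunningMeanClassifier:
    """Nearest-class-mean learner that keeps only per-class sums and counts."""

    def __init__(self):
        self.sums = {}
        self.counts = {}

    def update(self, experience):
        # Each (x, y) pair is used exactly once; the raw data is never stored.
        for x, y in experience:
            self.sums[y] = self.sums.get(y, 0.0) + x
            self.counts[y] = self.counts.get(y, 0) + 1

    def predict(self, x):
        if not self.counts:
            return None
        means = {y: self.sums[y] / self.counts[y] for y in self.counts}
        return min(means, key=lambda y: abs(x - means[y]))


def performance(learner, eval_points):
    """P: fraction of evaluation points on which the learner agrees with h*."""
    hits = sum(learner.predict(x) == h_star(x) for x in eval_points)
    return hits / len(eval_points)


def run(seed=0, per_experience=200):
    rng = random.Random(seed)
    eval_points = [rng.uniform(0.0, 3.0) for _ in range(1000)]
    learner = RunningMeanClassifier()
    history = []
    for i in range(3):
        # Partial experience E_i: inputs drawn from [i, i + 1], i.e. one class only.
        xs = [rng.uniform(i, i + 1) for _ in range(per_experience)]
        experience = [(x, h_star(x)) for x in xs]
        learner.update(experience)
        del xs, experience  # ephemeral: the data is gone after a single pass
        history.append(performance(learner, eval_points))
    return history


history = run()
print(history)  # P measured after each processed experience
```

In this toy setting $P$ grows with each processed experience because running class means are not overwritten by later data. Learners that update shared parameters by gradient descent do not get this behavior for free, which is one reason the constraints above matter.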

Chapter 2. A Formal Framework for CL 25

2.1 Formal Definition

Early theoretical attempts to formalize the continual learning paradigm can be found in Ring [2005]. More general framework proposals include [Pentina and Lampert, 2015]. As in [Pentina and Lampert, 2015], we assume that CL tackles a PAC-learnable problem in the approximation of a target hypothesis $h^*$, but learning from a sequence of non-i.i.d. training batches. Our framework can also be seen as a generalization of the setting proposed in [Lopez-Paz and Ranzato, 2017], where a “task supervised signal” $t$ is provided along with each training example.

In both settings, if we were able to observe all the data streamed throughout a lifetime, the distribution we would like to model would be just one, and we could consider all the examples as being drawn from it. However, the reality of CL settings is that the training examples are never observed all at once, but can rather be seen as drawn from a sequence of distributions $D_i$. In this section we expand and refine previous CL frameworks, improving flexibility and generality while trying not to end up with too abstract a setting. Moreover, we make sure to accommodate previously proposed algorithms and more recent ones through a number of constraints, their respective relaxations, and desiderata.
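The contrast between the single "lifetime" distribution and the sequence of distributions actually encountered can be made concrete with a small sketch. The unit-variance Gaussians, their means and the helper names `batch_stream` and `summarize_stream` are all assumptions chosen for illustration; they are not part of the framework.

```python
import random


def batch_stream(means, batch_size, seed=0):
    """Yield one batch per distribution D_i (assumed here: unit-variance Gaussians)."""
    rng = random.Random(seed)
    for mu in means:
        yield [rng.gauss(mu, 1.0) for _ in range(batch_size)]


def summarize_stream(stream):
    """Consume the stream once, keeping only running statistics and not the data."""
    total, count, batch_means = 0.0, 0, []
    for batch in stream:
        batch_means.append(sum(batch) / len(batch))
        total += sum(batch)
        count += len(batch)
    return batch_means, total / count


means = [0.0, 5.0, 10.0]  # hypothetical means of D_1, D_2, D_3
batch_means, lifetime_mean = summarize_stream(batch_stream(means, batch_size=2000))
print(batch_means)    # each batch reflects only its own D_i
print(lifetime_mean)  # statistic of the pooled data, which is never held at once
```

Each batch taken in isolation gives a very different picture of the data than the pooled stream does, even though the pooled statistic can still be tracked without storing any batch.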

Definition 2. Continual Learning Algorithm. Given $X$ and $Y$ as input and output random variables respectively, let us consider $\mathcal{D}$, a potentially infinite sequence of unknown distributions $\mathcal{D} = \{D_1, \dots, D_n\}$ over $X \times Y$ that we encounter over time (hence with $n \in [2, \dots, \infty[$). A continual learning algorithm $A^{CL}$ is an algorithm with the following signature:

$$\forall D_i \in \mathcal{D}, \quad A^{CL}_i : \langle h_{i-1}, B_i, M_{i-1}, t_i \rangle \rightarrow \langle h_i, M_i \rangle \tag{2.1}$$

Where:

• $M_i$ is an external memory where we can store previous training examples or partial computation not directly related to the parametrization of the model.

• $t_i$ is a task label, void if not provided. It can be used to disentangle tasks and specialize the hypothesis parameters, as is done in [Lopez-Paz and Ranzato, 2017].

• $B_i$ is the training batch of examples. For simplicity, these examples can be assumed to be drawn i.i.d. from $D_i$ [Lopez-Paz and Ranzato, 2017; Pentina and Lampert, 2013], but this is not necessary. Indeed, this framework also accommodates continual learning approaches where examples are assumed to be drawn non-i.i.d. from each $D_i$ over $X \times Y$, as in [Gepperth and Hammer, 2016; Hayes et al., 2018b]. Each $D_i$ can be considered a stationary distribution.

• Each $B_i$ is composed of a number of examples $e^i_j$ with $j \in [1, \dots, |B_i|]$. Each example $e^i_j = \langle x^i_j, f^i_j \rangle$, where $f^i$ is the feedback signal and can be used to infer the optimal hypothesis $h^*(x, t)$ (i.e., exact label $y^i_j$ in supervised learning or any real tensor from which we can estimate $h^*(x, t)$, such as a reward $r^i$
