reinforcement learning

Top PDFs for reinforcement learning:

What is Acceptably Safe for Reinforcement Learning?

Machine Learning algorithms are becoming more prevalent in critical systems where dynamic decision making and efficiency are the goal. As is the case for complex and safety-critical systems, where certain failures can lead to harm, we must proactively consider the safety assurance of such systems that use Machine Learning. In this paper we explore the implications of the use of Reinforcement Learning in particular, considering the potential benefits that it could bring to safety-critical systems, and our ability to provide assurances on the safety of systems incorporating such technology. We propose a high-level argument that could be used as the basis of a safety case for Reinforcement Learning systems, where the selection of ‘reward’ and ‘cost’ mechanisms would have a critical effect on the outcome of decisions made. We conclude with fundamental challenges that will need to be addressed to give the confidence necessary for deploying Reinforcement Learning within safety-critical applications.

Exploring Deep Reinforcement Learning with Multi Q Learning

Q-learning is a popular temporal-difference reinforcement learning algorithm which often explicitly stores state values using lookup tables. This implementation has been proven to converge to the optimal solution, but it is often beneficial to use a function-approximation system, such as deep neural networks, to estimate state values. It has been previously observed that Q-learning can be unstable when using value function approximation or when operating in a stochastic environment. This instability can adversely affect the algorithm’s ability to maximize its returns. In this paper, we present a new algorithm called Multi Q-learning to attempt to overcome the instability seen in Q-learning. We test our algorithm on a 4 × 4 grid-world with different stochastic reward functions using various deep neural networks and convolutional networks. Our results show that in most cases, Multi Q-learning outperforms Q-learning, achieving average returns up to 2.5 times higher than Q-learning and having a standard deviation of state values as low as 0.58.
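
As a hedged illustration only (the abstract does not spell out the update rule), the sketch below keeps several tabular Q estimates and bootstraps each update from their mean on a toy 4 × 4 grid-world with a noisy goal reward; the table count, learning rate, and environment are assumptions, not the paper's code.

```python
import numpy as np

# Illustrative sketch: several Q tables, each update bootstraps from the mean
# of all tables (one plausible reading of "Multi Q-learning").
n_states, n_actions, n_tables = 16, 4, 4          # 4 x 4 grid, 4 moves (assumed)
Q = np.zeros((n_tables, n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

def step(s, a):
    """Toy deterministic 4x4 grid-world with a stochastic reward at the goal."""
    r, c = divmod(s, 4)
    dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][a]
    r, c = min(max(r + dr, 0), 3), min(max(c + dc, 0), 3)
    s2 = 4 * r + c
    reward = rng.normal(1.0, 0.5) if s2 == 15 else 0.0   # noisy goal reward
    return s2, reward, s2 == 15

for episode in range(500):
    s, done = 0, False
    while not done:
        q_mean = Q.mean(axis=0)                           # average over tables
        a = rng.integers(n_actions) if rng.random() < eps else int(q_mean[s].argmax())
        s2, r, done = step(s, a)
        i = rng.integers(n_tables)                        # update one randomly chosen table
        target = r + (0.0 if done else gamma * Q.mean(axis=0)[s2].max())
        Q[i, s, a] += alpha * (target - Q[i, s, a])
        s = s2
```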

Deep Reinforcement Learning for Swarm Systems

In this paper, we proposed the use of mean feature embeddings as state representations to overcome two major problems in deep reinforcement learning for swarms: the high and possibly changing dimensionality of information perceived by each agent. We introduced three different approaches to realize such embeddings: two manually designed approaches based on histograms / radial basis functions and an end-to-end learned neural network feature representation. We evaluated the approaches on different variations of the rendezvous and pursuit evasion problem and compared their performance to that of a naive feature concatenation method and classical approaches found in the literature. Our evaluation revealed that learning embeddings end-to-end using neural network features scales well with increasing agent numbers, leads to better performing policies, and often results in faster convergence compared to all other approaches. As expected, the naive concatenation approach fails for larger system sizes.
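
A minimal sketch of the mean-embedding idea: averaging a per-neighbour feature map yields a state vector whose size does not depend on how many agents are observed. The RBF feature map, its centres, and the observation layout below are illustrative assumptions, not the paper's models.

```python
import numpy as np

def mean_embedding(neighbor_obs, phi):
    """Fixed-size state from a variable number of neighbor observations.

    neighbor_obs: array of shape (n_neighbors, obs_dim); n_neighbors may vary.
    phi:          feature map applied to each observation (hand-designed or learned).
    """
    if len(neighbor_obs) == 0:
        return np.zeros_like(phi(np.zeros(neighbor_obs.shape[-1])))
    return np.mean([phi(o) for o in neighbor_obs], axis=0)

# Example: a radial-basis-function feature map (one of the hand-designed options
# mentioned in the abstract); centres and bandwidth are illustrative choices.
centers = np.linspace(-1.0, 1.0, 8)
phi_rbf = lambda o: np.exp(-((o[0] - centers) ** 2) / 0.1)   # uses relative x-position only

state_small = mean_embedding(np.random.randn(3, 2), phi_rbf)   # 3 neighbors
state_large = mean_embedding(np.random.randn(50, 2), phi_rbf)  # 50 neighbors
assert state_small.shape == state_large.shape                  # same policy input size
```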

Paraphrase Generation with Deep Reinforcement Learning

In this work, we propose taking a data-driven approach to train a model that can conduct evaluation in learning for paraphrase generation. The framework contains two modules, a generator (for paraphrase generation) and an evaluator (for paraphrase evaluation). The generator is a Seq2Seq learning model with attention and copy mechanism (Bahdanau et al., 2015; See et al., 2017), which is first trained with cross-entropy loss and then fine-tuned by using policy gradient with supervision from the evaluator as rewards. The evaluator is a deep matching model, specifically a decomposable attention model (Parikh et al., 2016), which can be trained by supervised learning (SL) when both positive and negative examples are available as training data, or by inverse reinforcement learning (IRL) with outputs from the generator as supervision when only positive examples are available. In the latter setting, for the training of the evaluator using IRL, we develop a novel algorithm based on the max-margin IRL principle (Ratliff et al., 2006). Moreover, the generator can be further trained with non-parallel data, which is particularly effective when the amount of parallel data is small.
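
A toy, hedged sketch of the fine-tuning step described above: the evaluator's score is used as the reward in a REINFORCE-style update of the generator. The "generator" here is just a categorical distribution over three candidate rewrites and the "evaluator" a lexical-overlap score, both stand-ins for the Seq2Seq and deep matching models in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the two modules; only the reward-driven update is the point.
candidates = ["how big is the city", "what is the size of the city", "city big how"]
logits = np.zeros(len(candidates))                 # "generator" parameters

def evaluator_score(paraphrase, reference="what is the population of the city"):
    """Crude lexical-overlap score used here as a stand-in reward."""
    a, b = set(paraphrase.split()), set(reference.split())
    return len(a & b) / len(a | b)

def policy_gradient_step(logits, lr=0.5, baseline=0.0):
    probs = np.exp(logits - logits.max()); probs /= probs.sum()
    i = rng.choice(len(candidates), p=probs)       # sample a paraphrase
    reward = evaluator_score(candidates[i])        # evaluator output as reward
    grad = -probs; grad[i] += 1.0                  # d log pi(i) / d logits
    return logits + lr * (reward - baseline) * grad

for _ in range(200):
    logits = policy_gradient_step(logits)
print(candidates[int(np.argmax(logits))])          # generator now prefers the best-scored rewrite
```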

Applications of deep learning and reinforcement learning to biological data

Rapid advances of hardware-based technologies during the past decades have opened up new possibilities for life scientists to gather multimodal data in various application domains (e.g., omics, bioimaging, medical imaging, and [brain/body]-machine interfaces), thus generating novel opportunities for the development of dedicated data-intensive machine learning techniques. Overall, recent research in deep learning (DL), reinforcement learning (RL), and their combination (deep RL) promises to revolutionize artificial intelligence. The growth in computational power, accompanied by faster and larger data storage and declining computing costs, has already allowed scientists in various fields to apply these techniques to datasets that were previously intractable owing to their size and complexity. This review article provides a comprehensive survey of the application of DL, RL, and deep RL techniques in mining biological data. In addition, we compare the performance of DL techniques when applied to different datasets across various application domains. Finally, we outline open issues in this challenging research area and discuss future development perspectives.

Sentence Simplification with Deep Reinforcement Learning

In this paper we propose a simplification model which draws on insights from neural machine translation (Bahdanau et al., 2015; Sutskever et al., 2014). Central to this approach is an encoder-decoder architecture implemented by recurrent neural networks. The encoder reads the source sequence into a list of continuous-space representations from which the decoder generates the target sequence. Although our model uses the encoder-decoder architecture as its backbone, it must also meet constraints imposed by the simplification task itself, i.e., the predicted output must be simpler, preserve the meaning of the input, and be grammatical. To incorporate this knowledge, the model is trained in a reinforcement learning framework (Williams, 1992): it explores the space of possible simplifications while learning to maximize an expected reward function that encourages outputs which meet simplification-specific constraints. Reinforcement learning has been previously applied to extractive summarization (Ryang and Abekawa, 2012), information extraction (Narasimhan et al., 2016), dialogue generation (Li et al., 2016), machine translation, and image caption generation (Ranzato et al., 2016).
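
To make the reward structure concrete, the sketch below combines crude proxies for the three constraints named above (simplicity, meaning preservation, grammaticality) into a single weighted reward; the component scorers and weights are invented stand-ins, not the paper's reward.

```python
# Illustrative only: rough proxies for the simplification constraints; a real
# system would use metric- and model-based rewards instead of these heuristics.

def simplicity_score(source, output):
    """Reward shorter outputs relative to the source (very rough proxy)."""
    return max(0.0, 1.0 - len(output.split()) / max(1, len(source.split())))

def meaning_score(source, output):
    """Lexical overlap with the source as a proxy for meaning preservation."""
    a, b = set(source.lower().split()), set(output.lower().split())
    return len(a & b) / max(1, len(a | b))

def fluency_score(output):
    """Very rough grammaticality proxy: penalise immediate word repetitions
    (a real system would use a language model here)."""
    words = output.split()
    repeats = sum(w1 == w2 for w1, w2 in zip(words, words[1:]))
    return 1.0 / (1.0 + repeats)

def reward(source, output, w=(0.4, 0.4, 0.2)):
    return (w[0] * simplicity_score(source, output)
            + w[1] * meaning_score(source, output)
            + w[2] * fluency_score(output))

print(reward("the cat which was very old sat on the mat", "the old cat sat on the mat"))
```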

Practical Kernel-Based Reinforcement Learning

The basic idea of reproducing-kernel methods is to apply the “kernel trick” in the context of reinforcement learning (Schölkopf and Smola, 2002). Roughly speaking, the approximation problem is rewritten in terms of inner products only, which are then replaced by a properly-defined kernel. This modification corresponds to mapping the problem to a high-dimensional feature space, resulting in more expressiveness of the function approximator. Perhaps the most natural way of applying the kernel trick in the context of reinforcement learning is to “kernelize” some formulation of the value-function approximation problem (Xu et al., 2005; Engel et al., 2005; Farahmand, 2011). Another alternative is to approximate the dynamics of an MDP using a kernel-based regression method (Rasmussen and Kuss, 2004; Taylor and Parr, 2009). Following a slightly different line of work, Bhat et al. (2012) propose to kernelize the linear programming formulation of dynamic programming. However, this method is not directly applicable to reinforcement learning, since it is based on the assumption that one has full knowledge of the MDP. A weaker assumption is to suppose that only the reward function is known and to focus on the approximation of the transition function. This is the approach taken by Grunewalder et al. (2012), who propose to embed the conditional distributions defining the transitions of an MDP into a Hilbert space induced by a reproducing kernel.
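
A small sketch of the generic "kernelize the value-function fit" idea: given sampled states and empirical returns, kernel ridge regression yields a value estimate expressed through kernel evaluations (inner products) only. The Gaussian kernel, bandwidth, and toy data are assumptions; no particular cited estimator is reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(X, Y, bandwidth=0.5):
    """Gaussian kernel matrix K[i, j] = exp(-||x_i - y_j||^2 / (2 * bandwidth^2))."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

# Sampled (state, empirical return) pairs from some policy; 1-D states for illustration.
S = rng.uniform(-1, 1, size=(50, 1))
G = np.sin(3 * S[:, 0]) + 0.1 * rng.standard_normal(50)    # noisy returns

# Kernel ridge regression for the value function: V(x) = k(x, S) @ alpha,
# with alpha = (K + lambda * I)^{-1} G.  This is the generic kernel-trick fit
# the passage describes, not any one paper's exact estimator.
lam = 0.1
K = rbf_kernel(S, S)
alpha = np.linalg.solve(K + lam * np.eye(len(S)), G)

def V(x):
    return rbf_kernel(np.atleast_2d(x), S) @ alpha

print(V([[0.5]]))    # estimated value at state 0.5
```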

Deep Reinforcement Learning for Dialogue Generation

To achieve these goals, we draw on the insights of reinforcement learning, which have been widely applied in MDP and POMDP dialogue systems (see the Related Work section for details). We introduce a neural reinforcement learning (RL) generation method, which can optimize long-term rewards designed by system developers. Our model uses the encoder-decoder architecture as its backbone, and simulates conversation between two virtual agents to explore the space of possible actions while learning to maximize expected reward. We define simple heuristic approximations to rewards that characterize good conversations: good conversations are forward-looking (Allwood et al., 1992) or interactive (a turn suggests a following turn), informative, and coherent. The parameters of an encoder-decoder RNN define a policy over an infinite action space consisting of all possible dialogue utterances.
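
As a hedged skeleton of the self-play setup described above, two copies of a policy take turns producing utterances and each turn is scored with crude heuristic proxies (penalising dull or repeated turns); the `Policy` class, reply list, and reward terms are placeholders, not the paper's model or reward definitions.

```python
import random

random.seed(0)
REPLIES = ["i don't know", "tell me more about that", "why do you say that",
           "that is interesting", "i see"]

class Policy:
    def respond(self, history):
        return random.choice(REPLIES)              # stand-in for an encoder-decoder RNN

def turn_reward(utterance, history):
    # Crude proxies for the heuristics above: penalise dull turns and repetition,
    # give a small bonus for longer (more informative) turns.
    dull = -1.0 if utterance == "i don't know" else 0.0
    repetition = -1.0 if utterance in history else 0.0
    return dull + repetition + 0.1 * len(utterance.split())

agent_a, agent_b = Policy(), Policy()
history, total = ["hello , how are you ?"], 0.0
for t in range(6):                                 # simulate a short conversation
    speaker = agent_a if t % 2 == 0 else agent_b
    utt = speaker.respond(history)
    total += turn_reward(utt, history)             # this reward would drive a policy-gradient update
    history.append(utt)
print(history, total)
```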

Learning to Teach in Cooperative Multiagent Reinforcement Learning

Effective diffusion of knowledge has been studied in many fields, including inverse reinforcement learning (Ng and Russell 2000), apprenticeship learning (Abbeel and Ng 2004), and learning from demonstration (Argall et al. 2009), wherein students discern and emulate key demonstrated behaviors. Works on curriculum learning (Bengio et al. 2009) are also related, particularly automated curriculum learning (Graves et al. 2017). Though Graves et al. focus on single-student supervised/unsupervised learning, they highlight interesting measures of learning progress also used here. Several works meta-learn active learning policies for supervised learning (Bachman, Sordoni, and Trischler 2017; Fang, Li, and Cohn 2017; Pang, Dong, and Hospedales 2018; Fan et al. 2018). Our work also uses advising-level meta-learning, but in the regime of MARL, where agents must learn to advise teammates without destabilizing coordination. In action advising, a student executes actions suggested by a teacher, who is typically an expert always advising the optimal action (Torrey and Taylor 2013). These works typically use the state importance value I(s, â) = max_a Q(s, a) − Q(s, â).
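
The importance value above lends itself to a short sketch: the teacher compares the student's intended action against its own greedy action and advises only when the gap is large enough. The threshold, advice budget, and helper names below are illustrative choices, not the exact advising rule from any cited work.

```python
import numpy as np

def state_importance(q_row, intended_action):
    """I(s, a_hat) = max_a Q(s, a) - Q(s, a_hat), as in the advising heuristic above."""
    return np.max(q_row) - q_row[intended_action]

def maybe_advise(teacher_q_row, student_action, threshold=0.5, budget=10):
    """Teacher overrides the student only when the state looks important enough
    and advice budget remains (threshold and budget are illustrative knobs)."""
    if budget > 0 and state_importance(teacher_q_row, student_action) > threshold:
        return int(np.argmax(teacher_q_row)), budget - 1   # advise the greedy action
    return student_action, budget

q_row = np.array([0.2, 1.5, 0.3])              # teacher's Q(s, .) for one state
print(maybe_advise(q_row, student_action=0))   # -> (1, 9): importance 1.3 > 0.5
```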

Reinforcement Learning for Generative Art

To better engage with RL-based generative art, the dissertation creates RL5, a JavaScript library built on top of p5.js to improve the accessibility of reinforcement learning for creatives. RL5 allows developers to define their own RL environments in native p5.js language and train RL policies in web browsers. RL5 provides three RL algorithms to cover the four possible combinations of different types of state and action spaces, and nine RL environments to serve as building blocks for constructing complex systems. With a focus on simplicity and (re)usability, the RL5 APIs enable users to create, train, and evaluate an RL agent in fewer than 20 lines of code. The library is demonstrated in an RL environment called Avoid An Obstacle, in which the goal is to train an agent to move across a 2D rectangular area from left to right without hitting a rectangle in the middle. With the same training settings but different random seeds, the agent develops different strategies to accomplish the task.

Collaborative reinforcement learning of autonomic behaviour

Collaborative Reinforcement Learning (CRL) is a bottom-up approach to tackling the complex, time-varying problems of engineering autonomic behaviour for distributed systems where there is no support for global state. It is an extension to Reinforcement Learning [2] (RL) for solving system-wide optimisation problems in decentralised multi-agent systems. In CRL, individual agents solve discrete optimisation problems using RL and share solution information with their neighbours, contributing towards the solution of the system-wide optimisation problem. Agents are part of a dynamic population, with support for agents joining and leaving the system and establishing connections with neighbours. CRL does not make use of system-wide knowledge, and individual agents only know about and interact with their neighbours.

Neural Logic Reinforcement Learning

Deep reinforcement learning (DRL) has achieved significant breakthroughs in various tasks. However, most DRL algorithms struggle to generalise the learned policy: performance can be strongly affected even by minor modifications of the training environment. In addition, the use of deep neural networks makes the learned policies hard to interpret. To address these two challenges, we propose a novel algorithm named Neural Logic Reinforcement Learning (NLRL) to represent the policies in reinforcement learning by first-order logic. NLRL is based on policy gradient methods and differentiable inductive logic programming, which have demonstrated significant advantages in terms of interpretability and generalisability in supervised tasks. Extensive experiments conducted on cliff-walking and blocks-manipulation tasks demonstrate that NLRL can induce interpretable policies achieving near-optimal performance while showing good generalisability to environments of different initial states and problem sizes.

Evolutionary Function Approximation for Reinforcement Learning

Temporal difference methods are theoretically grounded and empirically effective methods for addressing reinforcement learning problems. In most real-world reinforcement learning tasks, TD methods require a function approximator to represent the value function. However, using function approximators requires manually making crucial representational decisions. This paper investigates evolutionary function approximation, a novel approach to automatically selecting function approximator representations that enable efficient individual learning. This method evolves individuals that are better able to learn. We present a fully implemented instantiation of evolutionary function approximation which combines NEAT, a neuroevolutionary optimization technique, with Q-learning, a popular TD method. The resulting NEAT+Q algorithm automatically discovers effective representations for neural network function approximators. This paper also presents on-line evolutionary computation, which improves the on-line performance of evolutionary computation by borrowing selection mechanisms used in TD methods to choose individual actions and using them in evolutionary computation to select policies for evaluation. We evaluate these contributions with extended empirical studies in two domains: 1) the mountain car task, a standard reinforcement learning benchmark on which neural network function approximators have previously performed poorly, and 2) server job scheduling, a large probabilistic domain drawn from the field of autonomic computing. The results demonstrate that evolutionary function approximation can significantly improve the performance of TD methods and on-line evolutionary computation can significantly improve evolutionary methods. This paper also presents additional tests that offer insight into what factors can make neural network function approximation difficult in practice.
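
A deliberately simplified sketch of the evolutionary-function-approximation loop: each individual is evaluated by letting it learn for a few episodes and selecting on the resulting performance, with learned weights inherited (a Lamarckian flavour). Real NEAT+Q evolves network topologies and uses Q-learning; here an individual is just a weight vector on a toy task, so everything below is an assumption-laden illustration of the outer loop only.

```python
import numpy as np

rng = np.random.default_rng(0)

def evaluate(weights, n_episodes=5, lr=0.05):
    """Fitness = average performance on a toy 1-D task after a short learning phase."""
    w = weights.copy()
    total = 0.0
    for _ in range(n_episodes):
        x = rng.uniform(-1, 1, size=w.shape)        # stand-in "state features"
        target = np.sin(3 * x).sum()                # stand-in "observed return"
        pred = w @ x
        w += lr * (target - pred) * x               # the individual learns (TD-like update)
        total += -abs(target - pred)                # higher is better
    return total / n_episodes, w                    # keep the learned weights (Lamarckian)

pop = [rng.standard_normal(4) for _ in range(20)]
for gen in range(30):
    scored = [evaluate(ind) for ind in pop]
    order = np.argsort([s for s, _ in scored])[::-1]
    parents = [scored[i][1] for i in order[:5]]      # keep the 5 fittest (after learning)
    pop = [p + 0.1 * rng.standard_normal(4) for p in parents for _ in range(4)]
print(max(s for s, _ in scored))
```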

Lyapunov Design for Safe Reinforcement Learning

Lyapunov design methods are used widely in control engineering to design controllers that achieve qualitative objectives, such as stabilizing a system or maintaining a system’s state in a desired operating range. We propose a method for constructing safe, reliable reinforcement learning agents based on Lyapunov design principles. In our approach, an agent learns to control a system by switching among a number of given, base-level controllers. These controllers are designed using Lyapunov domain knowledge so that any switching policy is safe and enjoys basic performance guarantees. Our approach thus ensures qualitatively satisfactory agent behavior for virtually any reinforcement learning algorithm and at all times, including while the agent is learning and taking exploratory actions. We demonstrate the process of designing safe agents for four different control problems. In simulation experiments, we find that our theoretically motivated designs also enjoy a number of practical benefits, including reasonable performance initially and throughout learning, and accelerated learning.
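
As a hedged illustration of the switching idea (not the paper's construction): the agent's actions are indices into a small set of base-level controllers assumed to be individually verified safe, and ordinary Q-learning chooses among them, so exploration never emits raw unsafe control signals. The toy 1-D system, controllers, and reward below are invented for the sketch; the Lyapunov analysis itself is outside the snippet.

```python
import numpy as np

rng = np.random.default_rng(0)

def controller_damp(x):   return -0.8 * x           # illustrative "safe" controllers
def controller_gentle(x): return -0.3 * x
CONTROLLERS = [controller_damp, controller_gentle]

Q = np.zeros((10, len(CONTROLLERS)))                 # Q over (discretised state, controller index)
alpha, gamma, eps = 0.2, 0.9, 0.1

def discretise(x):
    return int(np.clip((x + 1) * 5, 0, 9))

x = 1.0
for t in range(2000):
    s = discretise(x)
    a = rng.integers(len(CONTROLLERS)) if rng.random() < eps else int(Q[s].argmax())
    u = CONTROLLERS[a](x)                            # executing any choice stays safe by assumption
    x = np.clip(x + 0.1 * u + 0.01 * rng.standard_normal(), -1, 1)
    r = -abs(x)                                      # reward: stay near the origin
    s2 = discretise(x)
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
```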

Active Bayesian perception and reinforcement learning

Our proposal for active Bayesian perception and reinforcement learning is tested with a simple but illustrative task of perceiving object curvature using tapping movements of a biomimetic fingertip with unknown contact location (Fig. 1). We demonstrate first that active perception with a fixation-point control strategy can give robust and accurate perception, but the reaction time and acuity depend strongly on the choice of fixation point and belief threshold. Next, we introduce a reward function of the decision outcome, which for illustration is taken as a linear Bayes risk of reaction time and error. Interpreting each active perception strategy (parameterized by the decision threshold and fixation point) as an action then allows use of standard reinforcement learning methods for multi-armed bandits [20]. In consequence, the appropriate decision threshold is learnt to balance the risk of making mistakes versus the risk of reacting too slowly, while the fixation point is tuned to optimize both quantities.
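
A minimal sketch of the bandit formulation described above: each arm is a (belief threshold, fixation point) pair, and its reward is the negative linear Bayes risk of reaction time and error. The trial simulator and epsilon-greedy learner below are toy stand-ins for the tactile experiment, with all dynamics and coefficients invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Each "arm" is an active-perception strategy: a (belief threshold, fixation point) pair.
arms = [(theta, fix) for theta in (0.8, 0.9, 0.99) for fix in (0.2, 0.5, 0.8)]

def run_trial(threshold, fixation):
    """Toy model: higher thresholds take more taps but make fewer errors;
    the fixation point shifts both (purely illustrative dynamics)."""
    taps = rng.poisson(5 + 20 * threshold * fixation) + 1
    error = rng.random() < (1 - threshold) * (1.2 - fixation)
    return -(0.02 * taps + 1.0 * error)              # negative linear Bayes risk

counts, values = np.zeros(len(arms)), np.zeros(len(arms))
for t in range(2000):                                # epsilon-greedy bandit
    a = rng.integers(len(arms)) if rng.random() < 0.1 else int(values.argmax())
    r = run_trial(*arms[a])
    counts[a] += 1
    values[a] += (r - values[a]) / counts[a]         # incremental mean estimate
print(arms[int(values.argmax())])                    # learnt (threshold, fixation) strategy
```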

Reinforcement Learning with Factored States and Actions

One way to interpret the individual experts in the product model is that they are learning “macro” or “basis” actions. As we have seen with the Blockers task, the hidden variables come to represent sets of actions that are spatially and temporally localized. We can think of the hidden variables as representing “basis” actions that can be combined to form a wide array of possible actions. The benefit of having basis actions is that it reduces the number of possible actions, thus making exploration more efficient. The drawback is that if the set of basis actions does not span the space of all possible actions, some actions become impossible to execute. By optimizing the set of basis actions during reinforcement learning, we find a set that can form useful actions, while excluding action combinations that are either not seen or not useful.

Determinantal Reinforcement Learning

A DPP defines a probability distribution over the subsets of a ground set. The probability of a subset is proportional to the determinant of a principal submatrix of a positive semidefinite matrix, where the submatrix is indexed by the items in the subset. A DPP thus assigns high probability to those subsets that contain relevant and diverse items. DPPs have been used in machine learning applications, including recommendation of products (Gillenwater et al. 2014; Gartrell, Paquet, and Koenigstein 2017), summarization of documents or videos (Gong et al. 2014), hyper-parameter optimization (Kathuria, Deshpande, and Kohli 2016), and mini-batch sampling (Zhang, Kjellström, and Mandt 2017). DPPs have also been used for modeling neural spiking to better represent the negative correlation between neurons (Snoek, Zemel, and Adams 2013). DPPs, however, have never been used in reinforcement learning. We will see that a DPP naturally appears with Determinantal SARSA when we choose actions according to the standard approach of Boltzmann exploration.
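
The determinant rule is easy to see in a small worked example. Assuming the usual normalisation det(L + I) over all subsets (a standard DPP identity, not something stated in the excerpt), the snippet below shows that a pair of similar items receives much lower probability than a diverse pair.

```python
import numpy as np

# P(S) is proportional to det(L_S), where L_S is the principal submatrix of a
# positive semidefinite kernel L indexed by the items in S.
L = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.1],
              [0.1, 0.1, 1.0]])                      # items 0 and 1 are very similar

def dpp_prob(subset, L):
    """Normalised DPP probability: det(L_S) / det(L + I)."""
    S = list(subset)
    num = np.linalg.det(L[np.ix_(S, S)]) if S else 1.0
    return num / np.linalg.det(L + np.eye(len(L)))

print(dpp_prob({0, 1}, L))   # similar pair  -> det = 1 - 0.81 = 0.19, small probability
print(dpp_prob({0, 2}, L))   # diverse pair  -> det = 1 - 0.01 = 0.99, larger probability
```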

Hierarchical Average Reward Reinforcement Learning

Reinforcement learning (RL) is a machine learning framework for solving sequential decision-making problems. Despite its successes in a number of different domains, including backgammon (Tesauro, 1994), job-shop scheduling (Zhang and Dietterich, 1995), dynamic channel allocation (Singh and Bertsekas, 1996), elevator scheduling (Crites and Barto, 1998), and helicopter flight control (Ng et al., 2004), current RL methods do not scale well to high-dimensional domains: they can be slow to converge and require many training samples to be practical for many real-world problems. This issue is known as the curse of dimensionality: the exponential growth of the number of parameters to be learned with the size of any compact encoding of system state (Bellman, 1957). Recent attempts to combat the curse of dimensionality have turned to principled ways of exploiting abstraction in RL. This leads naturally to hierarchical control architectures and associated learning algorithms.

Self reflective deep reinforcement learning

On-line reinforcement learning agents are difficult to train. Training takes a long time because the agent has no direct answer to the input at hand; it has to rely on its own assessment of how good or bad the last action was (in the long run) in order to achieve a goal. For a real-world agent the difficulty is escalated by partial observability and variability of experience, as well as the infeasibility of deliberately repeating that experience. Experiences vary from both state and action perspectives. Consequently, even when the agent starts from the same position and takes the same action, the outcome will vary slightly due to the continuum of possible states at any location and the inaccuracy of the actions taken. This is especially true for agents with loose mechanics, which are encountered in games and DIY robots. Moreover, it is physically difficult or undesirable to let the agent run through many episodes. Therefore, the agent needs to maximize the advantage of the available experience with the least amount of time and repetition; hence offline reflection on past experience can play an important role in mitigating these difficulties.

Construction of Approximation Spaces for Reinforcement Learning

Reinforcement learning (RL, Sutton and Barto, 1998; Bertsekas and Tsitsiklis, 1996) provides a framework to autonomously learn control policies in stochastic environments and has become popular in recent years for controlling robots (e.g., Abbeel et al., 2007; Kober and Peters, 2009). The goal of RL is to compute a policy which selects actions that maximize the expected future reward (called value). An agent has to make these decisions based on the state x ∈ X of the system. The state space X may be finite or continuous, but in many practical cases it is too large to be represented directly. Approximated RL addresses this by choosing a function from a function set F that resembles the true value function. Many function sets F have been proposed (see, e.g., Sutton and Barto, 1998; Kaelbling et al., 1996, for an overview). This article will focus on the space of linear functions with p non-linear basis functions {φ_i(·)}_{i=1}^p (Bertsekas, 2007), which we call the approximation space F_φ.
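
A short sketch of fitting a member of such an approximation space: V(x) ≈ Σ_i w_i φ_i(x) with Gaussian radial basis functions, the weights obtained by least squares against sampled returns. The centres, bandwidth, and synthetic targets are illustrative assumptions, not the article's construction.

```python
import numpy as np

# V(x) ~ sum_i w_i * phi_i(x): a member of the approximation space F_phi described
# above, here with Gaussian radial basis functions on a 1-D state space.
centres = np.linspace(0.0, 1.0, 7)                   # p = 7 basis functions (assumed)

def phi(x, bandwidth=0.15):
    return np.exp(-((x - centres) ** 2) / (2 * bandwidth ** 2))

# Fit the weights to sampled (state, return) pairs by least squares.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 200)
G = np.cos(4 * X) + 0.05 * rng.standard_normal(200)  # stand-in Monte Carlo returns
Phi = np.stack([phi(x) for x in X])                  # design matrix, shape (200, 7)
w, *_ = np.linalg.lstsq(Phi, G, rcond=None)

V = lambda x: phi(x) @ w                             # approximate value function
print(V(0.25))
```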
