Average Temporal Difference Rewards for Puddleworld

Temporal uncertainty during overshadowing: A temporal difference account

... absence, then conditioned responding to the CS ... their average duration, and hence the average reinforcement rate during fixed and variable CS ...

13

Temporal Difference Learning in the Tetris Game

... the average number of lines removed per game, and N E is the number of times a sub-optimal move is chosen for exploration purposes divided by the overall number of ...

6

True Online Temporal-Difference Learning

... the average return on the first N episodes of two methods is only fair if they have seen roughly the same amount of samples in those episodes, which is not guaranteed for this ... the average score per ...

40

On quantiles of the temporal aggregation of a stable moving average process and their

... quantile difference of the daily return was always done using an aggregation level of 720, regardless of the number of intraday log returns over 30 second intervals that were reported on that ...

15

On Generalized Bellman Equations and Temporal-Difference Learning

... total rewards received when starting from those states, whereas for the second set of states whose λ = 0, we only use the information about their one-stage rewards and how these states relate to the ...

49

The application of temporal difference learning in optimal diet models

... of Temporal Difference learning motivated by growing evidence for neural correlates in natural reinforcement ... conflicting rewards which is conditionally suboptimal in a fixed environment but ...

18

A Complementary Learning Systems approach to Temporal Difference Learning

... key difference is that a standard DQN only uses the DNN for calculation of Q values whereas CTDL also incorporates the predictions of a ... future rewards was set to ...

14

Temporal difference Learning with Sampling Baseline for Image Captioning

... 2015) used the REINFORCE algorithm (Williams 1992) and proposed a novel training method at sequence level directly optimizing the non-differentiable test metric. (Liu et al. 2016) applied the policy gradient algorithm ...

8

Investigating learning rates for evolution and temporal difference learning

... One of the simple aspects of the Treasure Hunt Game is that when a counter is placed, it remains unchanged until the end of the game. This suggests that a reinforcement learning algorithm that performed only terminal ...

8

Neural mechanisms of individual differences in temporal discounting of monetary and primary rewards in adolescents

... ’ average impatience, and to decompose their TD choices into amount sensitivity and delay ... with average impatience, while reward valuation areas were uniquely implicated in amount sensitivity, and ...

15

Enhanced Neural Responses to Imagined Primary Rewards Predict Reduced Monetary Temporal Discounting

... ture rewards? Although we cannot definitively establish that vmPFC activity during juice imagination in our task is a representation of the imagined pleasantness of the juice, such a role for vmPFC would be ...

7

Enhanced neural responses to imagined primary rewards predict reduced monetary temporal discounting

... ture rewards? Although we cannot definitively establish that vmPFC activity during juice imagination in our task is a representation of the imagined pleasantness of the juice, such a role for vmPFC would be ...

8

Transfer Learning via Inter-Task Mappings for Temporal Difference Learning

... Another related approach (Guestrin et al., 2003) uses linear programming to determine value functions for classes of similar agents. Rather than treating the different agents independently, all agents in the same class ...

43

Self Play and Using an Expert to Learn to Play Backgammon with Temporal Difference Learning

... On the other hand, we argue that looking ahead is not very necessary due to the stochastic element. Since the evaluation function is determined by dice, the evaluation function will become smoother since a position’s ...

12

Reinforcement learning with temporal logic rewards

... for the chosen language. One obvious choice is Signal Temporal Logic (STL), which is defined over infinite real-valued signals with a time bound required for every temporal operator. While this is useful ...

9

Proximal Gradient Temporal Difference Learning Algorithms

... Designing a true stochastic gradient unconditionally stable temporal difference (TD) method with finite-sample convergence analysis has been a longstanding goal of reinforcement learning ...

5

Temporal Difference Learning of Position Evaluation in the Game of Go

... However, there is nothing to keep us from training the network on moves that are not based on its own predictions; for instance, it can learn by playing against a conventional Go program ...

8

Analysis of Temporal Polarization Phase Difference for Major Crops in India

... The distribution of CPD in C-band polarimetric SAR data corresponding to major kharif and rabi crops and other land cover features have been studied over Central State Farm, Hisar, Haryana. The probability density ...

11

γ-Models: Generative Temporal Difference Learning for Infinite-Horizon Prediction

... Broader Impact: γ-models provide a way to study infinite-horizon prediction in the same way that we can study the problem of control in infinite-horizon MDPs, possibly providing a path for more accurate long-term modeling ...

12
