[PDF] Top 20 Policy Gradient Methods: Variance Reduction and Stochastic Convergence

Policy Gradient Methods: Variance Reduction and Stochastic Convergence

... estimation variance of policy gradient algorithms, in particular, when augmenting the estimate with a baseline, a common method for reducing estimation variance, and when using actor-critic ...

224

Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning

... Policy gradient methods for reinforcement learning avoid some of the undesirable properties of the value function approaches, such as policy degradation (Baxter and Bartlett, ...the ...

60

Adaptive Proximal Average Based Variance Reducing Stochastic Methods for Optimization with Composite Regularization

... is stochastic gradient descent (SGD) (Robbins and Monro 1951) which involves lower per iteration cost by utilizing the stochastic gradient instead of the full gradient to update ...

8

Stochastic Variance-Reduced Cubic Regularization Methods

... that gradient methods with additive noise are also able to find approximate local minima faster than the first-order ...the gradient complexity by exploring the third-order smoothness of objective ...

47

Nonasymptotic convergence of stochastic proximal point methods for constrained convex optimization

... the stochastic optimization problem (1) is the discrete stochastic model, where the random variable S is discrete and thus, usually the objective function is given as a finite sum of functional ...Linear ...

42

Utilizing Second Order Information in Minibatch Stochastic Variance Reduced Proximal Iterations

... the convergence of optimization algo- ...Newton methods not well suited in solving ...ear convergence rates for different variants of sub-sampled Newton methods, when the size of the ...

56

Geometric Variance Reduction in Markov Chains: Application to Value Function and Gradient Estimation

... the gradient ∂_α V of V with respect to ...in policy gradient ...multidimensional) policy parameter α. In order to apply gradient methods to search for a local maximum of the ...

15

Katyusha: The First Direct Acceleration of Stochastic Gradient Methods

... The convergence rate of SGD can be further improved with the so-called variance-reduction technique, first proposed by Schmidt et ...the gradient estimator ∇ e k so that its variance ...

51

Efficient Methods For Large-Scale Empirical Risk Minimization

... quasi-Newton methods that lead to the former’s superlinear convergence ...the variance-reduced stochastic quasi-Newton methods in [58, 78] that attempt to reduce only the noise of gra- ...

321

Faster Gradient-Free Proximal Stochastic Methods for Nonconvex Nonsmooth Optimization

... order stochastic algorithm, where T is the iteration ...proximal stochastic methods with the variance reduction techniques of SVRG and SAGA, which are denoted as ZO-ProxSVRG and ...

8

DSA: Decentralized Double Stochastic Averaging Gradient Algorithm

... double stochastic averaging gradient (DSA) algorithm is proposed as a solution alternative that relies on: (i) The use of local stochastic averaging ...consecutive stochastic averaging ...

35

Multi Kernel Learning with Online-Batch Optimization

... Another important speed-up can be obtained considering the nature of the updates of the second stage. If the optimal solution has a loss equal to zero or close to it, when the algorithm is close to convergence ...

27

Gradient and relaxation nonlinear techniques for the analysis of cable supported structures

... Chapter 9 performs a general comparative study of the convergence characteristics of the best methods from each classification {gradient, relaxation and stiffness methods}, and examines [r] ...

358

On the Convergence of Maximum Variance Unfolding

... recover an isometry when the manifold is isometric to a convex domain in some lower-dimensional Euclidean space. To justify HLLE, Donoho and Grimes (2003) show that the null space of the (continuous) Hessian operator ...

24

Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization

... quasi-Newton methods (e.g., Andrew and Gao, 2007), or accelerated first-order methods (Nesterov, 2007; Tseng, 2008; Beck and Teboulle, ...first-order methods, evaluating one single gradient of ...

54

On Perturbed Proximal Gradient Algorithms

... solutions with slightly more stable sparsity structure than Solver 1 (less variance on the red curves). Whether such subtle differences exist between the two algorithms (a diminishing step-size and fixed Monte ...

33

Variance of Time to Recruitment for a Single Grade Manpower System with Different Epochs for Exits and Two Types of Decisions Having Two Thresholds

... The models discussed in this paper are new in the context of considering (i) separate points (exit points) on the time axis for attrition, thereby removing a severe limitation on instantaneous attrition at decision ...

6

Analysis of the Sign Regressor Least Mean Fourth Adaptive Algorithm

... The paper is organized as follows: following the Introduction is Section 2 where the proposed algorithm is developed, while the mean-square analysis of the proposed SRLMF algorithm is presented in Section 3. The ...

12

Certain Systems Arising In Stochastic Gradient Descent

... Stochastic approximations arise naturally in many different contexts. Some early results were published by [Rup88] and [PJ92]. There, they dealt with averaged stochastic gradient descent (ASGD) ...

105

Two spectral gradient projection methods for constrained equations and their linear convergence rate

... projection methods in [, ] and the spectral gradient method in [], we propose two spectral gradient projection methods for solving nonsmooth constrained equations, which can be viewed ... See full document

13