Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation

N. N. Schraudolph, J. Yu, and D. Aberdeen. Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation. In Advances in Neural Information Processing Systems (NIPS) 18, pp. 1185–1192, MIT Press, Cambridge, MA, 2006.

Download

pdf (336.8 kB) · djvu (99.4 kB) · ps.gz (1.3 MB)

Abstract

Reinforcement learning by direct policy gradient estimation is attractive in theory but in practice leads to notoriously ill-behaved optimization problems. We improve its robustness and speed of convergence with stochastic meta-descent, a gain vector adaptation method that employs fast Hessian-vector products. In our experiments the resulting algorithms outperform previously employed online stochastic, offline conjugate, and natural policy gradient methods.
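The sketch below is a minimal illustration, not the paper's implementation, of SMD gain-vector adaptation wrapped around a REINFORCE-style policy gradient on a toy two-armed bandit. The toy problem and all names (policy, grad_logp, gains, etc.) are assumptions made for illustration, and the Hessian-vector product is approximated here by a finite difference of per-sample gradients, whereas the paper relies on exact fast Hessian-vector products.

# Illustrative sketch only: SMD gain-vector adaptation around a
# REINFORCE-style policy gradient on a toy two-armed bandit.
# Not the paper's implementation; the Hv product is a finite-difference
# approximation rather than the exact fast Hv products used in the paper.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.8])        # expected reward of each arm

def policy(theta):
    """Softmax action probabilities for the two-armed bandit."""
    z = theta - theta.max()
    e = np.exp(z)
    return e / e.sum()

def grad_logp(theta, a):
    """Gradient of log pi(a | theta) for a softmax policy."""
    g = -policy(theta)
    g[a] += 1.0
    return g

theta = np.zeros(2)          # policy parameters
gains = np.full(2, 0.1)      # per-parameter step sizes adapted by SMD
v = np.zeros(2)              # SMD auxiliary vector
mu, lam, eps = 0.05, 0.99, 1e-4   # meta-rate, decay, finite-difference step

for t in range(5000):
    a = rng.choice(2, p=policy(theta))
    r = rng.normal(true_means[a], 0.1)
    # gradient of the per-sample loss  -r * log pi(a | theta)
    g = -r * grad_logp(theta, a)
    # multiplicative gain adaptation, clipped to stay positive
    gains *= np.maximum(0.5, 1.0 + mu * g * v)
    # descent step on the per-sample loss with per-parameter gains
    theta -= gains * g
    # Hessian-vector product of the per-sample loss, by finite differences
    # (the paper uses exact fast Hessian-vector products instead)
    gp = -r * grad_logp(theta + eps * v, a)
    gm = -r * grad_logp(theta - eps * v, a)
    Hv = (gp - gm) / (2 * eps)
    # auxiliary vector tracking how gain changes affect the parameters
    v = lam * v - gains * (g + lam * Hv)

print("learned action probabilities:", policy(theta))

Running the sketch should drive the softmax probability of the higher-reward arm toward 1; the point is only to show where the gain adaptation, the descent step, and the Hessian-vector product enter the SMD update, not to reproduce the paper's results.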

BibTeX Entry

@inproceedings{SchYuAbe06,
     author = {Nicol N. Schraudolph and Jin Yu and Douglas Aberdeen},
      title = {\href{http://nic.schraudolph.org/pubs/SchYuAbe06.pdf}{
               Fast Online Policy Gradient Learning
               with {SMD} Gain Vector Adaptation}},
      pages = {1185--1192},
     editor = {Yair Weiss and Bernhard Sch\"olkopf and John C. Platt},
  booktitle =  nips,
  publisher = {MIT Press},
    address = {Cambridge, MA},
     volume =  18,
       year =  2006,
   b2h_type = {Top Conferences},
  b2h_topic = {>Stochastic Meta-Descent, Reinforcement Learning},
   abstract = {
    Reinforcement learning by direct policy gradient estimation is attractive
    in theory but in practice leads to notoriously ill-behaved optimization
    problems. We improve its robustness and speed of convergence with
    stochastic meta-descent, a gain vector adaptation method that employs fast
    Hessian-vector products.  In our experiments the resulting algorithms
    outperform previously employed online stochastic, offline conjugate,
    and natural policy gradient methods.
}}
