Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation
N. N. Schraudolph, J. Yu, and D. Aberdeen. Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation. In Advances in Neural Information Processing Systems (NIPS) 18, pp. 1185–1192, MIT Press, Cambridge, MA, 2006.
Download: http://nic.schraudolph.org/pubs/SchYuAbe06.pdf
Abstract
Reinforcement learning by direct policy gradient estimation is attractive in theory but in practice leads to notoriously ill-behaved optimization problems. We improve its robustness and speed of convergence with stochastic meta-descent, a gain vector adaptation method that employs fast Hessian-vector products. In our experiments the resulting algorithms outperform previously employed online stochastic, offline conjugate, and natural policy gradient methods.
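The core ingredient named in the abstract is stochastic meta-descent (SMD): one gain (step size) per parameter, adapted online with the help of a fast Hessian-vector product. Below is a minimal, generic Python sketch of that gain-adaptation loop, not the paper's policy-gradient instantiation; the function names, the finite-difference Hessian-vector product, and all constants are illustrative assumptions, and the sign conventions follow Schraudolph's earlier SMD formulation as I recall it.

import numpy as np

def hvp(grad_fn, theta, v, eps=1e-6):
    # Illustrative finite-difference Hessian-vector product H*v;
    # the paper relies on exact, automatically computed products instead.
    return (grad_fn(theta + eps * v) - grad_fn(theta - eps * v)) / (2.0 * eps)

def smd_step(theta, eta, v, grad_fn, mu=0.01, lam=0.99):
    # One SMD step with per-parameter gains eta and auxiliary vector v,
    # which tracks the dependence of theta on the log-gains.
    g = grad_fn(theta)
    # Gain adaptation: shrink gains where recent steps overshot
    # (g and v positively correlated), grow them where they undershot.
    eta = eta * np.maximum(0.5, 1.0 - mu * g * v)
    # Propagate v with decay lam using the fast Hessian-vector product.
    v = lam * v - eta * (g + lam * hvp(grad_fn, theta, v))
    # Gained stochastic gradient step on the parameters.
    theta = theta - eta * g
    return theta, eta, v

# Toy usage on a quadratic objective (purely illustrative):
A = np.diag([1.0, 100.0])
grad_fn = lambda th: A @ th
theta, eta, v = np.array([1.0, 1.0]), np.full(2, 0.01), np.zeros(2)
for _ in range(1000):
    theta, eta, v = smd_step(theta, eta, v, grad_fn)

In the paper this recipe is applied to online policy gradient estimates rather than a fixed loss, which is where the robustness and convergence-speed claims of the abstract come in.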
BibTeX Entry
@inproceedings{SchYuAbe06,
  author    = {Nicol N. Schraudolph and Jin Yu and Douglas Aberdeen},
  title     = {\href{http://nic.schraudolph.org/pubs/SchYuAbe06.pdf}{
               Fast Online Policy Gradient Learning with {SMD} Gain Vector Adaptation}},
  pages     = {1185--1192},
  editor    = {Yair Weiss and Bernhard Sch\"olkopf and John C. Platt},
  booktitle = nips,
  publisher = {MIT Press},
  address   = {Cambridge, MA},
  volume    = 18,
  year      = 2006,
  b2h_type  = {Top Conferences},
  b2h_topic = {>Stochastic Meta-Descent, Reinforcement Learning},
  abstract  = {Reinforcement learning by direct policy gradient estimation
    is attractive in theory but in practice leads to notoriously
    ill-behaved optimization problems. We improve its robustness and
    speed of convergence with stochastic meta-descent, a gain vector
    adaptation method that employs fast Hessian-vector products. In our
    experiments the resulting algorithms outperform previously employed
    online stochastic, offline conjugate, and natural policy gradient
    methods.}
}