Fast Curvature Matrix-Vector Products

N. N. Schraudolph. Fast Curvature Matrix-Vector Products. In Proc. Intl. Conf. Artificial Neural Networks (ICANN), Vienna, Austria, pp. 19–26. Springer Verlag, Berlin, 2001.


Abstract

The Gauss-Newton approximation of the Hessian guarantees positive semi-definiteness while retaining more second-order information than the Fisher information. We extend it from nonlinear least squares to all differentiable objectives such that positive semi-definiteness is maintained for the standard loss functions in neural network regression and classification. We give efficient algorithms for computing the product of extended Gauss-Newton and Fisher information matrices with arbitrary vectors, using techniques similar to but even cheaper than the fast Hessian-vector product (Pearlmutter, 1994). The stability of SMD, a learning rate adaptation method that uses curvature matrix-vector products, improves when the extended Gauss-Newton matrix is substituted for the Hessian.
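The two matrix-vector products the paper discusses can be illustrated with automatic differentiation. The following is a minimal sketch in JAX on a toy linear least-squares model (the model, names, and data are illustrative, not from the paper): the Hessian-vector product is computed Pearlmutter-style by forward-mode differentiation of the gradient, and the Gauss-Newton product Gv = Jᵀ H_L J v is assembled from one Jacobian-vector product and one vector-Jacobian product through the network outputs.

```python
import jax
import jax.numpy as jnp

def net(w, x):
    # Toy linear "network": outputs = x @ w.
    return x @ w

def loss(w, x, y):
    # Standard squared-error loss.
    return 0.5 * jnp.sum((net(w, x) - y) ** 2)

x = jax.random.normal(jax.random.PRNGKey(0), (5, 3))
y = jax.random.normal(jax.random.PRNGKey(1), (5,))
w = jnp.zeros(3)   # parameter vector
v = jnp.ones(3)    # arbitrary direction

# Hessian-vector product (Pearlmutter 1994): differentiate the
# gradient along direction v, forward-mode over reverse-mode.
hv = jax.jvp(jax.grad(lambda w: loss(w, x, y)), (w,), (v,))[1]

# Gauss-Newton product G v = J^T H_L J v, with J the Jacobian of the
# network outputs and H_L the Hessian of the loss w.r.t. the outputs
# (identity for squared error).
_, jv = jax.jvp(lambda w: net(w, x), (w,), (v,))  # J v
hl_jv = jv                                        # H_L = I here
_, vjp_fn = jax.vjp(lambda w: net(w, x), w)
gv = vjp_fn(hl_jv)[0]                             # J^T (H_L J v)

# For a linear model with squared error the Hessian and the
# Gauss-Newton matrix coincide, so the two products agree.
print(jnp.allclose(hv, gv))  # → True
```

For a nonlinear network the two products differ: the Gauss-Newton version drops the term involving second derivatives of the network outputs, which is what guarantees positive semi-definiteness.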

BibTeX Entry

```
@inproceedings{Schraudolph01,
author = {Nicol N. Schraudolph},
title = {\href{http://nic.schraudolph.org/pubs/Schraudolph01.pdf}{
Fast Curvature Matrix-Vector Products}},
pages = {19--26},
editor = {Georg Dorffner and Horst Bischof and Kurt Hornik},
booktitle = {Proc.\ Intl.\ Conf.\ Artificial Neural Networks ({ICANN})},
volume = 2130,
series = {\href{http://www.springer.de/comp/lncs/}{
Lecture Notes in Computer Science}},
publisher = {\href{http://www.springer.de/}{Springer Verlag}, Berlin},
year = 2001,
b2h_type = {Top Conferences},
b2h_topic = {>Stochastic Meta-Descent},
abstract = {
The Gauss-Newton approximation of the Hessian guarantees positive
semi-definiteness while retaining more second-order information than
the Fisher information.  We extend it from nonlinear least squares to
all differentiable objectives such that positive semi-definiteness
is maintained for the standard loss functions in neural network
regression and classification.  We give efficient algorithms for
computing the product of extended Gauss-Newton and Fisher information
matrices with arbitrary vectors, using techniques similar to but even
cheaper than the fast Hessian-vector product (Pearlmutter, 1994).
The stability of SMD, a learning rate adaptation method that uses
curvature matrix-vector products, improves when the extended
Gauss-Newton matrix is substituted for the Hessian.
}}
```
