Fast Curvature Matrix-Vector Products

N. N. Schraudolph. Fast Curvature Matrix-Vector Products. In Proc. Intl. Conf. Artificial Neural Networks (ICANN), Vienna, Austria, pp. 19–26. Springer Verlag, Berlin, 2001.


Abstract

The Gauss-Newton approximation of the Hessian guarantees positive semi-definiteness while retaining more second-order information than the Fisher information. We extend it from nonlinear least squares to all differentiable objectives such that positive semi-definiteness is maintained for the standard loss functions in neural network regression and classification. We give efficient algorithms for computing the product of extended Gauss-Newton and Fisher information matrices with arbitrary vectors, using techniques similar to but even cheaper than the fast Hessian-vector product (Pearlmutter, 1994). The stability of SMD, a learning rate adaptation method that uses curvature matrix-vector products, improves when the extended Gauss-Newton matrix is substituted for the Hessian.
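The two matrix-vector products the paper discusses can be illustrated with automatic differentiation. The following is a minimal sketch in JAX on a toy linear least-squares model (the model, names, and data are illustrative, not from the paper): the Hessian-vector product is computed Pearlmutter-style by forward-mode differentiation of the gradient, and the Gauss-Newton product Gv = Jᵀ H_L J v is assembled from one Jacobian-vector product and one vector-Jacobian product through the network outputs.

```python
import jax
import jax.numpy as jnp

def net(w, x):
    # Toy linear "network": outputs = x @ w.
    return x @ w

def loss(w, x, y):
    # Standard squared-error loss.
    return 0.5 * jnp.sum((net(w, x) - y) ** 2)

x = jax.random.normal(jax.random.PRNGKey(0), (5, 3))
y = jax.random.normal(jax.random.PRNGKey(1), (5,))
w = jnp.zeros(3)   # parameter vector
v = jnp.ones(3)    # arbitrary direction

# Hessian-vector product (Pearlmutter 1994): differentiate the
# gradient along direction v, forward-mode over reverse-mode.
hv = jax.jvp(jax.grad(lambda w: loss(w, x, y)), (w,), (v,))[1]

# Gauss-Newton product G v = J^T H_L J v, with J the Jacobian of the
# network outputs and H_L the Hessian of the loss w.r.t. the outputs
# (identity for squared error).
_, jv = jax.jvp(lambda w: net(w, x), (w,), (v,))  # J v
hl_jv = jv                                        # H_L = I here
_, vjp_fn = jax.vjp(lambda w: net(w, x), w)
gv = vjp_fn(hl_jv)[0]                             # J^T (H_L J v)

# For a linear model with squared error the Hessian and the
# Gauss-Newton matrix coincide, so the two products agree.
print(jnp.allclose(hv, gv))  # → True
```

For a nonlinear network the two products differ: the Gauss-Newton version drops the term involving second derivatives of the network outputs, which is what guarantees positive semi-definiteness.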

BibTeX Entry

```
@inproceedings{Schraudolph01,
author = {Nicol N. Schraudolph},
title = {\href{http://nic.schraudolph.org/pubs/Schraudolph01.pdf}{
Fast Curvature Matrix-Vector Products}},
pages = {19--26},
editor = {Georg Dorffner and Horst Bischof and Kurt Hornik},
booktitle = {Proc.\ Intl.\ Conf.\ Artificial Neural Networks ({ICANN})},
volume = 2130,
series = {\href{http://www.springer.de/comp/lncs/}{
Lecture Notes in Computer Science}},
publisher = {\href{http://www.springer.de/}{Springer Verlag}, Berlin},
year = 2001,
b2h_type = {Top Conferences},
b2h_topic = {>Stochastic Meta-Descent},
abstract = {
The Gauss-Newton approximation of the Hessian guarantees positive
semi-definiteness while retaining more second-order information than
the Fisher information.  We extend it from nonlinear least squares to
all differentiable objectives such that positive semi-definiteness
is maintained for the standard loss functions in neural network
regression and classification.  We give efficient algorithms for
computing the product of extended Gauss-Newton and Fisher information
matrices with arbitrary vectors, using techniques similar to but even
cheaper than the fast Hessian-vector product (Pearlmutter, 1994).
The stability of SMD, a learning rate adaptation method that uses
curvature matrix-vector products, improves when the extended
Gauss-Newton matrix is substituted for the Hessian.
}}
```
