Fast Curvature Matrix-Vector Products

N. N. Schraudolph. Fast Curvature Matrix-Vector Products. In Proc. Intl. Conf. Artificial Neural Networks (ICANN), Vienna, Austria, pp. 19–26. Springer Verlag, Berlin, 2001.


Abstract

The Gauss-Newton approximation of the Hessian guarantees positive semi-definiteness while retaining more second-order information than the Fisher information. We extend it from nonlinear least squares to all differentiable objectives such that positive semi-definiteness is maintained for the standard loss functions in neural network regression and classification. We give efficient algorithms for computing the product of extended Gauss-Newton and Fisher information matrices with arbitrary vectors, using techniques similar to but even cheaper than the fast Hessian-vector product (Pearlmutter, 1994). The stability of SMD, a learning rate adaptation method that uses curvature matrix-vector products, improves when the extended Gauss-Newton matrix is substituted for the Hessian.
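The two matrix-vector products the paper discusses can be illustrated with automatic differentiation. The following is a minimal sketch in JAX on a toy linear least-squares model (the model, names, and data are illustrative, not from the paper): the Hessian-vector product is computed Pearlmutter-style by forward-mode differentiation of the gradient, and the Gauss-Newton product Gv = Jᵀ H_L J v is assembled from one Jacobian-vector product and one vector-Jacobian product through the network outputs.

```python
import jax
import jax.numpy as jnp

def net(w, x):
    # Toy linear "network": outputs = x @ w.
    return x @ w

def loss(w, x, y):
    # Standard squared-error loss.
    return 0.5 * jnp.sum((net(w, x) - y) ** 2)

x = jax.random.normal(jax.random.PRNGKey(0), (5, 3))
y = jax.random.normal(jax.random.PRNGKey(1), (5,))
w = jnp.zeros(3)   # parameter vector
v = jnp.ones(3)    # arbitrary direction

# Hessian-vector product (Pearlmutter 1994): differentiate the
# gradient along direction v, forward-mode over reverse-mode.
hv = jax.jvp(jax.grad(lambda w: loss(w, x, y)), (w,), (v,))[1]

# Gauss-Newton product G v = J^T H_L J v, with J the Jacobian of the
# network outputs and H_L the Hessian of the loss w.r.t. the outputs
# (identity for squared error).
_, jv = jax.jvp(lambda w: net(w, x), (w,), (v,))  # J v
hl_jv = jv                                        # H_L = I here
_, vjp_fn = jax.vjp(lambda w: net(w, x), w)
gv = vjp_fn(hl_jv)[0]                             # J^T (H_L J v)

# For a linear model with squared error the Hessian and the
# Gauss-Newton matrix coincide, so the two products agree.
print(jnp.allclose(hv, gv))  # → True
```

For a nonlinear network the two products differ: the Gauss-Newton version drops the term involving second derivatives of the network outputs, which is what guarantees positive semi-definiteness.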

BibTeX Entry

```
@inproceedings{Schraudolph01,
author = {Nicol N. Schraudolph},
title = {\href{http://nic.schraudolph.org/pubs/Schraudolph01.pdf}{
Fast Curvature Matrix-Vector Products}},
pages = {19--26},
editor = {Georg Dorffner and Horst Bischof and Kurt Hornik},
booktitle = {Proc.\ Intl.\ Conf.\ Artificial Neural Networks ({ICANN})},
volume = 2130,
series = {\href{http://www.springer.de/comp/lncs/}{
Lecture Notes in Computer Science}},
publisher = {\href{http://www.springer.de/}{Springer Verlag}, Berlin},
year = 2001,
b2h_type = {Top Conferences},
b2h_topic = {>Stochastic Meta-Descent},
abstract = {
The Gauss-Newton approximation of the Hessian guarantees positive
semi-definiteness while retaining more second-order information than
the Fisher information.  We extend it from nonlinear least squares to
all differentiable objectives such that positive semi-definiteness
is maintained for the standard loss functions in neural network
regression and classification.  We give efficient algorithms for
computing the product of extended Gauss-Newton and Fisher information
matrices with arbitrary vectors, using techniques similar to but even
cheaper than the fast Hessian-vector product (Pearlmutter, 1994).
The stability of SMD, a learning rate adaptation method that uses
curvature matrix-vector products, improves when the extended
Gauss-Newton matrix is substituted for the Hessian.
}}
```
