News – Rémi Gribonval

Conservation laws for neural network training: two papers at @NeurIPS23 & @ICML24

Filed under Preprint
May 24, 2024

We study conservation laws during the (Euclidean or not) gradient or momentum flow of neural networks.

Keep the Momentum: Conservation Laws beyond Euclidean Gradient Flows, accepted at ICML24

1/ We define the concept of conservation laws for momentum flows and show how to extend the framework from our previous paper (Abide by the Law and Follow the Flow: Conservation Laws for Gradient Flows, oral @NeurIPS23) to non-Euclidean gradient flow (GF) and momentum flow (MF) settings. In stark contrast to the case of GF, conservation laws for MF exhibit temporal dependence.

2/ We discover new conservation laws for linear networks in the Euclidean momentum case, and these new laws are complete. In contrast, there is no conservation law for ReLU networks in the Euclidean momentum case.

3/ In a non-Euclidean context, such as non-negative matrix factorization (NMF) or input-convex neural networks (ICNNs) implemented with two-layer ReLU networks, we discover new conservation laws for gradient flows and find none in the momentum case. We obtain new conservation laws in the Natural Gradient Flow case.

4/ We shed light on a quasi-systematic loss of conservation when transitioning from the GF to the MF setting.

Invited talk @IEM @EPFL, May 24th 2024

Filed under Talk
May 24, 2024

Frugality in machine learning: Sparsity, a value for the future?

Sparse vectors and sparse matrices play a transverse role in signal and image processing: they have led to successful approaches efficiently addressing tasks as diverse as data compression, fast transforms, signal denoising and source separation, or more generally inverse problems. To what extent can the potential of sparsity also be leveraged to achieve more frugal (deep) learning techniques? Through an overview of recent explorations around this theme, I will compare and contrast classical sparse regularization for inverse problems with its natural extensions that aim at learning neural networks with sparse connections. During our journey, I will notably highlight the role of rescaling-invariances of modern deep parameterizations, which come with their curses and blessings.
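To make the rescaling-invariance mentioned above concrete, here is a minimal sketch (not taken from the talk) for a two-layer ReLU network without biases. The function `relu_net`, the layer sizes, and the scale vector `lam` are all hypothetical choices for illustration. Multiplying the incoming weights of each hidden neuron by a positive factor and dividing its outgoing weights by the same factor should leave the network function unchanged, because ReLU is positively homogeneous.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu_net(W1, W2, x):
    """Two-layer ReLU network without biases: x -> W2 @ relu(W1 @ x)."""
    return W2 @ np.maximum(W1 @ x, 0.0)

# Hypothetical sizes: 3 inputs, 5 hidden neurons, 2 outputs, 7 sample points.
W1 = rng.standard_normal((5, 3))
W2 = rng.standard_normal((2, 5))
x = rng.standard_normal((3, 7))

# One positive scale per hidden neuron.
lam = rng.uniform(0.5, 2.0, size=5)
W1_s = lam[:, None] * W1   # scale the incoming weights of neuron j by lam[j]
W2_s = W2 / lam[None, :]   # scale the outgoing weights of neuron j by 1 / lam[j]

# Largest difference between the outputs of the original and rescaled networks.
gap = np.abs(relu_net(W1, W2, x) - relu_net(W1_s, W2_s, x)).max()
print(gap)
```

The two parameter sets differ, yet they realize the same function up to floating-point error, so the parameters of such a network are only identifiable up to these rescalings.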

Invited talk, MAP5, Paris, May 17th 2024

Filed under Talk
May 24, 2024

Frugality in machine learning: Sparsity, a value for the future?

Invited talk, Math Machine Learning seminar MPI MIS + UCLA, online, May 16, 2024

Filed under Talk
May 24, 2024

‘Conservation Laws for Gradient Flows’

Understanding the geometric properties of gradient descent dynamics is
a key ingredient in deciphering the recent success of very large
machine learning models. A striking observation is that trained
over-parameterized models retain some properties of the optimization
initialization. This “implicit bias” is believed to be responsible for
some favorable properties of the trained models and could explain
their good generalization properties. In this work, we expose the
definitions and properties of “conservation laws”, which define
quantities conserved during gradient flows of a given machine learning
model, such as a ReLU network, with any training data and any loss.
After explaining how to find the maximal number of independent
conservation laws via Lie algebra computations, we provide algorithms
to compute a family of polynomial laws, as well as to compute the
number of (not necessarily polynomial) conservation laws. We find
that for a number of architectures there are no more laws than the known
ones, and we identify new laws for certain flows with momentum and/or
non-Euclidean geometries.
Joint work with Sibylle Marcotte and Gabriel Peyré.
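
As an illustration of what a conservation law looks like in practice, the sketch below checks numerically a classical example for a two-layer linear network x -> U V x: the matrix U^T U - V V^T has zero time derivative along the Euclidean gradient flow. This is not the authors' code. The squared loss, the sizes, and the names `gradients` and `dh_dt` are assumptions made for the example, although the law itself does not depend on the loss or the data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: n inputs, r hidden units, m outputs, N training samples.
n, r, m, N = 4, 3, 2, 10
U = rng.standard_normal((m, r))   # output-layer weights
V = rng.standard_normal((r, n))   # input-layer weights
X = rng.standard_normal((n, N))   # training inputs
Y = rng.standard_normal((m, N))   # training targets

def gradients(U, V, X, Y):
    """Gradients of the (assumed) loss L(U, V) = 0.5 * ||U V X - Y||_F^2."""
    R = U @ V @ X - Y             # residual, shape (m, N)
    G_U = R @ (V @ X).T           # dL/dU, shape (m, r)
    G_V = U.T @ R @ X.T           # dL/dV, shape (r, n)
    return G_U, G_V

G_U, G_V = gradients(U, V, X, Y)

# Gradient flow: dU/dt = -G_U and dV/dt = -G_V.
# Chain rule applied to h(U, V) = U^T U - V V^T:
dh_dt = -(G_U.T @ U + U.T @ G_U) + (G_V @ V.T + V @ G_V.T)

# Largest entry of dh/dt in absolute value; zero up to rounding error.
drift = np.abs(dh_dt).max()
print(drift)
```

The cancellation comes from the identity U^T G_U = G_V V^T, which holds for any residual R. A discrete gradient descent with a finite step size would only preserve the quantity approximately.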