‘Conservation Laws for Gradient Flows’
Understanding the geometric properties of gradient descent dynamics is
a key ingredient in deciphering the recent success of very large
machine learning models. A striking observation is that trained
over-parameterized models retain some properties of the optimization
initialization. This “implicit bias” is believed to be responsible for
some favorable properties of the trained models and could explain
why they generalize well. In this work, we present the definition and
basic properties of “conservation laws”: quantities that are
conserved during gradient flows of a given machine learning model,
such as a ReLU network, for any training data and any loss.
After explaining how to find the maximal number of independent
conservation laws via Lie algebra computations, we provide algorithms
to compute a family of polynomial laws, as well as to compute the
number of (not necessarily polynomial) conservation laws. We find
that, for a number of architectures, there are no conservation laws
beyond the known ones, and we identify new laws for certain flows with
momentum and/or non-Euclidean geometries.
Joint work with Sibylle Marcotte and Gabriel Peyré.
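
To make the notion of a conservation law concrete, here is a minimal numerical sketch. It is not taken from the work itself. It assumes a two-layer ReLU network f(x) = sum_j v_j relu(u_j . x) trained with a squared loss, and checks a classical law for this architecture: for each hidden neuron j, the quantity ||u_j||^2 - v_j^2 stays constant along the gradient flow. The network sizes, random data, step size and the function name relu_net_conservation_demo are all illustrative assumptions. Gradient descent with a small step only approximates the flow, so the quantity is preserved up to a small discretization drift rather than exactly.

```python
import numpy as np

def relu_net_conservation_demo(steps=1000, lr=1e-3, seed=0):
    """Run small-step gradient descent on a toy two-layer ReLU network and
    return the per-neuron quantity ||u_j||^2 - v_j^2 before and after."""
    rng = np.random.default_rng(seed)
    n, d, h = 20, 3, 4                      # samples, input dim, hidden width (assumed)
    X = rng.normal(size=(n, d))             # arbitrary training inputs
    y = rng.normal(size=n)                  # arbitrary training targets
    U = 0.5 * rng.normal(size=(h, d))       # input weights, one row per neuron
    v = 0.5 * rng.normal(size=h)            # output weights, one entry per neuron

    def conserved(U, v):
        # Candidate conserved quantity, one value per hidden neuron.
        return np.sum(U**2, axis=1) - v**2

    q_start = conserved(U, v)
    for _ in range(steps):
        pre = X @ U.T                       # pre-activations, shape (n, h)
        act = np.maximum(pre, 0.0)          # ReLU activations
        r = act @ v - y                     # residuals of the squared loss
        grad_v = act.T @ r / n
        grad_U = ((r[:, None] * v[None, :]) * (pre > 0)).T @ X / n
        # Explicit Euler step: a discretization of the gradient flow.
        U = U - lr * grad_U
        v = v - lr * grad_v
    return q_start, conserved(U, v)

q_start, q_end = relu_net_conservation_demo()
print("max drift of ||u_j||^2 - v_j^2:", np.max(np.abs(q_end - q_start)))
```

The weights themselves move during training, while the printed drift should remain small and shrink further as lr decreases. This is consistent with the quantity being conserved by the continuous-time flow, independently of the data and targets used.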