Zoom ID: 610-151-76487
Recent theoretical works have characterized the dynamics of wide shallow neural networks trained via gradient descent in an asymptotic mean-field limit when the number of parameters tends towards infinity. At initialization, the random sampling of the parameters leads to a fluctuation from the mean-field limit dictated by the classical Central Limit Theorem (CLT). However, as gradient descent induces correlations among the parameters, the question of how the deviation evolves during training remains unclear. Here, we prove that the deviation from the mean-field limit scaled by the width, in the width-asymptotic limit, remains bounded throughout training. The upper bound is given by a Monte-Carlo type resampling error, which does not depend explicitly on the dimension. It then motivates the use of the 2-norm of the underlying measure as a regularization term, which controls the generalization as well via the variation-norm function spaces. Moreover, if the mean-field dynamics converges to a measure that interpolates the training data, we prove that the asymptotic deviation eventually vanishes in the CLT scaling. We also complement these results with numerical experiments.