Generalization Error of Linearized Neural Networks: Staircase and Double-Descent


Song Mei, Stanford University


2020.06.11 10:30-11:30




Zoom ID: 964-201-02658
Password: 738669

If you cannot log in with the Zoom ID above, please use the following one instead:
Zoom ID: 266-664-3379
Password: 738669


Deep learning methods operate in regimes that defy the traditional statistical mindset. Neural network architectures often contain more parameters than training samples, and are so rich that they can interpolate the observed labels. Despite their huge complexity, the same architectures achieve small generalization error on test data.

As one possible explanation for the training efficiency of neural networks, tangent kernel theory argues that multi-layer neural networks, in a proper large-width limit, can be well approximated by their linearization. As a consequence, the gradient flow on the empirical risk turns into linear dynamics and converges to a global minimizer of the training loss. Since last year, such linearization has become a popular approach to analyzing the training dynamics of neural networks. This naturally raises the question of whether the linearization perspective can also explain the generalization efficacy of neural networks.

In this talk, I will discuss the generalization error of linearized neural networks, which reveals two interesting phenomena: the staircase phenomenon and the double-descent phenomenon. Through the lens of these phenomena, I will address the benefits and limitations of the linearization approach for neural networks.


Song Mei is a Ph.D. student majoring in Applied Mathematics and Statistics at Stanford University. His research is motivated by data science, and lies at the intersection of statistics, machine learning, information theory, and computer science. His work often builds on insights that originated within the statistical physics literature. His recent research interests include the theory of deep learning, high-dimensional geometry, approximate Bayesian inference, and applied random matrix theory.