A Statistical Mechanics Theory of Generalization in Kernel Regression and Wide Neural Networks

Speaker

Cengiz Pehlevan, Harvard University

Time

2020.06.25 10:00-11:00

Venue

Online (Zoom)

Zoom Info

Zoom ID: 985-347-52778
Password: 738669

If you cannot log in with the Zoom ID above, please use the following one instead:
Zoom ID: 266-664-3379
Password: 738669

Abstract

We derive analytical expressions for the generalization performance of kernel regression as a function of the number of training samples, using theoretical methods from Gaussian processes and statistical physics. Our expressions apply to wide neural networks due to an equivalence between training them and kernel regression with the Neural Tangent Kernel (NTK). By decomposing the total generalization error into contributions from the kernel's spectral components, we identify a new spectral principle: as the size of the training set grows, kernel machines and neural networks fit successively higher spectral modes of the target function. When data are sampled from a uniform distribution on a high-dimensional hypersphere, dot-product kernels, including the NTK, exhibit learning stages in which successive frequency modes of the target function are learned. We verify our theory with simulations on synthetic data and the MNIST dataset.

Joint work with Blake Bordelon and Abdulkadir Canatar.
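
A minimal numerical sketch of the spectral principle described in the abstract, not taken from the talk: it assumes a von Mises dot-product kernel on the unit circle (whose eigenfunctions are Fourier modes with eigenvalues decaying in frequency) and an illustrative two-mode target; all parameters are hypothetical.

# Sketch (hypothetical parameters): kernel regression on the unit circle
# with a dot-product kernel, measuring how the residual in a low- and a
# high-frequency Fourier mode shrinks as the training-set size P grows.
import numpy as np

rng = np.random.default_rng(0)

def kernel(x, y, ell=0.5):
    # k(x, y) = exp(cos(x - y) / ell^2): a dot-product kernel on the circle,
    # since cos(x - y) is the inner product of the corresponding unit vectors.
    return np.exp(np.cos(x[:, None] - y[None, :]) / ell**2)

def target(x):
    # Target with a low-frequency (k=1) and a high-frequency (k=4) mode.
    return np.sin(x) + 0.5 * np.sin(4 * x)

x_test = np.linspace(0.0, 2 * np.pi, 512, endpoint=False)

for P in (4, 16, 64):  # growing training-set size
    x_tr = rng.uniform(0.0, 2 * np.pi, P)
    K = kernel(x_tr, x_tr) + 1e-6 * np.eye(P)  # small jitter for stability
    alpha = np.linalg.solve(K, target(x_tr))   # (near-ridgeless) kernel regression
    resid = kernel(x_test, x_tr) @ alpha - target(x_test)
    for k in (1, 4):
        # Energy of the residual in Fourier mode k (sine and cosine parts).
        a = 2 * np.mean(resid * np.cos(k * x_test))
        b = 2 * np.mean(resid * np.sin(k * x_test))
        print(f"P={P:3d}  mode k={k}: residual amplitude {np.hypot(a, b):.3f}")

Under these assumptions, the k=1 residual should decay at smaller P than the k=4 residual, consistent with the spectral principle stated in the abstract.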

Bio

Cengiz (pronounced “Jen·ghiz”) Pehlevan is an assistant professor of applied mathematics at the Harvard John A. Paulson School of Engineering and Applied Sciences. He received his undergraduate degrees in physics and electrical engineering from Bogazici University in Istanbul in 2004, and his doctorate in theoretical physics from Brown University in 2011. He was a Swartz Fellow at Harvard University, a postdoctoral associate at the Janelia Research Campus, and a research scientist in the neuroscience group at the Flatiron Institute. His research interests are in theoretical neuroscience, the theory of deep learning, biologically inspired machine learning, and neuromorphic computing.