Zoom Link: 978-520-64342
In this talk, I discuss how deep learning can statistically outperform shallow methods such as kernel methods by utilizing the notion of sparsity of a target function space, and present a non-convex optimization framework with generalization and excess risk bounds. In the first half, I will summarize our recent work on the excess risk bounds of deep learning in the Besov space and its variants. It will be shown that the superiority of deep learning stems from the sparsity of the target function space, and, more essentially, that the non-convex geometry of the space characterizes this property. In such a situation, deep learning can achieve so-called adaptive estimation, which gives a better excess risk than shallow methods. In the latter half, I present a deep learning optimization framework based on noisy gradient descent in an infinite-dimensional Hilbert space (gradient Langevin dynamics), and show generalization error and excess risk bounds for the solution obtained by the optimization procedure. The proposed framework can deal with finite- and infinite-width networks simultaneously, unlike existing frameworks such as the neural tangent kernel and mean field analysis.
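As a rough illustration of the optimization procedure named in the abstract, the following is a minimal finite-dimensional sketch of gradient Langevin dynamics (noisy gradient descent). The talk's setting is an infinite-dimensional Hilbert space; the function names, the quadratic toy objective, and all parameter values below are hypothetical choices for illustration only, not the speaker's construction.

```python
import numpy as np

def gradient_langevin_dynamics(grad, theta0, step=0.01, inv_temp=1e4,
                               n_steps=2000, seed=0):
    """Gradient Langevin dynamics (illustrative sketch):
    theta <- theta - step * grad(theta) + sqrt(2 * step / inv_temp) * xi,
    where xi is standard Gaussian noise. As inv_temp -> infinity this
    reduces to plain gradient descent; the noise lets the iterates
    explore a non-convex landscape.
    """
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(theta.shape)
        theta = theta - step * grad(theta) + np.sqrt(2.0 * step / inv_temp) * noise
    return theta

# Toy objective L(theta) = 0.5 * ||theta - 1||^2, with gradient theta - 1;
# the iterates concentrate near the minimizer theta = (1, 1, 1).
theta_hat = gradient_langevin_dynamics(lambda t: t - 1.0, np.zeros(3))
```

With a large inverse temperature the stationary distribution concentrates around the minimizer, which is why the iterates end up close to it in this toy example.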
Taiji Suzuki is an Associate Professor in the Department of Mathematical Informatics at the University of Tokyo and the Center for Advanced Intelligence Project at RIKEN, Tokyo. Professor Suzuki obtained his bachelor's, master's, and PhD degrees, all from the University of Tokyo. His research is focused on the theoretical understanding of deep learning, kernel methods and nonparametric statistical methods, optimization of deep learning, and information geometry.