Statistical Efficiency and Optimization of Deep Learning from the Viewpoint of Non-Convexity

Speaker

Taiji Suzuki, The University of Tokyo, Japan

Time

2020.11.12 10:00-11:00

Venue

Online (Zoom)

Zoom Info

Zoom Meeting ID: 978-520-64342

Password: 712108

Abstract

In this talk, I discuss how deep learning can statistically outperform shallow methods such as kernel methods by exploiting the sparsity of the target function space, and present a non-convex optimization framework with generalization and excess risk bounds. In the first half, I summarize our recent work on excess risk bounds for deep learning in the Besov space and its variants. It will be shown that the superiority of deep learning stems from the sparsity of the target function space and, more essentially, from the non-convex geometry of that space. In such a situation, deep learning achieves so-called adaptive estimation, which yields a better excess risk than shallow methods. In the second half, I present a deep learning optimization framework based on noisy gradient descent in an infinite-dimensional Hilbert space (gradient Langevin dynamics), and show generalization error and excess risk bounds for the solution obtained by this optimization procedure. Unlike existing frameworks such as the neural tangent kernel and mean field analysis, the proposed framework can handle finite- and infinite-width networks simultaneously.
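For readers unfamiliar with gradient Langevin dynamics, the "noisy gradient descent" mentioned above replaces the plain gradient step with one that adds Gaussian noise at every iteration. The following is a minimal finite-dimensional sketch in Python; the function names and the toy regularized least-squares objective are illustrative assumptions, not material from the talk.

import numpy as np

def gradient_langevin_dynamics(grad_loss, theta0, step_size=1e-3,
                               inv_temperature=1e4, n_steps=2000, seed=0):
    # Noisy gradient descent (unadjusted Langevin algorithm):
    #   theta <- theta - step_size * grad_loss(theta)
    #                  + sqrt(2 * step_size / inv_temperature) * N(0, I)
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(n_steps):
        noise = rng.standard_normal(theta.shape)
        theta = (theta
                 - step_size * grad_loss(theta)
                 + np.sqrt(2.0 * step_size / inv_temperature) * noise)
    return theta

# Toy usage (hypothetical): a regularized least-squares objective.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 5))
y = X @ np.ones(5)
grad = lambda w: X.T @ (X @ w - y) / len(y) + 1e-2 * w
w_hat = gradient_langevin_dynamics(grad, np.zeros(5))
print(w_hat)  # close to the all-ones vector, up to noise and regularization

The inverse temperature controls the noise level: larger values make the iterates concentrate around minimizers, while smaller values encourage exploration, which is the mechanism the generalization and excess risk bounds in the talk are built on.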

Bio

Taiji Suzuki is an Associate Professor in the Department of Mathematical Informatics at the University of Tokyo and at the Center for Advanced Intelligence Project, RIKEN, Tokyo. Professor Suzuki obtained his Bachelor's, Master's, and PhD degrees, all from the University of Tokyo. His research focuses on the theoretical understanding of deep learning, kernel methods and nonparametric statistical methods, optimization of deep learning, and information geometry.
