Zoom ID: 927-046-96260
If you cannot join with the Zoom ID above, please use the following one instead:
Zoom ID: 266-664-3379
Modern deep learning has popularized the use of very large neural networks, but the theoretical tools to study such networks are still lacking. The Neural Tangent Kernel (NTK) describes how the output neurons evolve during training. In the infinite-width limit (when the number of hidden neurons grows to infinity), the NTK converges to a deterministic and fixed limit, leading to a simple description of the dynamics of infinitely wide DNNs. The NTK is affected by the architecture of the network, and as such helps to understand how architecture choices affect the convergence and generalization of DNNs.
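To make the infinite-width statement concrete, here is a minimal sketch, not taken from the talk, of the empirical NTK of a toy one-hidden-layer ReLU network f(x) = (1/sqrt(n)) * sum_i a_i * relu(w_i . x) with standard Gaussian initialization. The network, the 1/sqrt(n) scaling, and the helper names init_params, empirical_ntk and ntk_spread are assumptions chosen for illustration. The kernel is the inner product of the parameter gradients of f at two inputs, and the script measures how much it fluctuates across random initializations as the width n grows.

```python
import math
import random


def init_params(width, dim, seed=0):
    """Sample hidden weights w_i ~ N(0, I) and output weights a_i ~ N(0, 1)."""
    rng = random.Random(seed)
    w = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(width)]
    a = [rng.gauss(0.0, 1.0) for _ in range(width)]
    return w, a


def empirical_ntk(params, x, y):
    """Inner product of the parameter gradients of f at x and at y,
    for the toy network f(x) = (1 / sqrt(n)) * sum_i a_i * relu(w_i . x)."""
    w, a = params
    n = len(a)
    xy = sum(xi * yi for xi, yi in zip(x, y))
    total = 0.0
    for w_i, a_i in zip(w, a):
        zx = sum(wj * xj for wj, xj in zip(w_i, x))
        zy = sum(wj * yj for wj, yj in zip(w_i, y))
        # Gradient with respect to a_i contributes relu(zx) * relu(zy).
        total += max(zx, 0.0) * max(zy, 0.0)
        # Gradient with respect to w_i contributes
        # a_i^2 * 1[zx > 0] * 1[zy > 0] * (x . y).
        if zx > 0.0 and zy > 0.0:
            total += a_i * a_i * xy
    return total / n


def ntk_spread(width, x, y, seeds=range(20)):
    """Standard deviation of the empirical NTK over random initializations."""
    vals = [empirical_ntk(init_params(width, len(x), s), x, y) for s in seeds]
    mean = sum(vals) / len(vals)
    return math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))


if __name__ == "__main__":
    x, y = [1.0, 0.0, 0.0], [0.6, 0.8, 0.0]
    for width in (10, 100, 1000):
        print(width, ntk_spread(width, x, y))
```

In this toy setting the printed spread should shrink as the width increases, which is the finite-width counterpart of the kernel becoming deterministic in the limit. The sketch only looks at the kernel at initialization; it does not check that the kernel stays fixed during training.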
As the depth of the network grows, two regimes appear: a Freeze regime, where the NTK is almost constant and convergence is slow, and a Chaotic regime, where the NTK approaches a Kronecker delta, which speeds up training but may hurt generalization. Increasing the variance of the bias at initialization pushes the network towards the Freeze regime, while normalization methods such as Layer- and Batch-Normalization push the network towards the Chaotic regime.
In GANs, the Freeze regime leads to Mode Collapse, where the generator converges to a constant, and to checkerboard patterns, i.e. repeating patterns in images. Both problems are greatly reduced when the generator is chaotic, which may explain the importance of Batch Normalization in the training of GANs.
Arthur Jacot is a PhD student in mathematics at EPFL. After a bachelor's degree at the Freie Universität Berlin, he completed his master's degree at EPFL. Since 2018, he has worked on the theory of deep neural networks at the Chair of Statistical Field Theory, supervised by Prof. Clément Hongler. He specializes in the study of infinitely wide networks, i.e. the limit where the number of hidden neurons grows to infinity. In this limit, the dynamics simplify and are described by a single object, the Neural Tangent Kernel, which was introduced by Arthur Jacot, Franck Gabriel and Clément Hongler in a 2018 NeurIPS paper.