Young Researcher Workshop on Uncertainty Quantification and Machine Learning

Frequency Principle in Deep Neural Networks

Speaker

Zhiqin Xu, New York University Abu Dhabi

Time

06 Jun, 15:00 - 15:30

Abstract

It remains a puzzle why deep neural networks, which often have more parameters than training samples, can generalize well. One attempt to understand this puzzle is to identify the implicit bias of the training process. However, without an explicit mathematical description, it is unclear how this implicit bias operates during training. In this work, we first show the universality of the F-Principle (DNNs initialized with small parameters often fit target functions from low to high frequencies) by demonstrating the phenomenon on high-dimensional benchmark datasets such as MNIST and CIFAR10. We also give a mathematical proof of the F-Principle. Then, we consider a neural network with extremely large width. In this regime, the F-Principle is found to be equivalent to an explicitly regularized optimization problem. Using this equivalent explicit regularization, we estimate an a priori generalization error bound and show that a non-zero initial output can damage generalization. We further propose an initialization trick to eliminate the negative impact of initialization, even in a mildly over-parameterized regime. Our work shows that the F-Principle can lead a neural network trained without explicit regularization to good generalization performance.
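The low-to-high-frequency fitting order described above can be illustrated numerically. The sketch below is a minimal demonstration, not the talk's actual setup: it assumes PyTorch and NumPy, and the two-frequency target, network width, small-weight initialization scale, and learning rate are illustrative choices. It trains a small network on a 1D function and tracks the relative fitting error of its low- and high-frequency Fourier components, which one would expect to decay first at the low frequency.

```python
# Illustrative sketch of the F-Principle: a small fully connected network with
# small initial weights is trained to fit a 1D target made of a low- and a
# high-frequency sine; the error of each Fourier component is tracked.
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)

# Training data: 1D inputs on [-1, 1] and a two-frequency target.
x = torch.linspace(-1, 1, 256).unsqueeze(1)
freqs = [1.0, 10.0]                      # cycles over the interval [-1, 1]
y = sum(torch.sin(np.pi * f * x) for f in freqs)

# Small two-hidden-layer network, initialized with small parameters.
net = nn.Sequential(nn.Linear(1, 200), nn.Tanh(),
                    nn.Linear(200, 200), nn.Tanh(),
                    nn.Linear(200, 1))
for p in net.parameters():
    nn.init.normal_(p, std=0.1)

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def fourier_error(pred):
    """Relative error of (pred - y) at each target frequency bin."""
    err = (pred - y).squeeze().detach().numpy()
    err_spec = np.abs(np.fft.rfft(err))
    tgt_spec = np.abs(np.fft.rfft(y.squeeze().numpy()))
    idx = [int(f) for f in freqs]        # bin k = k cycles over the interval
    return [err_spec[i] / (tgt_spec[i] + 1e-12) for i in idx]

for step in range(5001):
    opt.zero_grad()
    loss = loss_fn(net(x), y)
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        lo, hi = fourier_error(net(x))
        print(f"step {step:5d}  loss {loss.item():.4f}  "
              f"rel. error @ freq {freqs[0]:.0f}: {lo:.3f}  "
              f"@ freq {freqs[1]:.0f}: {hi:.3f}")
```

In a typical run of such a sketch, the relative error at the low frequency drops well before the high-frequency error does, which is the qualitative behavior the F-Principle describes.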