Google scholar

Semantic scholar

Publication list by project

Machine Learning Theory

Frequency Principle: We found that DNNs often fit target functions from low to high frequencies. Based on the F-Principle, we design a Multi-scale DNN for solving the “curse of high frequency”.

Loss landscape: We prove an Embedding Principle that the loss landscape of a DNN “contains” all the critical points of all the narrower DNNs.

Initialization: We study how initialization affects the training dynamics of DNNs, including the generalization error induced by initialization in the linear regime and a phase diagram analysis. We find a weight condensation regime.

Condensation: We study the mechanism of condensation.

Dropout: We study the implicit bias of Dropout.

Reinforcement Learning: We compare gradient descent (GD) and temporal-difference learning for training DNNs in RL.

AI for Science

Multi-scale DNN: Based on the Frequency Principle, we design a Multi-scale DNN (MscaleDNN) to solve the “curse of high frequency”; a minimal sketch is given below.
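
The following is a minimal, illustrative TensorFlow sketch of the MscaleDNN idea (subnetwork sizes, scales, and activation here are placeholders, not the settings used in the papers): parallel subnetworks receive copies of the input scaled by different factors, so a high-frequency component of the target appears as a lower-frequency one to some subnetwork.

```python
import tensorflow as tf

def make_subnet(width=64, depth=3):
    """A small fully-connected subnetwork with a scalar output."""
    layers = [tf.keras.layers.Dense(width, activation="tanh") for _ in range(depth)]
    layers.append(tf.keras.layers.Dense(1))
    return tf.keras.Sequential(layers)

class MscaleDNN(tf.keras.Model):
    """Parallel subnetworks on scaled copies of the input; outputs are summed."""
    def __init__(self, scales=(1.0, 2.0, 4.0, 8.0)):
        super().__init__()
        self.scales = scales
        self.subnets = [make_subnet() for _ in scales]

    def call(self, x):
        # Scaling x by s makes frequency s*k in the target look like
        # frequency k from the viewpoint of the corresponding subnetwork.
        return tf.add_n([net(s * x) for s, net in zip(self.scales, self.subnets)])
```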

DNN for PDE: We develop DNN-based algorithms for solving PDEs, covering both solving a single PDE and learning a PDE operator; a residual-loss sketch follows.
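
As a concrete example of the single-PDE case, here is a minimal least-squares residual loss for the 1d Poisson problem -u''(x) = f(x) on [0, 1] with u(0) = u(1) = 0 (a generic formulation for illustration; MOD-Net additionally exploits the model/operator structure and cheap data as regularization):

```python
import math
import tensorflow as tf

u_net = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="tanh"),
    tf.keras.layers.Dense(64, activation="tanh"),
    tf.keras.layers.Dense(1),
])

def f(x):
    return (math.pi ** 2) * tf.sin(math.pi * x)  # exact solution: u(x) = sin(pi*x)

def loss(x_in, x_bc):
    """PDE residual on interior points x_in plus boundary penalty on x_bc.

    x_in, x_bc: float32 tensors of shape (N, 1), e.g.
    loss(tf.random.uniform((128, 1)), tf.constant([[0.0], [1.0]]))
    """
    with tf.GradientTape() as t2:
        t2.watch(x_in)
        with tf.GradientTape() as t1:
            t1.watch(x_in)
            u = u_net(x_in)
        u_x = t1.gradient(u, x_in)
    u_xx = t2.gradient(u_x, x_in)
    residual = tf.reduce_mean((-u_xx - f(x_in)) ** 2)  # -u'' = f in the interior
    boundary = tf.reduce_mean(u_net(x_bc) ** 2)        # u = 0 on the boundary
    return residual + boundary
```

Minimizing this loss over random collocation points with, e.g., Adam yields a DNN approximation of u.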

Combustion: We develop DNN-based algorithms for model reduction, to simplify chemical kinetics and to accelerate the simulation of chemical ODEs.

Others

Computer Vision

Computational Neuroscience

Optimization

All papers

Slides (PPT) of a talk given on April 17, 2021, in the Machine Learning Joint Seminar Program; the talk video is on Bilibili.

A PPT and a summary of the F-Principle are also provided.

Popular science articles (in Chinese)

Embedding Principle

Tuning hyperparameters: pay attention to which phase regime the neural network is in. Refs. [10]

Understanding, from a frequency perspective, why depth can accelerate the training of neural networks. Refs. [11]

Linear Frequency Principle dynamics: an effective model for quantitatively understanding deep learning. Refs. [13, 17]

F-Principle: a first look at applications of deep learning in computational mathematics. Refs. [4, 9, 12]

F-Principle: a first look at understanding what deep learning cannot do. Refs. [4]

Interpreting the generalization ability of deep learning from the perspective of Fourier analysis. Refs. [1, 2, 4, 17]

Multi-scale neural networks for solving differential equations. Refs. [9, 12]

Code

1d F-Principle code at github.

Useful technique: A note on using TensorFlow to code the Laplacian operator in high dimensions.
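
For reference, here is a minimal TensorFlow 2 sketch (my own illustration, not necessarily the technique in the note) that computes the Laplacian of a scalar network output as the trace of the Hessian:

```python
import tensorflow as tf

def laplacian(u_net, x):
    """Laplacian of the scalar network u_net at points x of shape (N, d)."""
    with tf.GradientTape() as t2:
        t2.watch(x)
        with tf.GradientTape() as t1:
            t1.watch(x)
            u = u_net(x)               # shape (N, 1)
        grad = t1.gradient(u, x)       # shape (N, d)
    hess = t2.batch_jacobian(grad, x)  # per-sample Hessians, shape (N, d, d)
    return tf.linalg.trace(hess)       # sum of second derivatives, shape (N,)
```

The full Hessian costs O(N d²) memory; in high dimensions one can instead loop over coordinates and accumulate each second derivative d²u/dx_i² separately.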

1d example of F-Principle

[Animation] F-Principle in the Fourier domain: DNNs often fit target functions from low to high frequencies. Each frame shows several training steps. Red: FFT of the target function; blue: FFT of the DNN output. Abscissa: frequency; ordinate: amplitude.
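
A minimal, self-contained script in the spirit of this demo (illustrative only; the network size, target function, and training schedule are placeholders, see the github code above for the original): it fits a two-frequency target and prints the relative FFT error at the low- and high-frequency peaks, where the low-frequency error typically shrinks first.

```python
import numpy as np
import tensorflow as tf

# Two-frequency target: peaks near FFT bins 1 (low) and 5 (high).
x = np.linspace(-1, 1, 201, dtype=np.float32)[:, None]
y = np.sin(np.pi * x) + 0.5 * np.sin(5 * np.pi * x)
target_fft = np.abs(np.fft.rfft(y[:, 0]))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(200, activation="tanh"),
    tf.keras.layers.Dense(200, activation="tanh"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")

for stage in range(10):
    model.fit(x, y, epochs=100, verbose=0)
    out_fft = np.abs(np.fft.rfft(model.predict(x, verbose=0)[:, 0]))
    # Relative error at the low- (bin 1) and high-frequency (bin 5) peaks.
    errs = {k: abs(out_fft[k] - target_fft[k]) / target_fft[k] for k in (1, 5)}
    print(f"stage {stage}: low-freq err {errs[1]:.2f}, high-freq err {errs[5]:.2f}")
```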

Paper list

* indicates the corresponding author

#: Equal contribution

BibTeX citation format is available Here.

Deep learning

Reading guidance:

Frequency Principle: An overview is in [27]. [4] is a comprehensive study of the F-Principle, with low- and high-dimensional experiments and a simple theory. The first paper on the F-Principle is [1]. Theory for the F-Principle of general networks with infinite samples is in [7]; theory for infinite-width two-layer networks (NTK regime) with finite samples is in [13] and [17] (the initial version of [13] is [6]). We further propose a Fourier-domain variational formulation for supervised learning, inspired by the linear frequency principle, and prove its well-posedness in [14]. We also use the F-Principle to understand why DNNs and traditional numerical methods yield different solutions when overparameterized in [15].

Embedding Principle: [19, 23] prove an embedding principle: the loss landscape of a DNN “contains” all the critical points of all the narrower DNNs. Empirically, we find that a wide DNN is often attracted to highly degenerate critical points that are embedded from narrow DNNs. A motivation for studying the embedding structure is the condensation observed in [10] and studied in [18].

Phase diagram: In [10], we draw a phase diagram for two-layer ReLU neural networks in the infinite-width limit, giving a complete characterization of their dynamical regimes and the dependence on initialization-related hyperparameters. The condensation at the initial training stage is explained in [18].

AI for science: In [25], we apply DNNs to reduce detailed mechanisms of chemical kinetics. In [26], we use DNNs to accelerate the simulation of chemical kinetics.

Solve PDE: (a) The MscaleDNN is proposed and comprehensively studied in [9] (an initial version is in [8]); [12, 24] further develop the MscaleDNN. The original observation that DNNs are slow to resolve high frequencies when solving PDEs is in Fig. 4 of [4]. (b) MOD-Net (model-operator-data network) [20] learns the PDE operator with a DNN, using cheap data as regularization.

NTK: In the NTK regime, we show a specific generalization error induced by initialization in [5], where we also propose ASI to improve generalization. The F-Principle is also studied in the NTK regime in [7].

Multi-layer: In [11], to understand why deeper networks learn faster, we propose a deep frequency principle: during training, the effective target function for a deeper hidden layer is biased towards lower frequencies.

[32] Zhemin Li, Zhi-Qin John Xu, Tao Luo, Hongxia Wang*, A regularized deep matrix factorized model of matrix completion for image restoration, IET Image Processing (2022). web, pdf, and in arxiv

[31] Shuyu Yin, Tao Luo, Peilin Liu, Zhi-Qin John Xu*, An Experimental Comparison Between Temporal Difference and Residual Gradient with Neural Network Approximation. arxiv 2205.12770 (2022) pdf, and in arxiv.

[30] Zhiwei Bai, Tao Luo, Zhi-Qin John Xu*, Yaoyu Zhang*, Embedding Principle in Depth for the Loss Landscape Analysis of Deep Neural Networks. arxiv 2205.13283 (2022) pdf, and in arxiv.

[29] Hanxu Zhou, Qixuan Zhou, Zhenyuan Jin, Tao Luo, Yaoyu Zhang, Zhi-Qin John Xu*, Empirical Phase Diagram for Three-layer Neural Networks with Infinite Width. arxiv 2205.12101 (2022) pdf, and in arxiv.

[28] Zhongwang Zhang, Hanxu Zhou, Zhi-Qin John Xu*, Dropout in training neural networks: flatness of solution and noise structure. arxiv 2111.01022 (2021) pdf, and in arxiv.

[27] Zhi-Qin John Xu*, Yaoyu Zhang, Tao Luo, Overview frequency principle/spectral bias in deep learning. arxiv 2201.07395 (2022) pdf, and in arxiv.

[26] Tianhan Zhang*, Yuxiao Yi, Yifan Xu, Zhi X. Chen, Yaoyu Zhang, Weinan E, Zhi-Qin John Xu*, A multi-scale sampling method for accurate and robust deep neural network to predict combustion chemical kinetics. arxiv 2201.03549 (2022) pdf, and in arxiv.

[25] Zhiwei Wang, Yaoyu Zhang, Yiguang Ju, Weinan E, Zhi-Qin John Xu*, Tianhan Zhang*, A deep learning-based model reduction (DeePMR) method for simplifying chemical kinetics. arxiv 2201.02025 (2022) pdf, and in arxiv.

[24] (Alphabetic order) Xi-An Li, Zhi-Qin John Xu, Lei Zhang*, Subspace Decomposition based DNN algorithm for elliptic type multi-scale PDEs. arxiv 2112.06660 (2021) pdf, and in arxiv.

[23] Yaoyu Zhang*, Yuqing Li, Zhongwang Zhang, Tao Luo, Zhi-Qin John Xu*, Embedding Principle: a hierarchical structure of loss landscape of deep neural networks. Journal of Machine Learning, (2022), pp. 60-113. web, and in arxiv.

[22] Lulu Zhang, Zhi-Qin John Xu*, Yaoyu Zhang*, Data-informed Deep Optimization. PLoS ONE (2022) in web, arxiv 2107.08166 (2021) pdf, and in arxiv.

[21] Guangjie Leng, Yekun Zhu, Zhi-Qin John Xu*, Force-in-domain GAN inversion. arxiv 2107.06050 (2021) pdf, and in arxiv.

[20] Lulu Zhang, Tao Luo, Yaoyu Zhang, Weinan E, Zhi-Qin John Xu*, Zheng Ma*, MOD-Net: A Machine Learning Approach via Model-Operator-Data Network for Solving PDEs. Communications in Computational Physics (CiCP) (2022) to appear, arxiv 2107.03673 (2021) pdf, and in arxiv.

[19] Yaoyu Zhang*, Zhongwang Zhang, Tao Luo, Zhi-Qin John Xu*, Embedding Principle of Loss Landscape of Deep Neural Networks. NeurIPS 2021 spotlight, arxiv 2105.14573 (2021) pdf, and in arxiv, see slides, and Talk on Bilibili

[18] Hanxu Zhou, Tao Luo, Yaoyu Zhang*, Zhi-Qin John Xu*, Towards Understanding the Condensation of Neural Networks at Initial Training. arxiv 2105.11686 (2021) pdf, and in arxiv, see slides and video talk in Chinese

[17] Yaoyu Zhang, Tao Luo, Zheng Ma, Zhi-Qin John Xu*, Linear Frequency Principle Model to Understand the Absence of Overfitting in Neural Networks. Chinese Physics Letters, 2021. pdf, and in arxiv, see CPL web

[16] (Alphabetic order) Yuheng Ma, Zhi-Qin John Xu*, Jiwei Zhang*, Frequency Principle in Deep Learning Beyond Gradient-descent-based Training, arxiv 2101.00747 (2021). pdf, and in arxiv

[15] (Alphabetic order) Jihong Wang, Zhi-Qin John Xu*, Jiwei Zhang*, Yaoyu Zhang, Implicit bias in understanding deep learning for solving PDEs beyond Ritz-Galerkin method, CSIAM Trans. Appl. Math. web, arxiv 2002.07989 (2020). pdf, and in arxiv

[14] (Alphabetic order) Tao Luo*, Zheng Ma, Zhiwei Wang, Zhi-Qin John Xu, Yaoyu Zhang, An Upper Limit of Decaying Rate with Respect to Frequency in Deep Neural Network, To appear in Mathematical and Scientific Machine Learning 2022 (MSML22), arxiv 2105.11675 (previous version: 2012.03238) (2020). pdf, and in arxiv

Note: [13] is a comprehensive version of [6].

[13] (Alphabetic order) Tao Luo*, Zheng Ma, Zhi-Qin John Xu, Yaoyu Zhang, On the exact computation of linear frequency principle dynamics and its generalization, SIAM Journal on Mathematics of Data Science (SIMODS) to appear, arxiv 2010.08153 (2020). pdf, and in arxiv, some code is in github.

[12] (Alphabetic order) Xi-An Li, Zhi-Qin John Xu*, Lei Zhang, A multi-scale DNN algorithm for nonlinear elliptic equations with multiple scales, arxiv 2009.14597 (2020), Communications in Computational Physics (CiCP). pdf, and in web, and in arxiv, some code is in github.

[11] Zhi-Qin John Xu*, Hanxu Zhou, Deep frequency principle towards understanding why deeper learning is faster, Proceedings of the AAAI Conference on Artificial Intelligence 2021, arxiv 2007.14313 (2020) pdf, and in arxiv, and AAAI web, and slides, and AAAI speech script slides

[10] Tao Luo#, Zhi-Qin John Xu#, Zheng Ma, Yaoyu Zhang*, Phase diagram for two-layer ReLU neural networks at infinite-width limit, arxiv 2007.07497 (2020), Journal of Machine Learning Research (2021) pdf, and in arxiv

Note: [9] is a comprehensive version of [8].

[9] Ziqi Liu, Wei Cai, Zhi-Qin John Xu*, Multi-scale Deep Neural Network (MscaleDNN) for Solving Poisson-Boltzmann Equation in Complex Domains, arxiv 2007.11207 (2020), Communications in Computational Physics (CiCP). pdf, and in web, some code is in github.

[8] (Alphabetic order) Wei Cai, Zhi-Qin John Xu*, Multi-scale Deep Neural Networks for Solving High Dimensional PDEs, arxiv 1910.11710 (2019) pdf, and in arxiv

[7] (Alphabetic order) Tao Luo, Zheng Ma, Zhi-Qin John Xu, Yaoyu Zhang, Theory of the frequency principle for general deep neural networks, CSIAM Trans. Appl. Math., arXiv preprint, 1906.09235 (2019). arxiv, in web, see pdf

[6] Yaoyu Zhang, Zhi-Qin John Xu*, Tao Luo, Zheng Ma, Explicitizing an Implicit Bias of the Frequency Principle in Two-layer Neural Networks. arXiv preprint, 1905.10264 (2019) pdf, and in arxiv

[5] Yaoyu Zhang, Zhi-Qin John Xu*, Tao Luo, Zheng Ma, A type of generalization error induced by initialization in deep neural networks. arXiv preprint: 1905.07777 (2019), 1st Mathematical and Scientific Machine Learning Conference (MSML2020). pdf, and in web

[4] Zhi-Qin John Xu*, Yaoyu Zhang, Tao Luo, Yanyang Xiao, Zheng Ma, Frequency Principle: Fourier Analysis Sheds Light on Deep Neural Networks, arXiv preprint: 1901.06523, Communications in Computational Physics (CiCP). pdf, and in web, some code is in github (nominated for the Outstanding Youth Paper Award of the 2021 World Artificial Intelligence Conference).

Note: Most of [2] and [3] are combined into paper [4].

[3] Zhi-Qin John Xu*, Frequency Principle in Deep Learning with General Loss Functions and Its Potential Application, arXiv preprint: 1811.10146 (2018). pdf, and in arxiv

[2] Zhi-Qin John Xu*, Understanding training and generalization in deep learning by Fourier analysis, arXiv preprint: 1808.04295, (2018). pdf, and in arxiv

[1] Zhi-Qin John Xu*, Yaoyu Zhang, and Yanyang Xiao, Training behavior of deep neural network in frequency domain, arXiv preprint: 1807.01251, (2018), 26th International Conference on Neural Information Processing (ICONIP 2019). pdf, and in web

Computational Neuroscience

[8] Zhi-Qin John Xu, Xiaowei Gu, Chengyu Li, David Cai, Douglas Zhou*, David W. McLaughlin*. Neural networks of different species, brain areas and states can be characterized by the probability polling state, European Journal of Neuroscience (2020). pdf, and in web

[7] Zhi-Qin John Xu, Douglas Zhou, David Cai, Swift Two-sample Test on High-dimensional Neural Spiking Data, Europhysics Letters, arxiv preprint 1811.12314 (2018). pdf, and in web

[6] Zhi-Qin John Xu*, Fang Xu, Guoqiang Bi, Douglas Zhou*, David Cai, A Cautionary Tale of Entropic Criteria in Assessing the Validity of Maximum Entropy Principle, Europhysics Letters (2018). pdf, and in web

[5] Zhi-Qin John Xu, Jennifer Crodelle, Douglas Zhou*, David Cai, Maximum Entropy Principle Analysis in Network Systems with Short-time Recordings, Physical Review E, DOI: 10.1103/PhysRevE.99.022409, (2019). pdf, and in web

[4] Zhi-Qin John Xu*, Douglas Zhou*, David Cai, Dynamical and Coupling Structure of Pulse-Coupled Networks in Maximum Entropy Analysis, Entropy 2019, 21(1). pdf, and in web

[3] Zhi-Qin John Xu, Guoqiang Bi, Douglas Zhou*, and David Cai*, A dynamical state underlying the second order maximum entropy principle in neuronal networks, Communications in Mathematical Sciences, 15 (2017), pp. 665–692. pdf, and in web

[2] Douglas Zhou, Yanyang Xiao, Yaoyu Zhang, Zhiqin Xu, and David Cai*, Granger causality network reconstruction of conductance-based integrate-and-fire neuronal systems, PLoS ONE, 9 (2014). pdf, and in web

[1] Douglas Zhou, Yanyang Xiao, Yaoyu Zhang, Zhiqin Xu, and David Cai*, Causal and structural connectivity of pulse-coupled nonlinear networks, Physical Review Letters, 111 (2013), p. 054102. pdf, and in web