Zhi-Qin John Xu is an associate professor at the Institute of Natural Sciences / School of Mathematical Sciences, Shanghai Jiao Tong University. He graduated from Zhiyuan College of Shanghai Jiao Tong University in 2012 and received his Ph.D. in applied mathematics from Shanghai Jiao Tong University in 2016. From 2016 to 2019, he was a postdoctoral fellow at NYU Abu Dhabi and the Courant Institute. He is the principal investigator of a Young Scientist project of the National Key R&D Program (Ministry of Science and Technology), an NSFC General Program project, and others. In language models, he identified that model complexity is critical to the memorization and reasoning capabilities of a language model. In deep learning theory, he and his collaborators discovered the frequency principle, parameter condensation, and the embedding principle of the loss landscape, and developed multi-scale neural networks. In AI for Science, mainly combustion, he and his collaborators developed deep-learning-based mechanism reduction (DeePMR) and a deep-learning-based surrogate model for accelerating the simulation of chemical kinetics (DeepCK). He has published papers as first or corresponding author in TPAMI, JMLR, AAAI, NeurIPS, SIMODS, CiCP, CSIAM Trans. Appl. Math., JCP, Combustion and Flame, Eur. J. Neurosci., etc. Currently, he is a Managing Editor of the Journal of Machine Learning.

Book (Chinese) in progress
Feedback of any kind is welcome! This book introduces some basics of deep learning in a phenomenon-driven way, together with ways to understand them. See GitHub.

Language Models
Language model research faces significant challenges, especially for academic research groups with constrained resources. These challenges include complex data structures, unknown target functions, high computational and memory costs, and a lack of interpretability in the inference process. Drawing a parallel to the use of simple models in scientific research, we propose the concept of an anchor function: a type of benchmark function designed for studying language models. A toy illustration follows the paper list below.
Slides on the anchor function and the initialization effect: ppt
[54] Towards Understanding How Transformer Perform Multi-step Reasoning with Matching Operation, arXiv 2405.15302 (2024), pdf, arXiv
[49] Initialization is Critical to Whether Transformers Fit Composite Functions by Inference or Memorizing, NeurIPS 2024, arXiv 2405.05409 (2024), pdf, arXiv
[48] Anchor function, arXiv 2401.08309 (2024), pdf, arXiv
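The papers above define anchor functions precisely; the sketch below is only a loose illustration of the idea, generating a toy addition-style task in which an "anchor" token selects which offset is added to the following "key" token. The token ranges, offsets, and sequence format here are assumptions for illustration, not the construction from arXiv 2401.08309.

```python
import random

# Hypothetical toy "anchor function" data generator (illustration only).
# Each anchor token a maps a key token x to x + ANCHOR_OFFSET[a].
ANCHOR_OFFSET = {101: 1, 102: 2, 103: 5}   # assumed anchor tokens and offsets
KEY_TOKENS = list(range(20, 80))           # assumed key-token range

def make_example(seq_len=8):
    """Build one sequence: filler keys plus one (anchor, key) pair, and the target."""
    anchor = random.choice(list(ANCHOR_OFFSET))
    key = random.choice(KEY_TOKENS)
    seq = [random.choice(KEY_TOKENS) for _ in range(seq_len - 2)]
    pos = random.randrange(len(seq) + 1)
    seq[pos:pos] = [anchor, key]           # insert the anchor-key pair
    target = key + ANCHOR_OFFSET[anchor]   # the "anchor function" value
    return seq, target

if __name__ == "__main__":
    for _ in range(3):
        seq, target = make_example()
        print(seq, "->", target)
```

Because the target function is known exactly, a benchmark of this kind lets one probe what a transformer learns (e.g., memorization versus compositional inference) under full control of the data.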
New Journal: Journal of Machine Learning
We launched a new journal: Journal of Machine Learning (JML). Submissions are welcome.
Editors-in-Chief: Prof. Weinan E, Prof. Jianfeng Lu
Scope: Journal of Machine Learning (JML) publishes high-quality research papers in all areas of machine learning, including innovative machine learning algorithms, theories of machine learning, and important applications of machine learning in AI, the natural sciences, the social sciences, and engineering. The journal emphasizes a balanced coverage of both theory and practice, and is published in a timely fashion in electronic form.
WHY: Although the world is generally over-populated with journals, the field of machine learning (ML) is one exception. In mathematics, we simply do not have a recognized venue (other than conference proceedings) for publishing our work on ML. In AI for Science, we would ideally publish our work in leading scientific journals such as Physical Review Letters, but this is difficult while we are still at the stage of developing methodologies. Although there are many conferences in ML-related areas, journal publication is still the preferred venue in many disciplines. The objective of the Journal of Machine Learning (JML) is to become a leading journal in all areas related to ML, including algorithms and theory for ML as well as applications to science and AI. JML will start as a quarterly publication. Since ML is a vast and fast-developing field, we will do our best to carry out a thorough and responsive review process. To this end, we have a group of young and active managing editors who handle the review process, and a large, interdisciplinary group of experienced board members who can offer quick opinions and suggest reviewers when needed.
Open access: yes. Fee: no.
Sponsors: Center for Machine Learning Research, Peking University & AI for Science Institute, Beijing
Publisher: Global Science Press, but the editorial board owns the journal.

Materials
A suggested notation for machine learning, published by BAAI (Beijing Academy of Artificial Intelligence): see the page on GitHub or the page at BAAI.
Slides from the First Conference on Machine Learning and Scientific Applications (CSML2022).
Slides from CSIAM 2020.

Contact
Email: xuzhiqin at sjtu dot edu dot cn, Tel: 021-54742760
Office: Room 326, No. 5 Science Building, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai

Research Interest
I am interested in understanding deep learning through its training process, loss landscape, generalization, and applications. For example, we found a Frequency Principle (F-Principle): deep neural networks (DNNs) often capture target functions from low frequency to high frequency during training; a toy experiment is sketched below. An overview paper on the frequency principle is available at arXiv 2201.07395. We also found an embedding principle: the loss landscape of a DNN "contains" all the critical points of all narrower DNNs, made concrete after the sketch below. I am also interested in computational neuroscience, ranging from theoretical study and simulation to data analysis.
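One quick way to see the F-Principle empirically is a minimal sketch like the following (an assumed setup, not the protocol of any particular paper): fit a 1D target containing one low and one high frequency, and track the error at each frequency via the discrete Fourier transform. The network width, learning rate, and frequencies are arbitrary choices.

```python
import numpy as np
import torch

# Minimal F-Principle demo: fit y = sin(x) + sin(5x) sampled over one period
# and watch the low-frequency error component shrink before the high one.
xs = np.linspace(-np.pi, np.pi, 256, endpoint=False)
x = torch.tensor(xs, dtype=torch.float32).unsqueeze(1)
y = torch.sin(x) + torch.sin(5 * x)

net = torch.nn.Sequential(
    torch.nn.Linear(1, 200), torch.nn.Tanh(), torch.nn.Linear(200, 1)
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def freq_error():
    """Error magnitude at DFT bins 1 and 5, the two target frequencies."""
    with torch.no_grad():
        err = (net(x) - y).squeeze().numpy()
    spec = np.abs(np.fft.rfft(err))
    return spec[1], spec[5]  # 256 uniform points over one period: sin(kx) is bin k

for step in range(5001):
    opt.zero_grad()
    loss = ((net(x) - y) ** 2).mean()
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        low, high = freq_error()
        print(f"step {step:5d}  |err| at freq 1: {low:7.2f}  at freq 5: {high:7.2f}")
```

In a typical run, the bin-1 error drops well before the bin-5 error, which is the low-to-high-frequency ordering the F-Principle describes.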
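To make the embedding-principle statement concrete, here is a one-step "neuron splitting" embedding for a one-hidden-layer network, written out from the statement above (the alpha-splitting construction is a standard one in this line of work; the one-hidden-layer setting is chosen only for brevity):

```latex
% Neuron splitting for f_\theta(x) = \sum_{k=1}^{m} a_k \sigma(w_k x):
% duplicating neuron j and splitting its output weight leaves f unchanged,
\[
\sum_{k=1}^{m} a_k\,\sigma(w_k x)
  = \underbrace{\alpha a_j\,\sigma(w_j x) + (1-\alpha)\,a_j\,\sigma(w_j x)}_{\text{two copies of neuron } j}
  + \sum_{k \neq j} a_k\,\sigma(w_k x),
\qquad \alpha \in [0,1].
\]
% One can check that this map sends each critical point of the m-neuron loss
% to a one-parameter family of critical points of the (m+1)-neuron loss,
% which is the sense in which the wider landscape "contains" the narrower one's.
```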