This is the first academic event hosted by the Center for Mathematics of Artificial Intelligence Institute. The symposium aims to promote exchange and collaboration between scholars in mathematics and artificial intelligence. Invited speakers will present their latest research results, and a discussion session will explore future directions for the mathematical foundations and efficient algorithms of artificial intelligence, as well as possible collaborations between academic areas within the university.
January 11, 2020
Room 305, No.5 Science Building, Minhang Campus, Shanghai Jiao Tong University
No registration fee. Participants should cover their own lodging and meals.
Please register online: Apply Online
|08:50 - 09:00||Opening Remarks|
|09:00 - 09:30||Shi Jin||Consensus-based High Dimensional Global Non-convex Optimization in Machine Learning|
|09:30 - 10:00||Hai Zhao||Machine Reading Comprehension oriented Language Models|
|10:00 - 10:30||Kai Yu||AgentGraph: Toward Universal Dialogue Management With Structured Deep Reinforcement Learning|
|10:30 - 11:00||Bingbing Ni||Intelligent Multimedia Content Production|
|11:00 - 11:30||Group Photo & Tea Break|
|11:30 - 12:00||Jianguo Huang||Int-Deep: A Deep Learning Initialized Iterative Method for Nonlinear Problems|
|12:00 - 12:30||Jinyan Fan||Recent advances in numerical methods for nonlinear equations|
|12:30 - 14:00||Lunch|
|14:00 - 14:30||Xiaoqun Zhang||Semi-Implicit Back Propagation|
|14:30 - 15:00||Songting Li||Synaptic Integration Rules of Biological and Artificial Neurons|
|15:00 - 15:30||Tea Break|
|15:30 - 16:00||Quanshi Zhang||Deep Learning: Interpretability, Capacity, and Evaluation|
|16:00 - 16:30||Douglas Zhou||Compressed sensing coding in early sensory pathway|
Jinyan Fan, School of Mathematical Sciences, Shanghai Jiao Tong University
Nonlinear equations have wide applications in chemistry, mechanics, economics, and many other fields. Levenberg-Marquardt methods and trust region methods are popular and reliable for solving nonlinear equations, particularly for singular or ill-conditioned problems. In this talk, we review the convergence properties of these methods under the local error bound condition, which is weaker than nonsingularity of the Jacobian at the solution. The complexity of the methods will also be discussed.
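As a generic illustration of the class of methods discussed (a textbook sketch, not the speaker's specific algorithm), the classical Levenberg-Marquardt iteration for a square system F(x) = 0 can be written as follows; the damping parameter is taken proportional to ||F(x_k)||^2, a common choice that keeps the linear system nonsingular even when the Jacobian is singular at the solution:

```python
import numpy as np

def levenberg_marquardt(F, J, x0, mu=1e-2, tol=1e-10, max_iter=100):
    """Basic Levenberg-Marquardt iteration for the square system F(x) = 0.

    The damping lam = mu * ||F(x_k)||^2 keeps the linear system
    nonsingular even when the Jacobian is singular or ill-conditioned
    at the solution, and vanishes as the residual goes to zero,
    recovering Newton-like fast local convergence.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            break
        Jx = J(x)
        lam = mu * np.dot(Fx, Fx)
        # LM step: solve (J^T J + lam I) d = -J^T F
        d = np.linalg.solve(Jx.T @ Jx + lam * np.eye(x.size), -Jx.T @ Fx)
        x = x + d
    return x

# Toy system: x^2 + y^2 = 1 and x = y, with solution (1/sqrt(2), 1/sqrt(2))
F = lambda v: np.array([v[0]**2 + v[1]**2 - 1.0, v[0] - v[1]])
J = lambda v: np.array([[2.0 * v[0], 2.0 * v[1]], [1.0, -1.0]])
root = levenberg_marquardt(F, J, np.array([2.0, 0.5]))
```

Because the damping vanishes with the residual, the iteration behaves like Newton's method near the solution while remaining well defined when the Jacobian degenerates.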
Jianguo Huang, School of Mathematical Sciences, and LSC-MOE, Shanghai Jiao Tong University
In this talk, we study a deep learning initialized iterative method (Int-Deep) for low-dimensional nonlinear partial differential equations (PDEs). The framework consists of two phases. In the first phase, an expectation minimization problem formulated from the given nonlinear PDE is approximately solved with mesh-free deep neural networks that parametrize the solution space. In the second phase, the approximate solution from the first phase is converted into a finite element ansatz, which serves as an initial guess good enough that Newton's method for the nonlinear PDE converges quickly to a high-accuracy solution. Systematic theoretical analysis is provided to justify the Int-Deep framework for several classes of problems. Numerical results show that Int-Deep outperforms both existing purely deep learning-based methods and traditional iterative methods (e.g., Newton's method and the Picard iteration). This is joint work with Hao-Qin Wang (SJTU) and Haizhao Yang (Purdue University).
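The role of the initial guess in the second phase can be illustrated with a generic Newton solver for a discretized 1D nonlinear boundary value problem (this is a toy sketch, not the Int-Deep implementation; the problem, grid size, and the "good" initial guess standing in for the phase-1 network output are all illustrative):

```python
import numpy as np

def newton_bvp(u_init, f, n=100, tol=1e-8, max_iter=50):
    """Newton's method for the 1D problem -u'' + u^3 = f on (0, 1)
    with u(0) = u(1) = 0, discretized by second-order finite
    differences on n interior grid points."""
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1.0 - h, n)
    # Tridiagonal matrix discretizing the -u'' operator
    A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    u = u_init(x)
    for k in range(max_iter):
        r = A @ u + u**3 - f(x)            # nonlinear residual
        if np.linalg.norm(r) < tol:
            return u, k                    # converged after k Newton steps
        Jac = A + np.diag(3.0 * u**2)      # Jacobian of the residual
        u = u - np.linalg.solve(Jac, r)
    return u, max_iter

# Manufactured solution u*(x) = sin(pi x)
f = lambda x: np.pi**2 * np.sin(np.pi * x) + np.sin(np.pi * x)**3

# A good initial guess (playing the role of the phase-1 network output)
u_good, k_good = newton_bvp(lambda x: 0.9 * np.sin(np.pi * x), f)
# A naive zero initial guess
u_zero, k_zero = newton_bvp(lambda x: np.zeros_like(x), f)

x_grid = np.linspace(1.0 / 101, 1.0 - 1.0 / 101, 100)
err_good = np.max(np.abs(u_good - np.sin(np.pi * x_grid)))
```

The closer the starting ansatz is to the true solution, the fewer Newton steps are needed, which is exactly the leverage Int-Deep seeks from its learning phase.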
Shi Jin, Institute of Natural Sciences, Shanghai Jiao Tong University
We introduce a stochastic interacting-particle consensus system for the global optimization of high-dimensional non-convex functions. The algorithm does not use the gradient of the objective function and is therefore suitable for non-smooth functions. We prove that, under dimension-independent conditions on the parameters and initial data, the algorithm converges almost surely to a neighborhood of the global minimum.
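A minimal gradient-free consensus-based optimization sketch in the spirit of this abstract (the parameter values, the anisotropic noise scaling, and the Rastrigin test function are illustrative choices, not taken from the talk):

```python
import numpy as np

def cbo_minimize(f, dim, n_particles=300, steps=2000, dt=0.01,
                 lam=1.0, sigma=0.7, beta=40.0, seed=0):
    """Gradient-free consensus-based optimization (CBO) sketch.

    Each particle drifts toward a Gibbs-weighted average of the swarm
    (the consensus point) and diffuses with noise scaled by its
    distance from that point, so the noise vanishes as the swarm
    reaches consensus. No gradients of f are ever evaluated.
    """
    rng = np.random.default_rng(seed)
    X = rng.uniform(-3.0, 3.0, size=(n_particles, dim))
    for _ in range(steps):
        vals = np.apply_along_axis(f, 1, X)
        # Subtracting the minimum stabilizes the exponential weights
        w = np.exp(-beta * (vals - vals.min()))
        x_bar = (w[:, None] * X).sum(axis=0) / w.sum()
        diff = X - x_bar
        noise = rng.standard_normal(X.shape)
        # Anisotropic diffusion: componentwise scaling by |X - x_bar|
        X = X - lam * dt * diff + sigma * np.sqrt(dt) * np.abs(diff) * noise
    return x_bar

# Rastrigin function: non-convex with many local minima, global minimum 0 at 0
def rastrigin(x):
    return 10.0 * x.size + np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x))

x_min = cbo_minimize(rastrigin, dim=2)
```

Because only function values enter the weights, the same loop applies unchanged to non-smooth objectives.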
Songting Li, Institute of Natural Sciences, Shanghai Jiao Tong University
A neuron generally receives and integrates thousands of synaptic inputs from other neurons in the brain, and this integration of synaptic inputs is crucial for brain information processing. Using theoretical analysis, numerical simulation, and electrophysiological experiments, we have identified novel forms of nonlinear synaptic integration rules for biological neurons. These rules could potentially be applied to artificial neural networks, including deep and recurrent neural networks, to achieve brain-inspired computation.
Kai Yu, Computer Science and Engineering Department, Shanghai Jiao Tong University
Dialogue management is the core of a conversational AI agent. It is a sequential decision problem, and there has been long-standing interest in introducing data-driven approaches to it. Policy optimization is the central part of statistical dialogue management, and deep reinforcement learning has been successfully used to optimize dialogue policies for a static pre-defined domain. However, when the domain changes dynamically, e.g. when a previously unseen concept (or slot) that can then be used as a database search constraint is added, or when the policy for one domain is transferred to another, both the dialogue state space and the action set change. This makes dialogue policy adaptation and transfer challenging and attractive. In this talk, the basic concepts of spoken dialogue systems, especially dialogue management, will be introduced. Then a new structured deep reinforcement learning framework, AgentGraph, is proposed to address the policy adaptation problem. Simulation experiments show that AgentGraph can significantly speed up policy learning and facilitate policy adaptation.
Hai Zhao, Department of Computer Science and Engineering, Shanghai Jiao Tong University
Traditional n-gram language models have long played a core role in natural language processing (NLP), and are especially helpful for speech recognition and statistical machine translation.
As deep learning entered NLP, equipped with low-dimensional dense word vectors, it brought a new style of language modeling that combines the n-gram training objective with sentence-level encoded representations built on word embeddings. Typical contextualized language models of this kind, such as ELMo, BERT, and XLNet, have advanced a broad range of NLP tasks, including syntactic and semantic parsing, machine translation, and machine reading comprehension (MRC).
MRC is a newly introduced NLP task that must carefully handle a triple of passage, question, and answers in natural-language form, and it relies especially on effective language representation. In this talk, we survey these new deep learning-powered language models and their contributions to MRC tasks, and analyze and discuss the technical timeline, the state of the art, and the remaining challenges.
Quanshi Zhang, John Hopcroft Center for Computer Science, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University
Although deep neural networks (DNNs) have achieved superior performance in different visual tasks, the knowledge representation inside a DNN is still considered a black box. In this talk, I mainly introduce several core issues in the semantic interpretation of deep feature representations and the quantification of the representation capacity of DNNs, including:
1. Learning a deep coupling of semantic graphs and DNNs.
2. Learning disentangled and interpretable feature representations in DNNs.
3. Learning DNNs with interpretable modular architectures.
4. Mathematical explanation of the representation capacity of DNNs.
5. Evaluation of explanation methods.
Xiaoqun Zhang, Institute of Natural Sciences, Shanghai Jiao Tong University
Neural networks have attracted great attention for a long time, and many researchers are devoted to improving the effectiveness of training algorithms. Although stochastic gradient descent (SGD) and other explicit gradient-based methods are widely adopted, challenges such as vanishing gradients and small step sizes remain, leading to slow convergence and instability of SGD algorithms. Motivated by error back propagation (BP) and proximal methods, we propose a semi-implicit back propagation method for neural network training. As in BP, differences at the neurons are propagated in a backward fashion, while the parameters are updated via proximal mappings. The implicit update of both hidden neurons and parameters allows large step sizes in the training algorithm. Finally, we show that any fixed point of a convergent sequence produced by this algorithm is a stationary point of the objective loss function. Experiments on both MNIST and CIFAR-10 demonstrate that the proposed semi-implicit BP algorithm achieves better performance in terms of both loss decrease and training/validation accuracy than SGD and the similar algorithm ProxBP.
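The appeal of a proximal (implicit) parameter update can be illustrated on a single linear layer with squared loss, where the proximal mapping has a closed form (a toy sketch under these simplifying assumptions, not the authors' full semi-implicit BP algorithm; all sizes and names are illustrative):

```python
import numpy as np

def prox_linear_update(W, A, B, tau):
    """Proximal (implicit) update for a linear layer with squared loss:

        W_new = argmin_V 0.5 * ||A V - B||_F^2 + ||V - W||_F^2 / (2 * tau)

    The minimizer is available in closed form from the normal
    equations, so the update is stable for any step size tau > 0.
    """
    d = A.shape[1]
    return np.linalg.solve(A.T @ A + np.eye(d) / tau, A.T @ B + W / tau)

# Toy linear regression: recover W_true from noiseless data B = A @ W_true
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 8))
W_true = rng.standard_normal((8, 3))
B = A @ W_true

W = np.zeros((8, 3))
for _ in range(20):
    # An explicit gradient step would need a step size below 2 / ||A||^2;
    # the implicit step tolerates a much larger tau without diverging.
    W = prox_linear_update(W, A, B, tau=10.0)
```

The stability for large tau is the point: unlike an explicit gradient step, the implicit update contracts toward the least-squares solution regardless of the step size.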
Douglas Zhou, Institute of Natural Sciences, Shanghai Jiao Tong University
Considering that many natural stimuli are sparse, can a sensory system evolve to take advantage of this sparsity? We explore this question and show that the significant reductions in the number of neurons transmitting stimuli observed downstream in early sensory pathways might be a consequence of this sparsity. First, we model an early sensory pathway using an idealized neuronal network composed of receptors and downstream sensory neurons. Then, by revealing a linear structure intrinsic to the neuronal network dynamics, our work points to a potential mechanism for transmitting sparse stimuli, related to compressed-sensing (CS) type data acquisition. Through simulation, we examine the characteristics of networks that are optimal for sparsity encoding, and the impact of localized receptive fields beyond conventional CS theory. The results suggest a new network framework for signal sparsity, freeing the notion from any dependence on specific component space representations. We expect our CS network mechanism to provide guidance for studying sparse stimulus transmission along realistic sensory pathways, as well as for engineering artificial neural network designs that utilize sparsity encoding.
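The CS-type acquisition underlying this picture can be illustrated with a standard iterative soft-thresholding (ISTA) solver that recovers a sparse "stimulus" from a small number of random linear "downstream" measurements (a generic sketch; the dimensions, parameters, and random measurement matrix are illustrative, not the model from the talk):

```python
import numpy as np

def ista(Phi, y, lam=0.02, steps=2000):
    """Iterative soft-thresholding (ISTA) for the LASSO problem

        min_x 0.5 * ||Phi x - y||^2 + lam * ||x||_1,

    a standard route to recovering a sparse signal from a small
    number of linear measurements."""
    L = np.linalg.norm(Phi, 2) ** 2        # Lipschitz constant of the gradient
    x = np.zeros(Phi.shape[1])
    for _ in range(steps):
        z = x - Phi.T @ (Phi @ x - y) / L  # gradient step on the data term
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return x

# A sparse 'stimulus' of dimension 200 with 5 active components, observed
# through 60 random linear 'downstream' measurements
rng = np.random.default_rng(1)
n, m, k = 200, 60, 5
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
y = Phi @ x_true
x_hat = ista(Phi, y)
```

Far fewer measurements than signal dimensions suffice here (60 versus 200), which mirrors the abstract's point that a downstream reduction in neuron count need not lose sparse stimulus information.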