Zoom Link: 985-434-86244
Self-play, where the algorithm learns by playing against itself without requiring any direct supervision, has become the new weapon in modern Reinforcement Learning (RL) for achieving superhuman performance in practice. However, the majority of existing theory in reinforcement learning only applies to the setting where a single agent plays against a fixed environment. It remains largely open how to design efficient self-play algorithms in two-player sequential games, especially when it is necessary to manage the exploration/exploitation tradeoff. In this talk, we present the first line of provably efficient self-play algorithms in a basic setting of tabular episodic Markov games. Our algorithms further feature the near-optimal sample complexity—the number of samples required by our algorithms matches the information-theoretic lower bound up to a polynomial factor of the length of each episode.
Chi Jin is assistant professor of Electrical Engineering at Princeton University. He obtained his Ph.D. in Computer Science at UC Berkeley, advised by Michael I. Jordan. He received his B.S. in Physics from Peking University. His research interest lies in theoretical machine learning, with special emphases on nonconvex optimization and reinforcement learning. His representative work includes proving noisy gradient descent / accelerated gradient descent escape saddle points efficiently, proving sample complexity bounds for Q-learning / LSVIwith UCB, and designing near-optimal algorithms for minimax optimization.