
Near-Optimal Reinforcement Learning with Self-Play


Speaker

Chi Jin, Princeton University

Time

2020.12.17 10:00-12:00

Venue

Online (Zoom)

ZOOM Info

Zoom Meeting ID: 985-434-86244

Password: 827056

Abstract

Self-play, where the algorithm learns by playing against itself without requiring any direct supervision, has become the new weapon in modern Reinforcement Learning (RL) for achieving superhuman performance in practice. However, the majority of existing theory in reinforcement learning only applies to the setting where a single agent plays against a fixed environment. It remains largely open how to design efficient self-play algorithms in two-player sequential games, especially when it is necessary to manage the exploration/exploitation tradeoff. In this talk, we present the first line of provably efficient self-play algorithms in a basic setting of tabular episodic Markov games. Our algorithms further feature near-optimal sample complexity: the number of samples required by our algorithms matches the information-theoretic lower bound up to a polynomial factor of the length of each episode.
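
To make the self-play idea concrete, here is a minimal sketch (assuming only NumPy) of fictitious play on a two-player zero-sum matrix game, the simplest one-state instance of the setting: each player repeatedly best-responds to the opponent's empirical strategy, and in zero-sum games the empirical averages converge to a Nash equilibrium. This is an illustration of the self-play principle only, not the optimism-based algorithms for tabular episodic Markov games presented in the talk; the game choice and all variable names are ours.

import numpy as np

# Payoff matrix for the row player in rock-paper-scissors; the column
# player receives -A, so the game is zero-sum.
A = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])

n_rows, n_cols = A.shape
row_counts = np.ones(n_rows)  # empirical action counts (uniform start)
col_counts = np.ones(n_cols)

for t in range(20000):
    # Each player best-responds to the opponent's empirical mixed strategy.
    col_policy = col_counts / col_counts.sum()
    row_policy = row_counts / row_counts.sum()
    row_counts[np.argmax(A @ col_policy)] += 1   # row player maximizes payoff
    col_counts[np.argmin(row_policy @ A)] += 1   # column player minimizes it

print("row strategy:", np.round(row_counts / row_counts.sum(), 3))
print("col strategy:", np.round(col_counts / col_counts.sum(), 3))
# Both empirical strategies approach the uniform Nash equilibrium (1/3 each).

The algorithms in the talk go well beyond this toy loop: they must handle multi-step transitions across an episode and manage the exploration/exploitation tradeoff that the abstract highlights.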

Bio

Chi Jin is an assistant professor of Electrical Engineering at Princeton University. He obtained his Ph.D. in Computer Science at UC Berkeley, advised by Michael I. Jordan, and received his B.S. in Physics from Peking University. His research interest lies in theoretical machine learning, with special emphasis on nonconvex optimization and reinforcement learning. His representative work includes proving that noisy gradient descent and accelerated gradient descent escape saddle points efficiently, establishing sample complexity bounds for Q-learning and LSVI with UCB, and designing near-optimal algorithms for minimax optimization.