
Learning & Exploiting Low-Dimensional Structure in High-Dimensional Data


Speaker

Didong Li, Princeton University

Time

2020.10.22 10:00-11:00

Venue

Online (Zoom)

ZOOM Info

Meeting ID: 946 765 51243
Passcode: 757280

Abstract

Data lying in a high-dimensional ambient space are commonly thought to have much lower intrinsic dimension; in particular, the data may be concentrated near a lower-dimensional subspace or manifold. There is an immense literature on approximating the unknown subspace and the unknown density, and on exploiting such approximations in clustering, data compression, and building predictive models. Most of this literature relies on approximating subspaces and densities with a locally linear, and potentially multiscale, dictionary of Gaussian kernels. In this talk, we propose a simple and general alternative that instead uses pieces of spheres, or spherelets, to locally approximate the unknown subspace. I will also introduce a curved kernel, the Fisher–Gaussian (FG) kernel, which outperforms multivariate Gaussians in many cases. Theory is developed showing that spherelets can achieve lower covering numbers and mean squared errors for many manifolds, and that the Dirichlet process mixture of FG kernels is posterior consistent. Compared with state-of-the-art competitors, these methods approximate the subspace and the density more accurately with fewer components and parameters. Time permitting, I will also present some applications of spherelets, including classification, geodesic distance estimation, and clustering.
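As a rough illustration of the sphere-fitting idea behind spherelets (this is a minimal sketch of fitting a single sphere to data by algebraic least squares, not the algorithm from the talk), one can solve for a center c and radius r from the linearized relation ||x||² = 2c·x + (r² − ||c||²):

```python
import numpy as np

def fit_sphere(X):
    """Algebraic least-squares sphere fit: rewriting ||x - c||^2 = r^2 as
    ||x||^2 = 2 c.x + b with b = r^2 - ||c||^2 gives a linear system in (c, b)."""
    A = np.hstack([2 * X, np.ones((len(X), 1))])  # columns: 2*x, constant
    y = (X ** 2).sum(axis=1)                      # squared norms of the points
    sol, *_ = np.linalg.lstsq(A, y, rcond=None)
    c, b = sol[:-1], sol[-1]
    r = np.sqrt(b + c @ c)
    return c, r

# Noisy samples from an arc of the unit circle: a 1-D manifold in R^2
# that a straight line approximates poorly but a sphere captures exactly.
rng = np.random.default_rng(0)
t = rng.uniform(0, np.pi / 2, 200)
X = np.c_[np.cos(t), np.sin(t)] + 0.01 * rng.normal(size=(200, 2))

c, r = fit_sphere(X)
print(c, r)  # center near (0, 0), radius near 1
```

A spherelet-style approximation would apply a fit of this kind locally, on a neighborhood of points, rather than globally as above.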

Bio

Dr. Didong Li completed his PhD in Mathematics at Duke University, jointly advised by David B. Dunson and Sayan Mukherjee. He will be joining the Department of Computer Science at Princeton University as a postdoctoral fellow working with Barbara Engelhardt, and the Department of Biostatistics at UCLA as an Assistant Project Scientist working with Sudipto Banerjee. His research interests include geometric data analysis, manifold learning, Bayesian nonparametrics, spatial statistics, and information geometry. He was one of the winners of the inaugural IMS Lawrence D. Brown PhD Student Award.