Compression, Inversion, and Approximate PCA of Dense Kernel Matrices at Near-Linear Computational Complexity


Florian Schaefer, California Institute of Technology


2018.01.03 10:30-11:30


602 Pao Yue-Kong Library


Many popular methods in machine learning, statistics, and uncertainty quantification rely on priors given by smooth Gaussian processes, such as those obtained from Matérn covariance functions. Furthermore, many physical systems are described in terms of elliptic partial differential equations. Therefore, implicitly or explicitly, numerical simulation of these systems requires an efficient numerical representation of the corresponding Green's operator.

The resulting kernel matrices are typically dense, leading to (often prohibitive) $O(N^2)$ or $O(N^3)$ computational complexity. In this work, we prove rigorously that the dense $N \times N$ kernel matrices obtained from elliptic boundary value problems and measurement points distributed approximately uniformly in a $d$-dimensional domain can be Cholesky factorized to accuracy $\epsilon$ in computational complexity $O\bigl(N \log^2(N) \log^{2d}(N/\epsilon) \bigr)$ in time and $O\bigl(N \log(N) \log^{d}(N/\epsilon)\bigr)$ in space.
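To make the setting concrete, the sketch below (not the paper's algorithm) builds a dense Matérn-type kernel matrix on $N$ roughly uniform points in $d = 2$ dimensions and factorizes it with a standard dense Cholesky routine, which is the $O(N^3)$ baseline that the result above improves upon. The smoothness and length-scale parameters are illustrative choices, not values from the talk.

```python
import numpy as np

def matern_32(r, length_scale=0.5):
    """Matern covariance with smoothness nu = 3/2 (illustrative choice)."""
    s = np.sqrt(3.0) * r / length_scale
    return (1.0 + s) * np.exp(-s)

rng = np.random.default_rng(0)
N = 400
X = rng.uniform(size=(N, 2))                        # measurement points in [0, 1]^2
r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
K = matern_32(r) + 1e-10 * np.eye(N)                # dense N x N kernel matrix (small jitter)

L = np.linalg.cholesky(K)                           # exact O(N^3) reference factorization
err = np.linalg.norm(L @ L.T - K) / np.linalg.norm(K)
print(err)                                          # relative reconstruction error
```

The point of the fast algorithm is that, after a suitable (multiresolution) reordering of the points, an approximate factor with only $O\bigl(N \log(N) \log^{d}(N/\epsilon)\bigr)$ nonzero entries suffices to reach accuracy $\epsilon$.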

For the closely related Matérn covariances we observe very good results in practice, even for parameters corresponding to non-integer order equations. As a byproduct, we obtain a sparse PCA with near-optimal low-rank approximation property and a fast solver for elliptic PDEs. We emphasize that our algorithm requires no analytic expression for the covariance function.
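For context on the "near-optimal low-rank approximation" claim: the benchmark is the truncated eigendecomposition, which by the Eckart-Young theorem gives the best rank-$k$ approximation in spectral norm but costs $O(N^3)$ to compute. The hedged sketch below computes this optimal baseline for a smooth kernel (a Gaussian kernel with an arbitrary length scale, chosen for illustration); a sparse factorization-based PCA would aim to approach this error at near-linear cost.

```python
import numpy as np

def rbf(r, length_scale=0.5):
    """Gaussian (RBF) kernel; smooth, so eigenvalues decay rapidly."""
    return np.exp(-0.5 * (r / length_scale) ** 2)

rng = np.random.default_rng(1)
N, k = 300, 20
X = rng.uniform(size=(N, 2))
r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
K = rbf(r)

w, V = np.linalg.eigh(K)                    # eigenvalues in ascending order
K_k = (V[:, -k:] * w[-k:]) @ V[:, -k:].T    # best rank-k approximation (Eckart-Young)
rel = np.linalg.norm(K - K_k, 2) / np.linalg.norm(K, 2)
print(rel)                                  # relative spectral-norm error of rank-k PCA
```

Because the kernel is smooth, a modest rank already captures the matrix to high relative accuracy, which is the regime where a fast approximate PCA is most useful.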

Our work connects the probabilistic interpretation of the Cholesky factorization, the screening effect in spatial statistics, and numerical homogenization. In particular, results from the game-theoretic approach to numerical analysis ("Gamblets") allow us to obtain rigorous error estimates.