A Framework for Distributed Large-Scale Sparse Regression


Leng Chenlei, University of Warwick


2018.04.17 14:00-15:00


Middle Lecture Room, Math Building


An attractive approach to scaling down a Big Data problem is divide and conquer: partition the dataset into subsets and fit each subset separately. For a dataset with a large number of variables, this is best done by partitioning the features, but a naive feature partition fails to take correlations between variables into account. We propose a framework named DECO that applies a simple decorrelation step before performing sparse regression on each subset. The framework works for elliptically distributed features, heavy-tailed errors and a general class of sparsity penalties. Its performance is illustrated through analyses of synthesized and real data. This is joint work with Xiangyu Wang at Google and David Dunson at Duke.
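
The decorrelate-then-partition idea described in the abstract can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the speaker's implementation: the abstract does not give the form of the decorrelation step, so the code assumes it is a multiplication of the centered data by sqrt(p) times the inverse square root of a ridge-regularised Gram matrix; it uses the lasso as one example of a sparsity penalty, fitted by a plain proximal-gradient loop; and it runs in a single process rather than on distributed workers. The names `deco_sketch`, `lasso_ista`, `n_groups`, `lam` and `ridge` are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal map of the L1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_ista(X, y, lam, n_iter=500):
    """Minimise (1/2n)||y - Xb||^2 + lam*||b||_1 by proximal gradient (ISTA)."""
    n, p = X.shape
    b = np.zeros(p)
    # Lipschitz constant of the gradient: largest eigenvalue of X^T X / n.
    lipschitz = np.linalg.norm(X, 2) ** 2 / n
    if lipschitz == 0.0:
        return b
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        b = soft_threshold(b - grad / lipschitz, lam / lipschitz)
    return b


def deco_sketch(X, y, n_groups, lam, ridge=1.0):
    """Decorrelate, partition the features, and fit a lasso per subset.

    Assumption: the decorrelation matrix is
    F = sqrt(p) * (Xc Xc^T + ridge * I)^(-1/2), with Xc the centered design.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()

    # Inverse square root of the (symmetric positive definite) Gram matrix.
    gram = Xc @ Xc.T + ridge * np.eye(n)
    eigvals, eigvecs = np.linalg.eigh(gram)
    F = np.sqrt(p) * (eigvecs / np.sqrt(eigvals)) @ eigvecs.T

    X_dec = F @ Xc  # decorrelated features
    y_dec = F @ yc  # response transformed the same way

    # Each feature subset is fitted independently of the others;
    # in a distributed setting each fit would run on its own worker.
    beta = np.zeros(p)
    for cols in np.array_split(np.arange(p), n_groups):
        beta[cols] = lasso_ista(X_dec[:, cols], y_dec, lam)
    return beta
```

Calling `deco_sketch(X, y, n_groups=4, lam=0.1)` on an `n`-by-`p` array `X` and a length-`n` vector `y` returns a length-`p` coefficient vector whose nonzero entries mark the selected variables. The sketch omits any step that refits the selected variables after the per-subset fits are combined.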