Computational Limits of A Distributed Algorithm for Smoothing Spline

Date
2017
Language
English
Abstract

In this paper, we explore the statistical-versus-computational trade-off to address a basic question about the application of a distributed algorithm: what is the minimal computational cost at which statistical optimality is still attainable? In the smoothing spline setup, we observe a phase-transition phenomenon in the number of deployed machines, which serves as a simple proxy for computational cost. Specifically, we establish a sharp upper bound on the number of machines: when the number of machines stays below this bound, statistical optimality (in terms of nonparametric estimation or testing) is achievable; beyond it, statistical optimality becomes impossible. These sharp bounds partly capture the intrinsic computational limits of the distributed algorithm considered in this paper, and turn out to be fully determined by the smoothness of the regression function. As a side remark, we argue that sample splitting may be viewed as an alternative form of regularization, playing a role similar to that of the smoothing parameter.
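The distributed algorithm the abstract refers to is a divide-and-conquer scheme: the sample is split across machines, a smoothing spline is fit on each subsample, and the local estimates are averaged. The sketch below is a minimal illustration of that scheme under these assumptions, not the authors' implementation; the function `distributed_smoothing_spline` and its parameters are ours, and scipy's `UnivariateSpline` (with its default smoothing choice) stands in for the penalized spline estimator analyzed in the paper.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def distributed_smoothing_spline(x, y, num_machines, smooth=None):
    """Divide-and-conquer smoothing spline (illustrative sketch).

    Randomly splits (x, y) into `num_machines` subsamples, fits a
    smoothing spline on each, and returns the averaged estimator.
    """
    idx = np.random.permutation(len(x))
    local_fits = []
    for sub in np.array_split(idx, num_machines):
        order = sub[np.argsort(x[sub])]  # spline fitting requires sorted x
        local_fits.append(UnivariateSpline(x[order], y[order], s=smooth))
    # The final estimator averages the local fits pointwise.
    return lambda t: np.mean([f(t) for f in local_fits], axis=0)

# Toy usage: 2000 observations split across 8 machines. Per the paper's
# result, pushing the machine count past a sharp bound (determined by the
# smoothness of the regression function) destroys statistical optimality.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 2000)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(2000)
f_hat = distributed_smoothing_spline(x, y, num_machines=8)
print(f_hat(np.linspace(0.0, 1.0, 5)))
```

Note that each subsample is fit with its own smoothing level, which is the sense in which sample splitting itself acts as a form of regularization.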

Cite As
Shang, Z., & Cheng, G. (2017). Computational Limits of A Distributed Algorithm for Smoothing Spline. Journal of Machine Learning Research, 18(108), 1–37.
Journal
Journal of Machine Learning Research
Type
Article
Number
108
Volume
18
Version
Final published version