Towards Tight Communication Lower Bounds for Distributed Optimization (NeurIPS 2021)
We consider a standard distributed optimization setting in which N machines, each holding a d-dimensional function f_i, aim to jointly minimize the sum ∑_{i=1}^N f_i(x). This problem arises naturally in large-scale distributed optimization, where a standard solution is to apply variants of (stochastic) gradient descent. We focus on the communication complexity of this problem: our main result provides the first fully unconditional bounds on the total number of bits which need to be sent and received by the N machines in order to solve this problem under point-to-point communication, within a given error tolerance. Specifically, we show that Ω(Nd log d / Nε) total bits need to be communicated between the machines to find an additive ε-approximation to the minimum of ∑_{i=1}^N f_i(x). The result holds for both deterministic and randomized algorithms, and, importantly, requires no assumptions on the algorithm structure. The lower bound is tight under certain restrictions on parameter values, and is matched within constant factors for quadratic objectives by a new variant of quantized gradient descent, which we describe and analyze. Our results bring over tools from communication complexity to distributed optimization, which has potential for further applications.
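To make the setting concrete, the following is a minimal sketch of distributed gradient descent with quantized communication on quadratic objectives f_i(x) = ½xᵀA_i x − b_iᵀx. It is illustrative only: the uniform stochastic quantizer and all parameter choices (number of levels, step size, iteration count) are assumptions for this sketch, not the paper's exact encoding or analysis.

```python
import numpy as np

def quantize(v, levels=1024):
    """Uniform stochastic quantization: each coordinate of v is rounded
    to one of `levels` grid points relative to max|v|, using stochastic
    rounding so the result is unbiased. (Illustrative encoding only.)"""
    scale = np.max(np.abs(v))
    if scale == 0.0:
        return v
    u = v / scale * levels            # rescale to [-levels, levels]
    q = np.floor(u)
    q += np.random.rand(*v.shape) < (u - q)   # unbiased stochastic rounding
    return q / levels * scale

def distributed_quantized_gd(As, bs, steps=500, lr=0.1, levels=1024):
    """Minimize sum_i (1/2 x^T A_i x - b_i^T x), with each machine i
    communicating only a quantized version of its local gradient."""
    d = bs[0].shape[0]
    x = np.zeros(d)
    for _ in range(steps):
        # machine i computes grad f_i(x) = A_i x - b_i, quantizes, sends
        grads = [quantize(A @ x - b, levels) for A, b in zip(As, bs)]
        x -= lr * np.mean(grads, axis=0)
    return x

# Example: N = 4 machines, d = 3, random well-conditioned quadratics.
rng = np.random.default_rng(0)
N, d = 4, 3
As = [np.eye(d) + 0.1 * (M + M.T)
      for M in (rng.standard_normal((d, d)) for _ in range(N))]
bs = [rng.standard_normal(d) for _ in range(N)]
x_hat = distributed_quantized_gd(As, bs)
x_star = np.linalg.solve(sum(As), sum(bs))   # exact minimizer of the sum
print(np.linalg.norm(x_hat - x_star))
```

Because the quantizer's error is proportional to the gradient's magnitude, the noise shrinks as the iterates approach the optimum, so the sketch recovers the minimizer of the sum to high accuracy while each round costs only about d·log₂(levels) bits per machine.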