Faster Algorithms for the Constrained k-means Problem

Bhattacharya, Anup ; Jaiswal, Ragesh ; Kumar, Amit (2018) Faster Algorithms for the Constrained k-means Problem Theory of Computing Systems, 62 (1). pp. 93-115. ISSN 1432-4350

Full text not available from this repository.

Official URL: http://doi.org/10.1007/s00224-017-9820-7

Related URL: http://dx.doi.org/10.1007/s00224-017-9820-7

Abstract

The classical center based clustering problems such as k-means/median/center assume that the optimal clusters satisfy the locality property that the points in the same cluster are close to each other. A number of clustering problems arise in machine learning where the optimal clusters do not follow such a locality property. For instance, consider the r -gather clustering problem where there is an additional constraint that each of the clusters should have at least r points or the capacitated clustering problem where there is an upper bound on the cluster sizes. Consider a variant of the k-means problem that may be regarded as a general version of such problems. Here, the optimal clusters O 1, ..., O k are an arbitrary partition of the dataset and the goal is to output k-centers c 1, ..., c k such that the objective function ∑ki=1∑x∈Oi||x−ci||2 is minimized. It is not difficult to argue that any algorithm (without knowing the optimal clusters) that outputs a single set of k centers, will not behave well as far as optimizing the above objective function is concerned. However, this does not rule out the existence of algorithms that output a list of such k centers such that at least one of these k centers behaves well. Given an error parameter ε > 0, let ℓ denote the size of the smallest list of k-centers such that at least one of the k-centers gives a (1 + ε) approximation w.r.t. the objective function above. In this paper, we show an upper bound on ℓ by giving a randomized algorithm that outputs a list of 2O~(k/ε) k-centers. We also give a closely matching lower bound of 2Ω~(k/ε√). Moreover, our algorithm runs in time O(nd⋅2O~(k/ε)). This is a significant improvement over the previous result of Ding and Xu (2015) who gave an algorithm with running time O(n d ⋅ (log n)k ⋅ 2poly(k/ε)) and output a list of size O((log n)k ⋅ 2poly(k/ε)). Our techniques generalize for the k-median problem and for many other settings where non-Euclidean distance measures are involved.

Item Type:	Article
Source:	Copyright of this article belongs to Springer-Verlag.
ID Code:	123512
Deposited On:	29 Sep 2021 09:27
Last Modified:	29 Sep 2021 09:27

Repository Staff Only: item control page