A Simple D 2-Sampling Based PTAS for k-Means and other Clustering Problems

Dimensions

Jaiswal, Ragesh ; Kumar, Amit ; Sen, Sandeep (2012) A Simple D 2-Sampling Based PTAS for k-Means and other Clustering Problems Lecture Notes in Computer Science, 7434 . pp. 13-24. ISSN 0302-9743

Full text not available from this repository.

Official URL: http://doi.org/10.1007/978-3-642-32241-9_2

Related URL: http://dx.doi.org/10.1007/978-3-642-32241-9_2

Abstract

Given a set of points P ⊂ ℝ d , the k-means clustering problem is to find a set of k centers C = {c 1,...,c k }, c i ∈ ℝ d , such that the objective function ∑ x ∈ P d(x,C)2, where d(x,C) denotes the distance between x and the closest center in C, is minimized. This is one of the most prominent objective functions that have been studied with respect to clustering. D 2-sampling [1] is a simple non-uniform sampling technique for choosing points from a set of points. It works as follows: given a set of points P ⊆ ℝ d , the first point is chosen uniformly at random from P. Subsequently, a point from P is chosen as the next sample with probability proportional to the square of the distance of this point to the nearest previously sampled points. D 2-sampling has been shown to have nice properties with respect to the k-means clustering problem. Arthur and Vassilvitskii [1] show that k points chosen as centers from P using D 2-sampling gives an O(logk) approximation in expectation. Ailon et. al. [2] and Aggarwal et. al. [3] extended results of [1] to show that O(k) points chosen as centers using D 2-sampling give O(1) approximation to the k-means objective function with high probability. In this paper, we further demonstrate the power of D 2-sampling by giving a simple randomized (1 + ε)-approximation algorithm that uses the D 2-sampling in its core.

Item Type:	Article
Source:	Copyright of this article belongs to Springer-Verlag Berlin Heidelberg.
ID Code:	118547
Deposited On:	25 May 2021 05:36
Last Modified:	25 May 2021 05:36

Repository Staff Only: item control page

PlumX Metrics