论文标题
几乎是线性时间的几乎最佳尺寸核心算法
A Nearly Optimal Size Coreset Algorithm with Nearly Linear Time
论文作者
论文摘要
核心是一个点集,其中包含有关较大点集的几何特性的信息。一系列以前的作品表明,在许多机器学习问题中,尤其是在聚类问题中,核心对于构建有效的算法非常有用。核心结构算法性能的两个主要度量是该算法的运行时间和该算法的核心输出的大小。在本文中,我们研究了$(k,z)$ - 聚类问题的核心的构建,这是$ k $ - eaneans和$ k $ -Median问题的概括。通过正确设计基于草图的距离估计数据结构,我们提出了更快的算法,该算法构造了具有最新结果的匹配大小的核心。
A coreset is a point set containing information about geometric properties of a larger point set. A series of previous works show that in many machine learning problems, especially in clustering problems, coreset could be very useful to build efficient algorithms. Two main measures of an coreset construction algorithm's performance are the running time of the algorithm and the size of the coreset output by the algorithm. In this paper we study the construction of coresets for the $(k,z)$-clustering problem, which is a generalization of $k$-means and $k$-median problem. By properly designing a sketching-based distance estimation data structure, we propose faster algorithms that construct coresets with matching size of the state-of-the-art results.