Paper title

Local intrinsic dimensionality estimators based on concentration of measure

Authors

Jonathan Bac, Andrei Zinovyev

Abstract

Intrinsic dimensionality (ID) is one of the most fundamental characteristics of multi-dimensional data point clouds. Knowing ID is crucial for choosing an appropriate machine learning approach, understanding its behavior, and validating it. ID can be computed globally for the whole data point distribution, or locally in different regions of the data space. In this paper, we introduce new local estimators of ID based on linear separability of multi-dimensional data point clouds, which is one of the manifestations of concentration of measure. We empirically study the properties of these estimators and compare them with other recently introduced ID estimators that exploit various effects of measure concentration. The observed differences between estimators can be used to anticipate their behavior in practical applications.
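
The abstract only names the mechanism, so here is a minimal Python sketch of how a separability-based ID estimate can be computed. It assumes the usual Fisher-separability setup (centre and whiten the data, project it onto the unit sphere, and call a point x inseparable from y when <x, y> > alpha * <x, x>), inverts the asymptotic inseparability probability for points uniform on a sphere to obtain a dimension, and builds a local variant by applying the same estimator inside each point's k-nearest-neighbour neighbourhood. The function names, the defaults alpha = 0.8 and k = 100, and the SVD tolerance are hypothetical choices for illustration; this is not the authors' implementation.

```python
import math

import numpy as np


def lambert_w(z: float, tol: float = 1e-12) -> float:
    """Principal branch of the Lambert W function for z > 0, i.e. w with w * exp(w) = z."""
    # log1p(z) is an upper bound on W(z) for z > 0, so Newton's method on the convex,
    # increasing function w * exp(w) - z decreases monotonically towards the root.
    w = math.log1p(z)
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w


def fisher_separability_dimension(X, alpha: float = 0.8) -> float:
    """Global ID estimate from the fraction of Fisher-inseparable pairs (sketch)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    X = np.asarray(X, dtype=float)
    n_points = X.shape[0]
    Xc = X - X.mean(axis=0)
    # Whitening via SVD: the left singular vectors U equal the whitened coordinates up
    # to one common scale factor, which the projection onto the sphere removes anyway.
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    Z = U[:, S > S[0] * 1e-8]  # drop numerically null directions (assumed tolerance)
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)  # project onto the unit sphere
    # On the sphere <x, x> = 1, so x fails to be Fisher-separable from y when <x, y> > alpha.
    G = Z @ Z.T
    np.fill_diagonal(G, 0.0)  # never compare a point with itself
    p_alpha = np.count_nonzero(G > alpha) / (n_points * (n_points - 1))
    if p_alpha == 0.0:
        return math.inf  # all pairs separable: too few points to resolve the dimension
    # Invert p = (1 - alpha^2)^((n - 1) / 2) / (alpha * sqrt(2 * pi * n)), the asymptotic
    # probability of inseparability for points uniform on the unit sphere in R^n.
    L = -math.log(1.0 - alpha ** 2)
    z = L / (2.0 * math.pi * p_alpha ** 2 * alpha ** 2 * (1.0 - alpha ** 2))
    return lambert_w(z) / L


def local_fisher_dimensions(X, k: int = 100, alpha: float = 0.8) -> np.ndarray:
    """Hypothetical local variant: run the global estimator on each point's k nearest neighbours."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared Euclidean distances
    knn = np.argsort(D2, axis=1)[:, :k]  # each row normally starts with the point itself
    return np.array([fisher_separability_dimension(X[idx], alpha) for idx in knn])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 10-dimensional Gaussian data embedded linearly in R^50.
    X = rng.standard_normal((1000, 10)) @ rng.standard_normal((10, 50))
    print("global ID estimate:", fisher_separability_dimension(X))
    Y = rng.standard_normal((1000, 3))
    print("median local ID estimate:", np.median(local_fisher_dimensions(Y, k=100)))
```

Lambert W is solved with a few Newton steps so the sketch depends only on NumPy. On the synthetic data in the `__main__` block, the global estimate should come out near the latent dimension 10 and the median local estimate near 3; estimates from small neighbourhoods are noisy, which is why the local values are aggregated with a median here.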