Title
Unsupervised learning universal critical behavior via the intrinsic dimension
Authors
Abstract
The identification of universal properties from minimally processed data sets is one goal of machine learning techniques applied to statistical physics. Here, we study how the minimum number of variables needed to accurately describe the important features of a data set - the intrinsic dimension ($I_d$) - behaves in the vicinity of phase transitions. We employ state-of-the-art nearest-neighbor-based $I_d$ estimators to compute the $I_d$ of raw Monte Carlo thermal configurations across different phase transitions: first-order, second-order, and Berezinskii-Kosterlitz-Thouless. For all the considered cases, we find that the $I_d$ uniquely characterizes the transition regime. A finite-size analysis of the $I_d$ allows us not only to identify critical points with an accuracy comparable to that of methods relying on an {\it a priori} identification of order parameters, but also to determine the corresponding critical exponent $\nu$ in the case of continuous transitions. For topological transitions, this analysis overcomes the reported limitations affecting other unsupervised learning methods. Our work reveals how raw data sets display unique signatures of universal behavior in the absence of any dimensional reduction scheme, and suggests a direct parallel between conventional order parameters in real space and the intrinsic dimension in data space.
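To make the nearest-neighbor-based $I_d$ estimation concrete, the sketch below shows a TwoNN-style maximum-likelihood estimate, in which the ratio of each point's second- to first-nearest-neighbor distance follows a Pareto law with exponent $I_d$. This is an illustrative implementation under our own assumptions; the function name `estimate_id`, the use of `scipy.spatial.cKDTree`, and the flattening of Monte Carlo configurations into vectors are hypothetical choices, not the authors' actual pipeline.

```python
# Minimal sketch of a nearest-neighbor-based intrinsic-dimension estimator
# (TwoNN-style maximum-likelihood estimate). Illustrative only: the paper
# does not specify these implementation details in the abstract.
import numpy as np
from scipy.spatial import cKDTree

def estimate_id(X):
    """Estimate the intrinsic dimension I_d of a data set X with shape
    (n_samples, n_features), e.g. raw Monte Carlo configurations flattened
    to vectors (hypothetical preprocessing)."""
    tree = cKDTree(X)
    # distances to the two nearest neighbors of each point (column 0 is the
    # point itself, at distance zero)
    dists, _ = tree.query(X, k=3)
    r1, r2 = dists[:, 1], dists[:, 2]
    mu = r2 / r1                          # ratio of 2nd to 1st NN distance
    mu = mu[np.isfinite(mu) & (mu > 1.0)]  # drop degenerate/duplicate points
    # For a locally uniform density, mu is Pareto-distributed with exponent
    # I_d, so the maximum-likelihood estimate is N / sum(log mu_i).
    return len(mu) / np.sum(np.log(mu))

if __name__ == "__main__":
    # sanity check: points on a 2D plane embedded in 10 dimensions
    rng = np.random.default_rng(0)
    X = np.zeros((2000, 10))
    X[:, :2] = rng.random((2000, 2))
    print(estimate_id(X))  # expected to be close to 2
```

In a study of this kind, the estimate would be repeated for configurations sampled at different temperatures and system sizes, and the resulting $I_d$ curves analyzed with standard finite-size scaling to locate the critical point and extract $\nu$.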