Paper title
The role of feature space in atomistic learning
Paper authors
Paper abstract
Efficient, physically-inspired descriptors of the structure and composition of molecules and materials play a key role in the application of machine-learning techniques to atomistic simulations. The proliferation of approaches, together with the fact that each choice of features can behave very differently depending on how it is used (for example, when non-linear kernels or non-Euclidean metrics are introduced to manipulate the features), makes it difficult to compare different methods objectively and to address fundamental questions about how one feature space is related to another. In this work we introduce a framework to compare different sets of descriptors, and different ways of transforming them by means of metrics and kernels, in terms of the structure of the feature space that they induce. We define diagnostic tools to determine whether alternative feature spaces contain equivalent amounts of information, and whether the common information is substantially distorted when going from one feature space to another. In particular, we compare representations built in terms of n-body correlations of the atom density, quantitatively assessing the information loss associated with the use of low-order features. We also investigate the impact of different choices of basis functions and hyperparameters for the widely used SOAP and Behler-Parrinello features, and examine how the use of non-linear kernels, and of a Wasserstein-type metric, changes the structure of the feature space in comparison to a simpler linear feature space.
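The question of whether two feature spaces contain equivalent amounts of information can be illustrated with a small numerical sketch. The version below is an assumption made for illustration and is not taken from the abstract: it fits a ridge-regularised linear map from one feature matrix to another on a training split and reports the held-out reconstruction error. The function names (`standardize`, `reconstruction_error`), the synthetic data, and the choice of a linear map are all hypothetical.

```python
import numpy as np


def standardize(train, test):
    """Center with the train mean and scale so the train set has unit total variance."""
    mean = train.mean(axis=0)
    scale = np.sqrt(((train - mean) ** 2).sum() / len(train))
    return (train - mean) / scale, (test - mean) / scale


def reconstruction_error(src, dst, n_train, ridge=1e-10):
    """Held-out error of the best linear map from `src` features to `dst` features."""
    src_tr, src_te = standardize(src[:n_train], src[n_train:])
    dst_tr, dst_te = standardize(dst[:n_train], dst[n_train:])
    # Ridge-regularised least squares: W = (X^T X + ridge * I)^-1 X^T Y
    gram = src_tr.T @ src_tr + ridge * np.eye(src_tr.shape[1])
    weights = np.linalg.solve(gram, src_tr.T @ dst_tr)
    residual = dst_te - src_te @ weights
    return float(np.sqrt((residual ** 2).sum() / len(dst_te)))


# Synthetic stand-ins for two descriptor sets (assumption: not real atomistic data).
rng = np.random.default_rng(0)
full = rng.normal(size=(400, 6))  # hypothetical "rich" feature space
reduced = full[:, :2]             # hypothetical feature space that discards columns

err_full_to_reduced = reconstruction_error(full, reduced, n_train=200)
err_reduced_to_full = reconstruction_error(reduced, full, n_train=200)
print(err_full_to_reduced, err_reduced_to_full)
```

In this toy setup the error is asymmetric: the reduced features can be rebuilt from the full ones almost exactly, while the reverse direction leaves a substantial residual. An asymmetry of this kind is one way to read "information loss" between feature spaces, under the linear-map assumption stated above.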