Impact of non-normal error distributions on the benchmarking and ranking of Quantum Machine Learning models

Paper title

Impact of non-normal error distributions on the benchmarking and ranking of Quantum Machine Learning models

Authors

Pascal Pernot, Bing Huang, Andreas Savin

Abstract

Quantum machine learning models have been gaining significant traction within atomistic simulation communities. Conventionally, relative model performances are assessed and compared using learning curves (prediction error vs. training set size). This article illustrates the limitations of using the Mean Absolute Error (MAE) for benchmarking, which is particularly relevant in the case of non-normal error distributions. We analyze more specifically the prediction error distribution of the kernel ridge regression with SLATM representation and L2 distance metric (KRR-SLATM-L2) for effective atomization energies of QM7b molecules calculated at the level of theory CCSD(T)/cc-pVDZ. Error distributions of HF and MP2 at the same basis set referenced to CCSD(T) values were also assessed and compared to the KRR model. We show that the true performance of the KRR-SLATM-L2 method over the QM7b dataset is poorly assessed by the Mean Absolute Error, and can be notably improved after adaptation of the learning set.
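
The following is a rough illustration of why the MAE alone can mislead when errors are not normally distributed. It is a minimal sketch on synthetic data, not the authors' analysis and not the QM7b errors. The choice of a Laplace distribution as the heavy-tailed example, the 95th percentile of absolute errors (`q95`) as the complementary statistic, and all function names are assumptions made for this example. Both samples are rescaled to the same MAE, yet the heavy-tailed one has a larger share of big errors.

```python
import random
import statistics


def mae(errors):
    """Mean absolute error of a list of signed errors."""
    return statistics.fmean([abs(e) for e in errors])


def q95(errors):
    """Empirical 95th percentile of the absolute errors (simple order statistic)."""
    abs_sorted = sorted(abs(e) for e in errors)
    index = int(0.95 * (len(abs_sorted) - 1))
    return abs_sorted[index]


def rescale_to_unit_mae(errors):
    """Rescale a sample so that its MAE equals 1 (for a like-for-like comparison)."""
    m = mae(errors)
    return [e / m for e in errors]


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 100_000

# Synthetic normal errors (hypothetical, for illustration only).
normal_errors = rescale_to_unit_mae([rng.gauss(0.0, 1.0) for _ in range(n)])

# Synthetic heavy-tailed errors: Laplace, built as a random sign times an exponential.
laplace_errors = rescale_to_unit_mae(
    [rng.choice((-1.0, 1.0)) * rng.expovariate(1.0) for _ in range(n)]
)

for name, errs in (("normal", normal_errors), ("laplace", laplace_errors)):
    print(f"{name:8s} MAE = {mae(errs):.3f}  Q95 = {q95(errs):.3f}")
```

With identical MAE values, the 95th percentile of absolute errors is roughly 2.5 for the normal sample and roughly 3.0 for the Laplace sample, so two methods that tie on MAE can still differ in how often they produce large errors.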
