Paper Title

Precise Learning Curves and Higher-Order Scaling Limits for Dot Product Kernel Regression

Paper Authors

Lechao Xiao, Hong Hu, Theodor Misiakiewicz, Yue M. Lu, Jeffrey Pennington

Paper Abstract

As modern machine learning models continue to advance the computational frontier, it has become increasingly important to develop precise estimates for expected performance improvements under different model and data scaling regimes. Currently, theoretical understanding of the learning curves that characterize how the prediction error depends on the number of samples is restricted to either large-sample asymptotics ($m\to\infty$) or, for certain simple data distributions, to the high-dimensional asymptotics in which the number of samples scales linearly with the dimension ($m\propto d$). There is a wide gulf between these two regimes, including all higher-order scaling relations $m\propto d^r$, which are the subject of the present paper. We focus on the problem of kernel ridge regression for dot-product kernels and present precise formulas for the mean of the test error, bias, and variance, for data drawn uniformly from the sphere with isotropic random labels in the $r$th-order asymptotic scaling regime $m\to\infty$ with $m/d^r$ held constant. We observe a peak in the learning curve whenever $m \approx d^r/r!$ for any integer $r$, leading to multiple sample-wise descent and nontrivial behavior at multiple scales.
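
To make the setting concrete, here is a minimal Python/NumPy sketch of the kind of experiment the abstract describes: kernel ridge regression with a dot-product kernel $k(x, x') = h(\langle x, x'\rangle)$ on data drawn uniformly from the sphere, with the test error traced as the sample size $m$ sweeps across scales in the dimension $d$. The specific choices below (the kernel $h(t) = e^t$, a linear target, the ridge value, and $d = 20$) are illustrative assumptions, not taken from the paper, which instead derives exact asymptotic formulas for the test error, bias, and variance.

```python
import numpy as np

# Illustrative sketch (not the paper's derivation): trace an empirical learning
# curve for kernel ridge regression (KRR) with a dot-product kernel on data
# drawn uniformly from the unit sphere S^{d-1}.

def sample_sphere(m, d, rng):
    """Draw m points uniformly from the unit sphere in R^d."""
    x = rng.standard_normal((m, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def dot_product_kernel(X1, X2):
    """Dot-product kernel k(x, x') = h(<x, x'>), with h(t) = exp(t) as an example."""
    return np.exp(X1 @ X2.T)

def krr_test_mse(m, d, ridge=1e-3, m_test=2000, seed=0):
    """Fit KRR on m training points and return the mean squared test error."""
    rng = np.random.default_rng(seed)
    X, X_test = sample_sphere(m, d, rng), sample_sphere(m_test, d, rng)
    w = rng.standard_normal(d)              # illustrative linear target function
    y, y_test = X @ w, X_test @ w
    K = dot_product_kernel(X, X)
    alpha = np.linalg.solve(K + ridge * np.eye(m), y)   # KRR dual coefficients
    y_hat = dot_product_kernel(X_test, X) @ alpha
    return np.mean((y_hat - y_test) ** 2)

if __name__ == "__main__":
    d = 20
    # Sweep m across scales (m ~ d and m ~ d^2) to trace the learning curve;
    # the paper predicts peaks near m = d^r / r! for integer r.
    for m in [d // 2, d, 2 * d, d**2 // 2, d**2, 2 * d**2]:
        print(f"m = {m:5d}  test MSE = {krr_test_mse(m, d):.4f}")
```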
