Title
Finite-sample Analysis of Interpolating Linear Classifiers in the Overparameterized Regime
Authors
Abstract
We prove bounds on the population risk of the maximum margin algorithm for two-class linear classification. For linearly separable training data, the maximum margin algorithm has been shown in previous work to be equivalent to a limit of training with logistic loss using gradient descent, as the training error is driven to zero. We analyze this algorithm applied to random data including misclassification noise. Our assumptions on the clean data include the case in which the class-conditional distributions are standard normal distributions. The misclassification noise may be chosen by an adversary, subject to a limit on the fraction of corrupted labels. Our bounds show that, with sufficient over-parameterization, the maximum margin algorithm trained on noisy data can achieve nearly optimal population risk.
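For readers unfamiliar with the objects named in the abstract, the standard definitions can be sketched as follows. The notation here is an assumption made for illustration and is not taken from the paper: a training set of n examples (x_i, y_i) with x_i in R^p and labels y_i in {-1, +1}, a homogeneous classifier with no bias term, and gradient descent iterates w_t run with a sufficiently small step size. The block needs the amsmath package.

```latex
% Assumed notation: (x_1, y_1), ..., (x_n, y_n), x_i \in \mathbb{R}^p,
% y_i \in \{-1, +1\}, homogeneous linear classifier x \mapsto sign(w \cdot x).

% Maximum margin (hard-margin) solution on linearly separable data:
\[
  w^{\star} \;=\; \operatorname*{arg\,min}_{w \in \mathbb{R}^p} \|w\|_2
  \quad \text{subject to} \quad
  y_i \, (w \cdot x_i) \ge 1 \ \text{ for all } i = 1, \dots, n .
\]

% Logistic loss, and the directional limit of gradient descent iterates w_t
% on separable data (the equivalence from previous work that the abstract refers to):
\[
  L(w) \;=\; \sum_{i=1}^{n} \log\bigl(1 + \exp(-y_i \, (w \cdot x_i))\bigr),
  \qquad
  \lim_{t \to \infty} \frac{w_t}{\|w_t\|_2} \;=\; \frac{w^{\star}}{\|w^{\star}\|_2} .
\]

% Population risk: probability of misclassifying a fresh example (x, y):
\[
  R(w) \;=\; \Pr_{(x, y)}\bigl[\operatorname{sign}(w \cdot x) \ne y\bigr] .
\]
```

In the overparameterized regime (p much larger than n), random training data is typically linearly separable even when some labels are corrupted, so the constraint set above is non-empty and the resulting classifier interpolates the noisy labels.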