Paper Title
A Theoretical Analysis of the Learning Dynamics under Class Imbalance
Paper Authors
Paper Abstract
Data imbalance is a common problem in machine learning that can have a critical effect on the performance of a model. Various solutions exist, but their impact on the convergence of the learning dynamics is not understood. Here, we elucidate the significant negative impact of data imbalance on learning, showing that the learning curves for minority and majority classes follow sub-optimal trajectories when training with a gradient-based optimizer. This slowdown is related to the imbalance ratio and can be traced back to a competition between the optimization of different classes. Our main contribution is the analysis of the convergence of full-batch gradient descent (GD) and stochastic gradient descent (SGD), and of variants that renormalize the contribution of each per-class gradient. We find that GD is not guaranteed to decrease the loss for each class but that this problem can be addressed by performing a per-class normalization of the gradient. With SGD, class imbalance has an additional effect on the direction of the gradients: the minority class suffers from a higher directional noise, which reduces the effectiveness of the per-class gradient normalization. Our findings not only allow us to understand the potential and limitations of strategies involving the per-class gradients, but also the reason for the effectiveness of previously used solutions for class imbalance, such as oversampling.
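To make the per-class normalization mentioned in the abstract concrete, the following is a minimal sketch (not the authors' code) on a toy two-class logistic-regression problem: the gradient is computed separately for each class, rescaled to unit norm, and only then averaged, so the majority class cannot dominate the update. Function names such as `per_class_normalized_step` and the toy dataset are hypothetical, introduced purely for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def per_class_gradient(w, X, y):
    """Mean logistic-loss gradient over the examples of a single class."""
    p = sigmoid(X @ w)
    return X.T @ (p - y) / len(y)

def per_class_normalized_step(w, data_by_class, lr=0.1, eps=1e-12):
    """One full-batch GD step in which each class gradient is renormalized
    to unit norm before averaging, instead of being weighted by class size."""
    update = np.zeros_like(w)
    for X_c, y_c in data_by_class:
        g_c = per_class_gradient(w, X_c, y_c)
        update += g_c / (np.linalg.norm(g_c) + eps)
    return w - lr * update / len(data_by_class)

# Hypothetical imbalanced dataset: 95% of examples in class 0, 5% in class 1.
rng = np.random.default_rng(0)
X0 = rng.normal(-1.0, 1.0, size=(950, 2))
X1 = rng.normal(+1.0, 1.0, size=(50, 2))
data_by_class = [(X0, np.zeros(950)), (X1, np.ones(50))]

w = np.zeros(2)
for _ in range(200):
    w = per_class_normalized_step(w, data_by_class)
```

As the abstract notes, this kind of renormalization guarantees progress on each class only in the full-batch setting; with mini-batch SGD the minority-class gradient estimate is noisier in direction, which limits the benefit of the normalization.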