Title
Distributional Generalization: A New Kind of Generalization
Authors
Abstract
We introduce a new notion of generalization -- Distributional Generalization -- which roughly states that outputs of a classifier at train and test time are close *as distributions*, as opposed to close in just their average error. For example, if we mislabel 30% of dogs as cats in the train set of CIFAR-10, then a ResNet trained to interpolation will in fact mislabel roughly 30% of dogs as cats on the *test set* as well, while leaving other classes unaffected. This behavior is not captured by classical generalization, which would only consider the average error and not the distribution of errors over the input domain. Our formal conjectures, which are much more general than this example, characterize the form of distributional generalization that can be expected in terms of problem parameters: model architecture, training procedure, number of samples, and data distribution. We give empirical evidence for these conjectures across a variety of domains in machine learning, including neural networks, kernel machines, and decision trees. Our results thus advance our empirical understanding of interpolating classifiers.
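The mislabeled-dogs experiment above can be sketched in miniature. The following is an illustrative assumption, not the paper's actual setup: it replaces a ResNet on CIFAR-10 with a 1-nearest-neighbor classifier (which also interpolates, i.e., fits its training labels exactly) on synthetic 2-D Gaussian blobs, with classes 0 and 1 standing in for "dogs" and "cats". The names, data generator, and noise-injection code are all hypothetical.

```python
import numpy as np

# Minimal sketch of the label-noise experiment (illustrative assumption:
# 1-NN on Gaussian blobs in place of a ResNet on CIFAR-10).

rng = np.random.default_rng(0)

def make_data(n_per_class):
    """Two well-separated classes: 0 = 'dogs', 1 = 'cats' (stand-ins)."""
    X0 = rng.normal(-2.0, 1.0, size=(n_per_class, 2))
    X1 = rng.normal(+2.0, 1.0, size=(n_per_class, 2))
    X = np.vstack([X0, X1])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y

X_train, y_train = make_data(1500)
X_test, y_test = make_data(1500)

# Mislabel 30% of the train-set "dogs" as "cats"; test labels stay clean.
noise_rate = 0.30
dogs = np.where(y_train == 0)[0]
flipped = rng.choice(dogs, size=int(noise_rate * len(dogs)), replace=False)
y_noisy = y_train.copy()
y_noisy[flipped] = 1

# 1-NN prediction = (noisy) label of the nearest train point, so the
# classifier fits the corrupted train set exactly (interpolation).
sq_dists = (
    (X_test ** 2).sum(1)[:, None]
    + (X_train ** 2).sum(1)[None, :]
    - 2.0 * X_test @ X_train.T
)
pred = y_noisy[sq_dists.argmin(axis=1)]

# Distributional generalization predicts the test-time dog->cat rate
# tracks the ~30% train-time noise rate, with cats left unaffected.
dog_to_cat = (pred[y_test == 0] == 1).mean()
cat_to_dog = (pred[y_test == 1] == 0).mean()
print(f"dog->cat on test: {dog_to_cat:.2f}, cat->dog on test: {cat_to_dog:.2f}")
```

Because each clean test "dog" inherits the label of a nearby train "dog", and 30% of those labels were flipped, the measured dog-to-cat rate lands near 0.30 rather than near the Bayes-optimal 0 -- the noise distribution transfers from train to test, which is the phenomenon the abstract describes.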