Paper Title

Uses and Abuses of the Cross-Entropy Loss: Case Studies in Modern Deep Learning

Paper Authors

Elliott Gordon-Rodriguez, Gabriel Loaiza-Ganem, Geoff Pleiss, John P. Cunningham

Paper Abstract

Modern deep learning is primarily an experimental science, in which empirical advances occasionally come at the expense of probabilistic rigor. Here we focus on one such example; namely the use of the categorical cross-entropy loss to model data that is not strictly categorical, but rather takes values on the simplex. This practice is standard in neural network architectures with label smoothing and actor-mimic reinforcement learning, amongst others. Drawing on the recently discovered continuous-categorical distribution, we propose probabilistically-inspired alternatives to these models, providing an approach that is more principled and theoretically appealing. Through careful experimentation, including an ablation study, we identify the potential for outperformance in these models, thereby highlighting the importance of a proper probabilistic treatment, as well as illustrating some of the failure modes thereof.
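The mismatch the abstract describes can be made concrete with a minimal NumPy sketch (function names here are illustrative, not from the paper's code): label smoothing replaces a one-hot target with a point in the interior of the simplex, yet the loss applied to it is still the categorical cross-entropy, which corresponds to a proper log-likelihood only for strictly one-hot targets.

```python
import numpy as np

def smooth_labels(one_hot, eps=0.1):
    # Label smoothing: mix the one-hot target with the uniform distribution,
    # yielding a simplex-valued (no longer strictly categorical) target.
    k = one_hot.shape[-1]
    return (1.0 - eps) * one_hot + eps / k

def cross_entropy(target, probs):
    # Categorical cross-entropy H(target, probs) = -sum_k target_k * log(probs_k).
    # It evaluates fine for any simplex-valued target, but is only the negative
    # log-likelihood of a categorical model when the target is one-hot.
    return -np.sum(target * np.log(probs), axis=-1)

one_hot = np.array([0.0, 0.0, 1.0])
probs = np.array([0.1, 0.2, 0.7])   # model's predicted distribution
soft = smooth_labels(one_hot, eps=0.1)

ce_hard = cross_entropy(one_hot, probs)  # equals -log(0.7)
ce_soft = cross_entropy(soft, probs)     # CE on a simplex-valued target
```

The paper's proposed alternative is to model such simplex-valued targets with the continuous-categorical distribution instead, so that the training objective is again a proper log-likelihood; its normalizing constant is omitted here.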
