Paper Title
DiverseNet: When One Right Answer is not Enough
Paper Authors
Paper Abstract
Many structured prediction tasks in machine vision have a collection of acceptable answers instead of one definitive ground-truth answer. Segmentation of images, for example, is subject to human labeling bias. Similarly, there are multiple possible pixel values that could plausibly complete occluded image regions. State-of-the-art supervised learning methods are typically optimized to make a single test-time prediction for each query, and so fail to find other modes in the output space. Existing methods that allow for sampling often sacrifice speed or accuracy. We introduce a simple method for training a neural network that enables diverse structured predictions to be made for each test-time query. For a single input, we learn to predict a range of possible answers. We compare favorably to methods that seek diversity through an ensemble of networks. Such stochastic multiple choice learning suffers from mode collapse, where one or more ensemble members fail to receive any training signal. Our best-performing solution can be deployed for various tasks and involves only small modifications to the existing single-mode architecture, loss function, and training regime. We demonstrate that our method yields quantitative improvements on three challenging tasks: 2D image completion, 3D volume estimation, and flow prediction.
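The mode-collapse failure mentioned above comes from the winner-take-all (oracle) loss used in stochastic multiple choice learning (sMCL). For each example, only the ensemble member whose prediction is closest to the target is penalized and updated. The following is a minimal pure-Python sketch of that baseline, not of DiverseNet itself. It assumes squared error as the per-hypothesis loss and treats each member's output as a free parameter updated by hand-coded gradient descent, in place of a real network. All function names (`oracle_loss`, `wta_gradients`, `win_counts`, `train`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def squared_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Per-hypothesis loss (assumed here to be squared error)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def oracle_loss(predictions: Sequence[Vector], target: Vector) -> Tuple[float, int]:
    """Winner-take-all loss: only the best of the M predictions counts.

    Returns (loss of the winning prediction, index of the winner).
    """
    errors = [squared_error(p, target) for p in predictions]
    winner = min(range(len(errors)), key=errors.__getitem__)
    return errors[winner], winner


def wta_gradients(predictions: Sequence[Vector], target: Vector) -> List[Vector]:
    """Gradient of the oracle loss w.r.t. each prediction.

    The winner gets d/dp (p - t)^2 = 2 (p - t); every other member gets zeros,
    i.e. no training signal from this example.
    """
    _, winner = oracle_loss(predictions, target)
    grads = []
    for i, pred in enumerate(predictions):
        if i == winner:
            grads.append([2.0 * (p - t) for p, t in zip(pred, target)])
        else:
            grads.append([0.0] * len(pred))
    return grads


def win_counts(predictions: Sequence[Vector], targets: Sequence[Vector]) -> List[int]:
    """How many targets each member wins; a zero count means that member is never trained."""
    counts = [0] * len(predictions)
    for t in targets:
        _, w = oracle_loss(predictions, t)
        counts[w] += 1
    return counts


def train(predictions: Sequence[Vector], targets: Sequence[Vector],
          epochs: int = 10, lr: float = 0.1) -> List[Vector]:
    """Plain gradient descent on the oracle loss, one target at a time."""
    preds = [list(p) for p in predictions]
    for _ in range(epochs):
        for t in targets:
            grads = wta_gradients(preds, t)
            preds = [[p - lr * g for p, g in zip(pred, grad)]
                     for pred, grad in zip(preds, grads)]
    return preds


if __name__ == "__main__":
    # Two-mode toy data: the "right answer" is either 0.0 or 1.0.
    targets = [[0.0], [1.0], [0.0], [1.0]]
    # Three hypotheses; the third starts far from both modes.
    initial = [[0.1], [0.9], [5.0]]
    print(win_counts(initial, targets))  # [2, 2, 0]
    trained = train(initial, targets, epochs=50)
    print(trained[2])  # [5.0]: the third member never won, so it never moved
```

In this toy setup the first two members converge to the two modes, while the third never wins a single example and so keeps its initial value forever. This is the "fails to receive any training signal" behavior the abstract attributes to sMCL ensembles.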