Partially Supervised Multi-Task Network for Single-View Dietary Assessment

Title

Partially Supervised Multi-Task Network for Single-View Dietary Assessment

Authors

Lu, Ya; Stathopoulou, Thomai; Mougiakakou, Stavroula

Abstract

Food volume estimation is an essential step in the dietary assessment pipeline and requires precise depth estimation of both the food surface and the table plane. Existing computer-vision-based methods require either multiple input images or additional depth maps, which reduces their convenience and practical value. Despite recent advances in unsupervised depth estimation from a single image, performance on large texture-less areas still needs improvement. In this paper, we propose a network architecture that jointly performs geometric understanding (i.e., depth prediction and 3D plane estimation) and semantic prediction on a single food image, enabling robust and accurate food volume estimation regardless of the texture characteristics of the target plane. Training the network requires only monocular videos with semantic ground truth; depth maps and 3D plane ground truth are not needed. Experimental results on two separate food image databases demonstrate that our method performs robustly in texture-less scenarios and outperforms unsupervised networks and structure-from-motion-based approaches, while achieving performance comparable to fully supervised methods.
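The abstract notes that volume estimation needs the depth of both the food surface and the table plane. The minimal NumPy sketch below illustrates why, under stated assumptions; it is not the authors' pipeline. It assumes a pinhole camera with known intrinsics fx, fy, cx, cy, a metric depth map, a binary food mask (the kind of output a semantic branch might provide), and a table plane n·X + d = 0 whose normal points toward the camera. Food pixels are back-projected to 3D, their heights above the plane are binned on an in-plane grid, and the tallest height per cell is integrated. The names backproject and food_volume, the 5 mm cell size, and the synthetic scene are hypothetical.

```python
import numpy as np


def backproject(depth, fx, fy, cx, cy):
    """Hypothetical helper: back-project an (H, W) metric depth map to camera-frame points (H, W, 3)."""
    rows, cols = depth.shape
    u, v = np.meshgrid(np.arange(cols), np.arange(rows))  # u: column index, v: row index
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)


def food_volume(depth, mask, fx, fy, cx, cy, normal, offset, cell=0.005):
    """Hypothetical helper: approximate food volume as a height map integrated over the table plane.

    The plane is normal . X + offset = 0, with the normal pointing toward the camera,
    so points above the table have positive signed distance. `cell` is an assumed
    grid cell size in the depth map's unit.
    """
    n = np.asarray(normal, dtype=float)
    norm = np.linalg.norm(n)
    n, offset = n / norm, offset / norm       # normalize the full plane equation
    pts = backproject(depth, fx, fy, cx, cy)[np.asarray(mask, dtype=bool)]  # (N, 3)
    if pts.shape[0] == 0:
        return 0.0

    signed = pts @ n + offset                 # signed distance of each food point to the plane
    heights = np.clip(signed, 0.0, None)      # points below the table add no height
    feet = pts - signed[:, None] * n          # orthogonal projection onto the plane

    # Orthonormal in-plane basis built from any vector not parallel to n.
    helper = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    e1 = np.cross(n, helper)
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(n, e1)
    coords = np.stack([feet @ e1, feet @ e2], axis=1)

    # Bin projected points into grid cells and keep the tallest point per cell.
    keys = np.floor(coords / cell).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)             # keep 1-D across NumPy versions
    cell_max = np.zeros(inverse.max() + 1)
    np.maximum.at(cell_max, inverse, heights)
    return float(cell_max.sum() * cell ** 2)


if __name__ == "__main__":
    # Synthetic scene: a 0.1 m x 0.1 m x 0.05 m box on a table 0.5 m below
    # a downward-looking camera with hypothetical intrinsics.
    fx = fy = 500.0
    cx = cy = 100.0
    u, v = np.meshgrid(np.arange(200), np.arange(200))
    top = (np.abs(u - cx) * 0.45 / fx <= 0.05) & (np.abs(v - cy) * 0.45 / fy <= 0.05)
    depth = np.where(top, 0.45, 0.5)
    print(food_volume(depth, top, fx, fy, cx, cy, normal=(0, 0, -1), offset=0.5))
```

Run directly, the demo should print a value close to 5e-4 m^3, the box's true volume. Keeping only the tallest point per cell treats everything between the visible surface and the table as food, the usual single-view assumption; occluded concavities are not modeled. Without an accurate table plane, the heights (and thus the volume) are undefined, which is why plane estimation matters on texture-less tables.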
