Paper Title

AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant

Paper Authors

Benita Wong, Joya Chen, You Wu, Stan Weixian Lei, Dongxing Mao, Difei Gao, Mike Zheng Shou

Abstract

A long-standing goal of intelligent assistants such as AR glasses and robots has been to assist users in affordance-centric real-world scenarios, for example, "how can I run the microwave for 1 minute?". However, there is still no clear task definition or suitable benchmark. In this paper, we define a new task called Affordance-centric Question-driven Task Completion (AQTC), in which the AI assistant should learn from instructional videos to provide step-by-step help from the user's point of view. To support the task, we construct AssistQ, a new dataset comprising 531 question-answer samples drawn from 100 newly filmed instructional videos. We also develop a novel Question-to-Actions (Q2A) model to address the AQTC task and validate it on the AssistQ dataset. The results show that our model significantly outperforms several VQA-related baselines while still leaving large room for improvement. We expect our task and dataset to advance the development of egocentric AI assistants. Our project page is available at: https://showlab.github.io/assistq/.
