Paper Title

Hierarchical Deep Multi-modal Network for Medical Visual Question Answering

Paper Authors

Deepak Gupta, Swati Suman, Asif Ekbal

Abstract

Visual Question Answering in the Medical domain (VQA-Med) plays an important role in providing medical assistance to end-users. These users are expected to raise either a straightforward question with a Yes/No answer or a challenging question that requires a detailed, descriptive answer. Existing techniques in VQA-Med fail to distinguish between the different question types, which sometimes complicates the simpler problems or over-simplifies the complicated ones. Moreover, maintaining several distinct systems for different question types can lead to confusion and discomfort for the end-users. To address this issue, we propose a hierarchical deep multi-modal network that analyzes and classifies end-user questions/queries and then incorporates a query-specific approach for answer prediction. We refer to our proposed approach as Hierarchical Question Segregation based Visual Question Answering, in short HQS-VQA. Our contributions are three-fold, viz. firstly, we propose a question segregation (QS) technique for VQA-Med; secondly, we integrate the QS model into the hierarchical deep multi-modal neural network to generate proper answers to queries related to medical images; and thirdly, we study the impact of QS in Medical-VQA by comparing the performance of the proposed model with QS against a model without QS. We evaluate the performance of our proposed model on two benchmark datasets, viz. RAD and CLEF18. Experimental results show that our proposed HQS-VQA technique outperforms the baseline models by significant margins. We also conduct a detailed quantitative and qualitative analysis of the obtained results and discover potential causes of errors and their solutions.
