Text-to-Audio Grounding Based Novel Metric for Evaluating Audio Caption Similarity

Paper Title

Text-to-Audio Grounding Based Novel Metric for Evaluating Audio Caption Similarity

Authors

Bhosale, Swapnil; Chakraborty, Rupayan; Kopparapu, Sunil Kumar

Abstract

Automatic Audio Captioning (AAC) refers to the task of translating an audio sample into a natural language (NL) text that describes the audio events, the sources of the events, and their relationships. Unlike NL text generation tasks, which rely on metrics based on lexical semantics such as BLEU, ROUGE, and METEOR for evaluation, AAC evaluation requires a metric that can, in addition to lexical semantics, map NL text (phrases) that correspond to similar sounds. Current metrics used to evaluate AAC tasks lack an understanding of the perceived properties of sound represented by text. In this paper, we propose a novel metric based on Text-to-Audio Grounding (TAG), which is useful for evaluating cross-modal tasks like AAC. Experiments on a publicly available AAC dataset show that our evaluation metric performs better than existing metrics used in the NL text and image captioning literature.
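To make the point about lexical metrics concrete, here is a minimal sketch of clipped unigram precision, the n=1 term of BLEU without the brevity penalty. It is not the paper's TAG-based metric. The function name `unigram_precision` and the example captions are hypothetical and chosen only for illustration.

```python
from collections import Counter


def unigram_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision: the n=1 term of BLEU, without the brevity penalty."""
    cand_tokens = candidate.lower().split()
    if not cand_tokens:
        return 0.0
    ref_counts = Counter(reference.lower().split())
    # Each candidate word is credited at most as many times as it occurs in the reference.
    matched = sum(min(count, ref_counts[word]) for word, count in Counter(cand_tokens).items())
    return matched / len(cand_tokens)


# Hypothetical captions, chosen only for illustration.
reference = "a dog barks in the distance"
similar_sound = "a puppy yelps far away"              # similar sound, different words
different_sound = "a car horn honks in the distance"  # different sound, shared words

print(f"{unigram_precision(similar_sound, reference):.3f}")    # 0.200
print(f"{unigram_precision(different_sound, reference):.3f}")  # 0.571
```

The caption describing a different sound scores higher only because it shares "a", "in", "the", and "distance" with the reference. This is the kind of gap the abstract attributes to purely lexical metrics.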
