Ontology-aware Learning and Evaluation for Audio Tagging

Title

Ontology-aware Learning and Evaluation for Audio Tagging

Authors

Liu, Haohe; Kong, Qiuqiang; Liu, Xubo; Mei, Xinhao; Wang, Wenwu; Plumbley, Mark D.

Abstract

This study defines a new evaluation metric for audio tagging that overcomes a limitation of the conventional mean average precision (mAP) metric, which treats different kinds of sound as independent classes without considering their relations. In addition, because sound labeling is ambiguous, the labels in the training and evaluation sets are not guaranteed to be accurate or exhaustive, which makes robust evaluation with mAP challenging. The proposed metric, ontology-aware mean average precision (OmAP), addresses these weaknesses of mAP by using the AudioSet ontology information during evaluation. Specifically, we reweight the false positive events in the model prediction according to their ontology graph distance to the target classes. OmAP also gives more insight into model performance by evaluating at different coarse-grained levels of the ontology graph. We conduct human evaluations and demonstrate that OmAP is more consistent with human perception than mAP. To further verify the importance of ontology information, we also propose a novel loss function (OBCE) that reweights the binary cross entropy (BCE) loss based on the ontology distance. Our experiments show that OBCE can improve both mAP and OmAP on the AudioSet tagging task.
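The abstract does not give the exact OmAP or OBCE formulas, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes a hypothetical toy ontology `PARENT` (not the real AudioSet ontology), tree graph distance between classes, and a hypothetical linear weight `min(d, max_dist) / max_dist` for a false positive whose nearest true label is `d` edges away. The names `fp_weight`, `ontology_aware_ap`, `omap`, and `ontology_bce` are illustrative.

```python
import math

# HYPOTHETICAL toy ontology (child -> parent); NOT the real AudioSet ontology.
PARENT = {
    "Dog": "Animal", "Cat": "Animal", "Animal": "Root",
    "Guitar": "Music", "Piano": "Music", "Music": "Root",
}

def ancestors(node):
    """Path from `node` up to the root, starting with `node` itself."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path

def graph_distance(a, b):
    """Number of edges between classes a and b, via their lowest common ancestor."""
    depth_b = {n: i for i, n in enumerate(ancestors(b))}
    for i, n in enumerate(ancestors(a)):
        if n in depth_b:
            return i + depth_b[n]
    return math.inf  # no common ancestor

def fp_weight(target, true_labels, max_dist=4):
    """ASSUMED penalty for predicting `target` on a clip labelled `true_labels`:
    grows linearly with distance to the nearest true label, capped at 1."""
    if not true_labels:
        return 1.0
    d = min(graph_distance(target, t) for t in true_labels)
    return min(d, max_dist) / max_dist

def ontology_aware_ap(target, scores, labels, max_dist=4):
    """AP for one class where each false positive adds fp_weight (not 1) to the
    precision denominator. If every weight were 1 this is ordinary AP."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(target in lab for lab in labels)
    if n_pos == 0:
        return float("nan")
    tp, fp, total = 0, 0.0, 0.0
    for i in order:  # descending score; ties keep clip order (stable sort)
        if target in labels[i]:
            tp += 1
            total += tp / (tp + fp)  # weighted precision at this true positive
        else:
            fp += fp_weight(target, labels[i], max_dist)
    return total / n_pos

def omap(classes, score_matrix, labels, max_dist=4):
    """Mean of ontology_aware_ap over classes that have at least one positive.
    score_matrix[i][j] is the score of clip i for classes[j]."""
    aps = [ontology_aware_ap(c, [row[j] for row in score_matrix], labels, max_dist)
           for j, c in enumerate(classes)]
    aps = [a for a in aps if not math.isnan(a)]
    return sum(aps) / len(aps) if aps else float("nan")

def ontology_bce(probs, true_labels, classes, max_dist=4, eps=1e-7):
    """ASSUMED ontology-reweighted BCE for one clip: the negative term of each
    class is scaled by fp_weight, so negatives close to a true label cost less."""
    loss = 0.0
    for c, p in zip(classes, probs):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        if c in true_labels:
            loss -= math.log(p)
        else:
            loss -= fp_weight(c, true_labels, max_dist) * math.log(1 - p)
    return loss / len(classes)
```

With this toy ontology, ranking a Cat clip above the only Dog clip gives an ordinary AP of 0.5 for Dog, whereas `ontology_aware_ap("Dog", [0.3, 0.9, 0.1], [{"Dog"}, {"Cat"}, {"Guitar"}])` returns 2/3: the Cat false positive is 2 edges away and costs only 0.5. A Guitar false positive (4 edges away) would cost the full 1 and leave the score at 0.5, so confusions between related sounds are penalized less than confusions between unrelated ones.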
