Paper Title
Joint Analysis of Acoustic Scenes and Sound Events with Weakly Labeled Data
Paper Authors
Paper Abstract
Considering that acoustic scenes and sound events are closely related, some previous papers have proposed a joint analysis of acoustic scenes and sound events using multitask-learning (MTL)-based neural networks. Conventional MTL models apply a strongly supervised scheme to sound event detection, which requires strong labels of sound events (i.e., event onset and offset times) for model training; however, annotating strong event labels is quite time-consuming. In this paper, we therefore propose a method for the joint analysis of acoustic scenes and sound events based on the MTL framework with only weak labels of sound events. In particular, the proposed method introduces the multiple-instance learning scheme for weakly supervised training of sound event detection and evaluates four pooling functions: max pooling, average pooling, exponential softmax pooling, and attention pooling. Experimental results obtained using parts of the TUT Acoustic Scenes 2016/2017 and TUT Sound Events 2016/2017 datasets show that the proposed MTL-based method with weak labels outperforms conventional single-task scene classification and event detection models with weak labels in both scene classification and event detection performance.
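To make the four pooling functions concrete, the sketch below shows how each one aggregates frame-level event probabilities into a clip-level probability, as is typical in multiple-instance learning for weakly supervised sound event detection. This is an illustrative sketch, not the paper's implementation: the function names are assumptions, the input is a plain `T × C` list of frame probabilities, and the attention logits passed to `attention_pool` stand in for the output of a learned attention branch.

```python
import math

def max_pool(y):
    """y: T frames x C classes of event probabilities -> C clip-level probs.
    Takes the most confident frame per class."""
    return [max(col) for col in zip(*y)]

def avg_pool(y):
    """Uniformly averages all frames per class."""
    return [sum(col) / len(col) for col in zip(*y)]

def exp_softmax_pool(y):
    """Weighted average where a frame's weight is exp(its own probability),
    so confident frames dominate but no frame is ignored entirely."""
    out = []
    for col in zip(*y):
        w = [math.exp(p) for p in col]
        out.append(sum(p * wi for p, wi in zip(col, w)) / sum(w))
    return out

def attention_pool(y, a):
    """Weighted average with weights softmax(a) over time; `a` holds
    attention logits from a learned branch (hypothetical here)."""
    out = []
    for col, acol in zip(zip(*y), zip(*a)):
        w = [math.exp(v) for v in acol]
        z = sum(w)
        out.append(sum(p * wi / z for p, wi in zip(col, w)))
    return out
```

Max pooling tends to under-detect (only one frame carries gradient), average pooling tends to over-detect (all frames are forced toward the clip label), and the exponential-softmax and attention variants sit between these extremes, which is why comparing all four is informative.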