Deep Learning Based Open Set Acoustic Scene Classification

Paper Title

Deep Learning Based Open Set Acoustic Scene Classification

Authors

Zuzanna Kwiatkowska, Beniamin Kalinowski, Michał Kośmider, Krzysztof Rykaczewski

Abstract

In this work, we compare the performance of three selected techniques for open set acoustic scene classification (ASC). First, we test thresholding the softmax output of a deep network classifier, which is currently the most popular technique in ASC. Second, we compare the results with the OpenMax classifier, which originates from the computer vision field. As the third model, we use the Adapted Class-Conditioned Autoencoder (Adapted C2AE), our variation of another computer vision technique called C2AE. Adapted C2AE allows a fairer comparison within the given experiments and simplifies the original inference procedure, making it more applicable to real-life scenarios. We also analyse two training scenarios: one without any additional knowledge of unknown classes, and another in which a limited subset of examples from the unknown classes is available. We find that the C2AE-based method outperforms both thresholding and OpenMax, obtaining $85.5\%$ Area Under the Receiver Operating Characteristic curve (AUROC) and $66\%$ open set accuracy on the data used in the Detection and Classification of Acoustic Scenes and Events Challenge 2019 Task 1C.
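
The softmax-thresholding baseline named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the threshold value of 0.5, the `UNKNOWN` label, and the function names are assumptions chosen for illustration, and the logits are made-up numbers rather than outputs of a trained ASC network.

```python
import math

UNKNOWN = -1  # hypothetical label for the "unknown" class (assumption)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_open_set(logits, threshold=0.5):
    """Return the most probable known class, or UNKNOWN when the top
    softmax probability falls below the threshold (assumed value)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else UNKNOWN


# Illustrative logits only; a real system would take them from a classifier.
confident = predict_open_set([4.0, 0.5, 0.1])  # peaked distribution
uncertain = predict_open_set([1.0, 0.9, 1.1])  # nearly flat distribution
```

With these inputs, the peaked example is assigned to a known class, while the nearly flat one is rejected as unknown. In practice the threshold would be tuned on validation data, and sweeping it over its range is what produces a ROC curve of the kind summarised by AUROC.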
