Play It Back: Iterative Attention for Audio Recognition

Paper title

Play It Back: Iterative Attention for Audio Recognition

Authors

Alexandros Stergiou, Dima Damen

Abstract

A key function of auditory cognition is the association of characteristic sounds with their corresponding semantics over time. Humans attempting to discriminate between fine-grained audio categories often replay the same discriminative sounds to increase their prediction confidence. We propose an end-to-end attention-based architecture that, through selective repetition, attends to the most discriminative sounds across the audio sequence. Our model initially uses the full audio sequence and iteratively refines the temporal segments to be replayed, based on slot attention. At each playback, the selected segments are replayed using a smaller hop length, which yields higher-resolution features within these segments. We show that our method consistently achieves state-of-the-art performance across three audio-classification benchmarks: AudioSet, VGG-Sound, and EPIC-KITCHENS-100.
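
To make the effect of "a smaller hop length" concrete, the sketch below counts how many analysis frames a segment produces as the hop shrinks across playbacks. It is an illustration under stated assumptions, not the authors' implementation: the function names, the halving of the hop, and the fixed `keep_fraction` are hypothetical, and the paper's slot-attention segment selection is not modelled here (the segment is simply shortened by a constant factor).

```python
def num_frames(num_samples, win_length, hop_length):
    """Number of full analysis windows that fit in num_samples (no padding)."""
    if num_samples < win_length:
        return 0
    return 1 + (num_samples - win_length) // hop_length


def playback_schedule(num_samples, win_length, initial_hop,
                      num_playbacks, keep_fraction=0.5):
    """Return one (segment_samples, hop_length, frames) tuple per playback.

    Assumption: each playback keeps a fixed fraction of the previous
    segment and halves the hop length. The abstract only states that the
    hop gets smaller; the exact schedule here is hypothetical.
    """
    schedule = []
    segment, hop = num_samples, initial_hop
    for _ in range(num_playbacks):
        schedule.append((segment, hop, num_frames(segment, win_length, hop)))
        # Shorten the replayed segment, but never below one window.
        segment = max(win_length, int(segment * keep_fraction))
        # Use a finer hop for the next playback (at least one sample).
        hop = max(1, hop // 2)
    return schedule


if __name__ == "__main__":
    # One second of 16 kHz audio, 25 ms window, 10 ms initial hop.
    for segment, hop, frames in playback_schedule(16000, 400, 160, 3):
        print(f"segment={segment} samples, hop={hop}, frames={frames}")
```

With these hypothetical settings the replayed segment shrinks from 16000 to 4000 samples while the frame count stays close to constant (98, 96, 91), which is the sense in which a shorter segment is represented at higher temporal resolution.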

Paper title