Paper title
Sound Event Detection Utilizing Graph Laplacian Regularization with Event Co-occurrence
Paper authors
Paper abstract
Only a limited number of sound event types occur in a given acoustic scene, and some sound events tend to co-occur in that scene; for example, the sound events "dishes" and "glass jingling" are likely to co-occur in the acoustic scene "cooking". In this paper, we propose a method of sound event detection using graph Laplacian regularization that takes sound event co-occurrence into account. In the proposed method, the occurrences of sound events are expressed as a graph whose nodes indicate the frequencies of event occurrence and whose edges indicate the co-occurrences of sound events. This graph representation is then used to train the sound event detection model, which is optimized under an objective function with a regularization term that reflects the graph structure of sound event occurrence and co-occurrence. Evaluation experiments using the TUT Sound Events 2016 and 2017 datasets and the TUT Acoustic Scenes 2016 dataset show that the proposed method improves sound event detection performance by 7.9 percentage points in terms of the segment-based F1 score compared with the conventional CNN-BiGRU-based detection method. In particular, the experimental results indicate that the proposed method detects co-occurring sound events more accurately than the conventional method.
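The abstract does not state the regularization term itself, so the following is only a minimal sketch of how a graph Laplacian penalty built from event co-occurrence could look. It assumes a standard unnormalized Laplacian L = D - A, where A counts how often two events are active in the same segment, and it ignores the node weights (event frequencies) mentioned above. The function names `cooccurrence_laplacian` and `laplacian_penalty` are hypothetical and are not taken from the paper.

```python
import numpy as np


def cooccurrence_laplacian(labels):
    """Build an unnormalized graph Laplacian from multi-hot event labels.

    labels: array of shape (num_segments, num_events) with 0/1 entries.
    Assumption: the edge weight between two events is the number of
    segments in which both are active.
    """
    labels = np.asarray(labels, dtype=float)
    adjacency = labels.T @ labels        # joint-activity counts per event pair
    np.fill_diagonal(adjacency, 0.0)     # drop self-loops
    degree = np.diag(adjacency.sum(axis=1))
    return degree - adjacency


def laplacian_penalty(predictions, laplacian):
    """Sum over segments of p_t^T L p_t.

    This equals 0.5 * sum_ij A_ij * (p_ti - p_tj)^2 summed over segments,
    so it is small when frequently co-occurring events get similar scores.
    """
    predictions = np.asarray(predictions, dtype=float)
    return float(np.einsum("ti,ij,tj->", predictions, laplacian, predictions))


# Toy data: events 0 and 1 always co-occur, event 2 occurs alone.
toy_labels = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
toy_laplacian = cooccurrence_laplacian(toy_labels)

consistent = laplacian_penalty([[0.9, 0.9, 0.1]], toy_laplacian)
inconsistent = laplacian_penalty([[0.9, 0.1, 0.1]], toy_laplacian)
print(consistent, inconsistent)
```

In a training loop, a term like this would typically be added to the detection loss (for example binary cross-entropy) with a weighting coefficient; that coefficient and where the penalty is applied in the network are assumptions here, not details given in the abstract.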