The Sound of Bounding-Boxes

Paper Title

The Sound of Bounding-Boxes

Authors

Takashi Oya, Shohei Iwase, Shigeo Morishima

Abstract

In audio-visual sound source separation, which leverages visual information to separate sound sources, identifying the objects in an image is a crucial step before separating the sources. However, existing methods that assign sounds to detected bounding boxes rely heavily on pre-trained object detectors. Specifically, these methods require predetermining every object category that can produce sound and using an object detector that covers all of those categories. To tackle this problem, we propose a fully unsupervised method that learns to detect objects in an image and separate sound sources simultaneously. Because our method does not rely on any pre-trained detector, it is applicable to arbitrary categories without additional annotation. Furthermore, although it is fully unsupervised, we found that our method performs comparably in separation accuracy.
