Paper Title
Lightweight Online Separation of the Sound Source of Interest through BLSTM-Based Binary Masking
Authors
Abstract
Online audio source separation has been an important part of auditory scene analysis and robot audition. Because it can operate online, the main type of technique used for this task has been spatial filtering (or beamforming), which assumes that the location (mainly, the direction of arrival; DOA) of the source of interest (SOI) is known. However, these techniques suffer from considerable interference leakage in the final result. In this paper, we propose a two-step technique: 1) a phase-based beamformer that provides, in addition to an estimate of the SOI, an estimate of the cumulative environmental interference; and 2) a bidirectional long short-term memory (BLSTM) based time-frequency (TF) binary masking stage that calculates a binary mask aimed at separating the SOI from the cumulative environmental interference. In our tests, this technique provides a signal-to-interference ratio (SIR) above 20 dB with simulated data. Because of the nature of the beamformer outputs, the label permutation problem is handled from the outset. This makes the proposed solution a lightweight alternative that requires considerably fewer computational resources (almost an order of magnitude fewer) than current deep-learning-based techniques, while providing comparable SIR performance.
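The second stage, applying a binary TF mask to the beamformer's SOI output and measuring SIR, can be illustrated with a minimal NumPy sketch. It assumes the beamformer has already produced two TF representations, an SOI estimate and a cumulative-interference estimate, each leaking a fixed 30% of the other component (an assumed, simplified leakage model). The BLSTM is replaced by a hypothetical magnitude-comparison rule `binary_mask`; the actual method learns the mask with a BLSTM, so this shows only how a binary mask is applied and how SIR is computed, not the paper's network. The sparse complex-Gaussian "sources", the `sparse_source` helper, and the STFT size are synthetic placeholders, not the paper's data.

```python
import numpy as np


def binary_mask(soi_est: np.ndarray, interf_est: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the BLSTM stage (NOT the paper's model):
    1.0 in TF bins where the SOI estimate dominates the cumulative
    interference estimate, 0.0 elsewhere."""
    return (np.abs(soi_est) > np.abs(interf_est)).astype(float)


def sir_db(target: np.ndarray, interference: np.ndarray) -> float:
    """SIR in dB: energy of the target component over energy of the
    interference component."""
    return float(10.0 * np.log10(np.sum(np.abs(target) ** 2)
                                 / np.sum(np.abs(interference) ** 2)))


rng = np.random.default_rng(0)
shape = (257, 200)  # (frequency bins, frames); assumed STFT size


def sparse_source(p_active: float) -> np.ndarray:
    """Hypothetical helper: complex Gaussian TF coefficients that are active
    in a random fraction of bins, a crude model of TF sparsity (assumption)."""
    coeffs = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    return coeffs * (rng.random(shape) < p_active)


S = sparse_source(0.6)  # SOI component
I = sparse_source(0.6)  # cumulative environmental interference

# Assumed beamformer outputs: each output leaks 30% of the other component.
leak = 0.3
soi_out = S + leak * I
interf_out = I + leak * S

M = binary_mask(soi_out, interf_out)

# SIR of the SOI output before masking, and after applying the mask to it.
sir_before = sir_db(S, leak * I)
sir_after = sir_db(M * S, M * leak * I)
print(f"SIR before masking: {sir_before:.1f} dB")
print(f"SIR after masking:  {sir_after:.1f} dB")
```

Under this symmetric-leakage model the comparison rule reduces to testing |S| > |I| in each bin, i.e. an ideal binary mask, which is why it serves as a convenient stand-in for a learned mask. Because one beamformer output is always tied to the SOI and the other to the interference, the mask is always applied to the same output, so no output-to-source assignment (label permutation) has to be resolved.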