Paper Title
Atss-Net: Target Speaker Separation via Attention-based Neural Network
Paper Authors
Paper Abstract
Recently, Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) based models have been introduced to deep learning-based target speaker separation. In this paper, we propose an attention-based neural network (Atss-Net) in the spectrogram domain for the task. Compared with the CNN-LSTM architecture, it allows the network to compute the correlation between each pair of features in parallel and to extract more features with shallower layers. Experimental results show that our Atss-Net yields better performance than VoiceFilter, although it contains only half the parameters. Furthermore, our proposed model also demonstrates promising performance in speech enhancement.
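The abstract's core claim is that attention lets the network compute correlations between all feature frames in parallel, rather than sequentially as an LSTM does. The following is a minimal NumPy sketch of single-head scaled dot-product self-attention over spectrogram frames to illustrate that mechanism; the actual Atss-Net architecture (head count, layer dimensions, masking strategy) is defined in the paper, and all sizes and weight matrices here are hypothetical.

```python
import numpy as np

def scaled_dot_product_attention(x, wq, wk, wv):
    """Self-attention over spectrogram frames x of shape (T, F).

    The (T, T) score matrix holds a similarity for every pair of
    time frames at once, which is the 'correlations computed in
    parallel' property contrasted with a recurrent LSTM pass.
    """
    q, k, v = x @ wq, x @ wk, x @ wv                  # project frames
    scores = q @ k.T / np.sqrt(k.shape[-1])           # (T, T) pairwise scores
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over frames
    return weights @ v                                # (T, D) attended output

# Hypothetical sizes: 50 time frames, 257 frequency bins, 64-dim attention.
rng = np.random.default_rng(0)
T, F, D = 50, 257, 64
x = rng.standard_normal((T, F))
wq, wk, wv = (rng.standard_normal((F, D)) * 0.05 for _ in range(3))
out = scaled_dot_product_attention(x, wq, wk, wv)
print(out.shape)  # (50, 64)
```

Because every output frame is a weighted sum over all input frames, a single attention layer already has a global receptive field, which is consistent with the abstract's point that shallower layers can extract richer features than a stacked CNN-LSTM.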