Paper Title
Atss-Net: Target Speaker Separation via Attention-based Neural Network
Paper Authors
Paper Abstract
Recently, Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) based models have been introduced to deep learning-based target speaker separation. In this paper, we propose an attention-based neural network (Atss-Net) in the spectrogram domain for the task. Compared with the CNN-LSTM architecture, it allows the network to compute the correlation between each pair of features in parallel and to extract more features with shallower layers. Experimental results show that our Atss-Net yields better performance than VoiceFilter, although it contains only half the parameters. Furthermore, our proposed model also demonstrates promising performance in speech enhancement.
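The abstract's core claim is that attention lets the network compute correlations between all feature frames in parallel, rather than sequentially as an LSTM does. The following is a minimal NumPy sketch of single-head scaled dot-product self-attention over spectrogram frames to illustrate that mechanism; the actual Atss-Net architecture (head count, layer dimensions, masking strategy) is defined in the paper, and all sizes and weight matrices here are hypothetical.

```python
import numpy as np

def scaled_dot_product_attention(x, wq, wk, wv):
    """Self-attention over spectrogram frames x of shape (T, F).

    The (T, T) score matrix holds a similarity for every pair of
    time frames at once, which is the 'correlations computed in
    parallel' property contrasted with a recurrent LSTM pass.
    """
    q, k, v = x @ wq, x @ wk, x @ wv                  # project frames
    scores = q @ k.T / np.sqrt(k.shape[-1])           # (T, T) pairwise scores
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over frames
    return weights @ v                                # (T, D) attended output

# Hypothetical sizes: 50 time frames, 257 frequency bins, 64-dim attention.
rng = np.random.default_rng(0)
T, F, D = 50, 257, 64
x = rng.standard_normal((T, F))
wq, wk, wv = (rng.standard_normal((F, D)) * 0.05 for _ in range(3))
out = scaled_dot_product_attention(x, wq, wk, wv)
print(out.shape)  # (50, 64)
```

Because every output frame is a weighted sum over all input frames, a single attention layer already has a global receptive field, which is consistent with the abstract's point that shallower layers can extract richer features than a stacked CNN-LSTM.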