Dynamic Acoustic Compensation and Adaptive Focal Training for Personalized Speech Enhancement

Paper Title

Dynamic Acoustic Compensation and Adaptive Focal Training for Personalized Speech Enhancement

Authors

Xiaofeng Ge, Jiangyu Han, Haixin Guan, Yanhua Long

Abstract

Recently, a growing number of personalized speech enhancement (PSE) systems with excellent performance have been proposed. However, two critical issues still limit model performance and generalization: 1) acoustic environment mismatch between the noisy test speech and the target speaker's enrollment speech; 2) hard sample mining and learning. In this paper, dynamic acoustic compensation (DAC) is proposed to alleviate the environment mismatch by intercepting noise or environmental acoustic segments from the noisy speech and mixing them with the clean enrollment speech. To better exploit the hard samples in the training data, we propose an adaptive focal training (AFT) strategy that assigns adaptive loss weights to hard and non-hard samples during training. Time-frequency multi-loss training is further introduced to improve and generalize our previous work, sDPCCN, for PSE. The effectiveness of the proposed methods is examined on the DNS4 Challenge dataset. Results show that DAC brings large improvements across multiple evaluation metrics, and that AFT significantly reduces the hard sample rate and produces a clear improvement in MOS scores.
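
The abstract names the two ideas without giving their details, so the following is only a rough sketch of what they might look like, not the authors' implementation. It assumes that the quietest window of the noisy signal can stand in for a noise-only segment, and that a "hard" sample is one whose loss exceeds the batch mean. The names `quietest_segment`, `compensate_enrollment`, `adaptive_focal_loss`, `gain`, and `hard_weight` are all hypothetical.

```python
def quietest_segment(noisy, seg_len):
    """Return the lowest-energy window of `noisy`.

    Assumption: the quietest window is treated as noise-only. The paper
    may select noise or environmental segments differently.
    """
    best_start, best_energy = 0, float("inf")
    # Scan non-overlapping windows and keep the one with the least energy.
    for start in range(0, len(noisy) - seg_len + 1, seg_len):
        energy = sum(v * v for v in noisy[start:start + seg_len])
        if energy < best_energy:
            best_start, best_energy = start, energy
    return noisy[best_start:best_start + seg_len]


def compensate_enrollment(enroll, noisy, seg_len, gain=1.0):
    """Mix a noise segment taken from `noisy` into the clean enrollment signal."""
    segment = quietest_segment(noisy, seg_len)
    if not segment:
        # Nothing to mix in; return the enrollment signal unchanged.
        return list(enroll)
    # Tile the segment so that it covers the whole enrollment utterance.
    return [s + gain * segment[i % len(segment)] for i, s in enumerate(enroll)]


def adaptive_focal_loss(losses, hard_weight=2.0):
    """Weighted mean of per-sample losses that up-weights above-average ones.

    Assumption: "hard" means loss above the batch mean, with a fixed
    hypothetical weight. The paper's adaptive weighting rule may differ.
    """
    threshold = sum(losses) / len(losses)
    weights = [hard_weight if loss > threshold else 1.0 for loss in losses]
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)
```

In this sketch, the compensated enrollment signal would be fed to the speaker encoder in place of the clean one, so that the enrollment and the test utterance share a similar acoustic background. The weighted loss would replace a plain batch mean during training.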
