Paper Title
A consolidated view of loss functions for supervised deep learning-based speech enhancement
Paper Authors
Paper Abstract
Deep learning-based speech enhancement for real-time applications has recently made large advances. Due to the lack of a tractable perceptual optimization target, many myths around training losses have emerged, while in many cases the contribution of the loss function to success has not been investigated in isolation from other factors such as network architecture, features, or training procedures. In this work, we investigate a wide variety of spectral loss functions for a recurrent neural network architecture suitable for online frame-by-frame processing. We relate magnitude-only losses to phase-aware losses, ratios, correlation metrics, and compressed metrics. Our results reveal that combining magnitude-only with phase-aware objectives always leads to improvements, even when the phase is not enhanced. Furthermore, using compressed spectral values also yields a significant improvement. In contrast, phase-sensitive improvement is best achieved by linear-domain losses such as mean absolute error.
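To make the central idea of the abstract concrete, below is a minimal NumPy sketch of a loss that blends a magnitude-only term with a phase-aware term on power-law-compressed spectra. The function name, the compression exponent `c=0.3`, and the blending weight `alpha` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def compressed_spectral_loss(S, S_hat, c=0.3, alpha=0.5):
    """Blend a magnitude-only and a phase-aware spectral loss.

    S, S_hat : complex STFT arrays (target and estimate).
    c        : magnitude compression exponent (illustrative value).
    alpha    : weight between the phase-aware and magnitude-only terms.
    """
    # Power-law compress the magnitudes; retain the original phases.
    mag, mag_hat = np.abs(S) ** c, np.abs(S_hat) ** c
    phase, phase_hat = np.exp(1j * np.angle(S)), np.exp(1j * np.angle(S_hat))

    # Magnitude-only term: insensitive to phase errors.
    loss_mag = np.mean((mag - mag_hat) ** 2)
    # Phase-aware term: MSE between compressed complex spectra.
    loss_complex = np.mean(np.abs(mag * phase - mag_hat * phase_hat) ** 2)

    return alpha * loss_complex + (1 - alpha) * loss_mag
```

Compressing with an exponent below one emphasizes low-energy spectral regions, which is the intuition behind the reported improvement from compressed spectral values.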