Audio Dequantization for High Fidelity Audio Generation in Flow-based Neural Vocoder

Paper Title

Audio Dequantization for High Fidelity Audio Generation in Flow-based Neural Vocoder

Authors

Hyun-Wook Yoon, Sang-Hoon Lee, Hyeong-Rae Noh, Seong-Whan Lee

Abstract

In recent work, flow-based neural vocoders have shown significant improvement in real-time speech generation tasks. A sequence of invertible flow operations allows the model to convert samples from a simple distribution into audio samples. However, training a continuous density model on discrete audio data can degrade model performance due to the topological difference between the latent and actual distributions. To resolve this problem, we propose audio dequantization methods in a flow-based neural vocoder for high-fidelity audio generation. Data dequantization is a well-known method in image generation but has not yet been studied in the audio domain. For this reason, we implement various audio dequantization methods in a flow-based neural vocoder and investigate their effect on the generated audio. We conduct various objective performance assessments and a subjective evaluation to show that audio dequantization can improve audio generation quality. In our experiments, audio dequantization produces waveform audio with better harmonic structure and fewer digital artifacts.
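The abstract does not say which dequantization methods were implemented, so the sketch below only illustrates the general idea using uniform dequantization, the simplest variant known from image generation. It assumes 16-bit signed PCM input; the function names `uniform_dequantize` and `requantize` are hypothetical and are not taken from the paper.

```python
import numpy as np

LEVELS = 2 ** 16  # assumption: 16-bit signed PCM audio


def uniform_dequantize(pcm, rng=None):
    """Spread each discrete PCM level over a unit-width bin, then scale to [-1, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    # Shift signed samples to the non-negative range [0, LEVELS - 1].
    shifted = pcm.astype(np.float64) + LEVELS // 2
    # Add noise drawn from U[0, 1) so the data has a continuous density.
    noisy = shifted + rng.uniform(0.0, 1.0, size=pcm.shape)
    # Map [0, LEVELS) to [-1, 1).
    return noisy / LEVELS * 2.0 - 1.0


def requantize(x):
    """Map continuous values in [-1, 1) back to 16-bit PCM by flooring to the bin index."""
    bins = np.floor((x + 1.0) / 2.0 * LEVELS)
    return (np.clip(bins, 0, LEVELS - 1) - LEVELS // 2).astype(np.int16)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pcm = np.array([-32768, -1, 0, 1, 32767], dtype=np.int16)
    x = uniform_dequantize(pcm, rng=rng)
    print(x)               # continuous values in [-1, 1)
    print(requantize(x))   # the original integer samples
```

Because the noise stays inside each sample's own bin, flooring recovers the original integer sample, up to floating-point rounding at the bin edges. A continuous density model can then be trained on the noisy values instead of on isolated discrete points.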
