Paper title
PhaseAug: A Differentiable Augmentation for Speech Synthesis to Simulate One-to-Many Mapping
Paper authors
Paper abstract
Previous generative adversarial network (GAN)-based neural vocoders are trained to reconstruct the exact ground-truth waveform from the paired mel-spectrogram and do not consider the one-to-many relationship of speech synthesis. This conventional training causes both the discriminators and the generator to overfit, leading to periodicity artifacts in the generated audio signal. In this work, we present PhaseAug, the first differentiable augmentation for speech synthesis that rotates the phase of each frequency bin to simulate one-to-many mapping. With our proposed method, we outperform baselines without any architecture modification. Code and audio samples will be available at https://github.com/mindslab-ai/phaseaug.
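The core idea in the abstract, rotating the phase of each frequency bin while leaving magnitudes untouched, can be illustrated with a small NumPy sketch. This is not the authors' implementation: it is non-differentiable, it applies a single FFT over the whole signal rather than a framewise transform, and the names rotate_phase and max_shift are hypothetical, chosen only for illustration. It shows that many different waveforms can share one magnitude spectrum, which is the one-to-many relationship the abstract refers to.

```python
import numpy as np


def rotate_phase(wave, rng, max_shift=np.pi):
    """Rotate the phase of every frequency bin by an independent random angle.

    Illustrative assumption: one FFT over the whole signal, with angles drawn
    uniformly from [-max_shift, max_shift).
    """
    spectrum = np.fft.rfft(wave)
    phi = rng.uniform(-max_shift, max_shift, size=spectrum.shape)
    phi[0] = 0.0  # the DC bin must stay real for a real-valued signal
    if wave.size % 2 == 0:
        phi[-1] = 0.0  # the Nyquist bin must stay real for even lengths
    rotated = spectrum * np.exp(1j * phi)  # changes phase, keeps magnitude
    return np.fft.irfft(rotated, n=wave.size)


rng = np.random.default_rng(0)
t = np.arange(1024) / 16000.0  # 1024 samples at an assumed 16 kHz rate
wave = np.sin(2 * np.pi * 440.0 * t) + 0.5 * np.sin(2 * np.pi * 880.0 * t)
augmented = rotate_phase(wave, rng)

# The waveform changes, but the magnitude spectrum does not.
same_magnitude = np.allclose(
    np.abs(np.fft.rfft(augmented)), np.abs(np.fft.rfft(wave))
)
```

In a GAN training setup, an augmentation of this kind would need to be written with differentiable operations so that gradients can flow back to the generator; the sketch above only demonstrates the signal-processing step.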