Paper Title
A Post Auto-regressive GAN Vocoder Focused on Spectrum Fracture
Paper Authors
Paper Abstract
Generative adversarial networks (GANs) have shown their superiority in real-time speech synthesis. Nevertheless, most of them use deep convolutional layers as their backbone, which may discard information from preceding signal samples. The generation of speech signals, however, invariably requires preceding waveform samples for reconstruction, and their absence can lead to artifacts in the generated speech. To address this conflict, we propose an improved model: a post auto-regressive (AR) GAN vocoder with a self-attention layer, which merges self-attention into an AR loop. It does not participate in inference, but helps the generator learn temporal dependencies within frames during training. Furthermore, an ablation study was conducted to confirm the contribution of each part. Systematic experiments show that our model leads to consistent improvements in both objective and subjective evaluations.