An Initial study on Birdsong Re-synthesis Using Neural Vocoders

Paper Title

An Initial study on Birdsong Re-synthesis Using Neural Vocoders

Authors

Rhythm Bhatia, Tomi H. Kinnunen

Abstract

Modern speech synthesis uses neural vocoders to model raw waveform samples directly. This increased versatility has expanded the scope of vocoders from speech to other domains, such as music. We address another interesting domain of bio-acoustics. We provide initial comparative analysis-resynthesis experiments of birdsong using traditional (WORLD) and two neural (WaveNet autoencoder, parallel WaveGAN) vocoders. Our subjective results indicate no difference in the three vocoders in terms of species discrimination (ABX test). Nonetheless, the WORLD vocoder samples were rated higher in terms of retaining bird-like qualities (MOS test). All vocoders faced issues with pitch and voicing. Our results indicate some of the challenges in processing low-quality wildlife audio data.
