Upsampling artifacts in neural audio synthesis

Title

Upsampling artifacts in neural audio synthesis

Authors

Jordi Pons, Santiago Pascual, Giulio Cengarle, Joan Serrà

Abstract

A number of recent advances in neural audio synthesis rely on upsampling layers, which can introduce undesired artifacts. In computer vision, upsampling artifacts have been studied and are known as checkerboard artifacts (due to their characteristic visual pattern). However, their effect has so far been overlooked in audio processing. Here, we address this gap by studying the problem from an audio signal processing perspective. We first show that the main sources of upsampling artifacts are: (i) the tonal and filtering artifacts introduced by problematic upsampling operators, and (ii) the spectral replicas that emerge while upsampling. We then compare different upsampling layers, showing that nearest neighbor upsamplers can be an alternative to the problematic (but state-of-the-art) transposed and subpixel convolutions, which are prone to introducing tonal artifacts.
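The two artifact sources named in the abstract, tonal artifacts and spectral replicas, can be illustrated with a small NumPy sketch. It assumes a single-channel 1-D transposed convolution with no padding or bias, modeled as zero insertion followed by linear convolution, and uses random weights as a hypothetical stand-in for an untrained layer. The helper names (`zero_insert`, `transposed_conv1d`, `nearest_neighbor`, `peak_freqs`), the stride of 4, and the sample rates are illustrative choices, not taken from the paper.

```python
import numpy as np


def zero_insert(x, stride):
    """Upsample by inserting (stride - 1) zeros after every input sample."""
    y = np.zeros(len(x) * stride)
    y[::stride] = x
    return y


def transposed_conv1d(x, kernel, stride):
    """Single-channel 1-D transposed convolution (no padding, no bias).

    Modeled as zero insertion followed by linear convolution with `kernel`;
    output length is (len(x) - 1) * stride + len(kernel).
    """
    out_len = (len(x) - 1) * stride + len(kernel)
    return np.convolve(zero_insert(x, stride), kernel)[:out_len]


def nearest_neighbor(x, stride):
    """Nearest-neighbor upsampling: repeat each input sample `stride` times."""
    return np.repeat(x, stride)


def peak_freqs(signal, fs, n_peaks):
    """Frequencies (Hz) of the `n_peaks` largest rfft magnitude bins, sorted."""
    mag = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    top = np.argsort(mag)[-n_peaks:]
    return np.sort(freqs[top])


# (1) Tonal artifact: feed a constant signal through each upsampler.
stride, fs_out = 4, 16000
rng = np.random.default_rng(0)
kernel = rng.standard_normal(8)  # hypothetical untrained weights
x_const = np.ones(64)

y_tconv = transposed_conv1d(x_const, kernel, stride)
y_nn = nearest_neighbor(x_const, stride)

# Steady-state segment away from the edges: 128 samples = 32 periods of `stride`.
seg_tconv = y_tconv[8:136]
seg_nn = y_nn[8:136]
print("transposed conv steady-state std:", seg_tconv.std())  # > 0: periodic pattern
print("nearest neighbor steady-state std:", seg_nn.std())    # 0.0: constant output

# The periodic pattern is a tone at multiples of fs_out / stride (4 kHz and 8 kHz here).
print("tonal peaks (Hz):", peak_freqs(seg_tconv - seg_tconv.mean(), fs_out, 2))

# (2) Spectral replicas: upsample a 500 Hz sine from fs_in = 4 kHz to 16 kHz.
fs_in, f0 = 4000, 500
n = np.arange(400)
x_sine = np.sin(2 * np.pi * f0 * n / fs_in)

z = zero_insert(x_sine, stride)
print("zero-insertion peaks (Hz):", peak_freqs(z, fs_out, 4))  # ~500, 3500, 4500, 7500

# Nearest neighbor attenuates, but does not remove, the replica at fs_in - f0.
mag_nn = np.abs(np.fft.rfft(nearest_neighbor(x_sine, stride)))
freqs = np.fft.rfftfreq(len(x_sine) * stride, d=1.0 / fs_out)
bin_f0 = np.argmin(np.abs(freqs - f0))
bin_rep = np.argmin(np.abs(freqs - (fs_in - f0)))
ratio = mag_nn[bin_rep] / mag_nn[bin_f0]
print(f"nearest-neighbor replica level: {20 * np.log10(ratio):.1f} dB")  # roughly -16 dB
```

In this simplified model, nearest-neighbor upsampling is zero insertion followed by a fixed length-`stride` box filter, so every output phase sees the same weights and a constant input stays constant. In the transposed convolution, each output phase sums a different subset of kernel weights, which produces the periodic pattern heard as a tone. The box filter only partially suppresses the replicas, which is consistent with the abstract treating spectral replicas as a separate source of artifacts.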
