Audio inpainting with generative adversarial network

Paper title

Audio inpainting with generative adversarial network

Authors

Ebner, P. P., Eltelt, A.

Abstract

We study the ability of a Wasserstein Generative Adversarial Network (WGAN) to generate missing audio content that is, in context, statistically similar to the surrounding sound and the neighboring borders. We address the challenge of inpainting long-range gaps (500 ms) in audio using WGAN models. Compared with the classical WGAN model, we improve the quality of the inpainted part with a newly proposed WGAN architecture that uses both short-range and long-range neighboring borders. Performance was compared on two different instruments (piano and guitar) and on recordings of virtuoso pianists together with a string orchestra. The objective difference grade (ODG) was used to evaluate both architectures. The proposed model outperforms the classical WGAN model and improves the reconstruction of high-frequency content. Further, we obtained better results for instruments whose frequency spectrum lies mainly in the lower range, where small noises are less annoying to the human ear and the inpainting part is more perceptible. Finally, for audio datasets in which a particular instrument is accompanied by other instruments, we could show that better test results are reached if the network is trained only on this particular instrument, neglecting the other instruments.
