Paper Title
DDSP: Differentiable Digital Signal Processing
Paper Authors
Paper Abstract
Most generative models of audio directly generate samples in one of two domains: time or frequency. While sufficient to express any signal, these representations are inefficient, as they do not utilize existing knowledge of how sound is generated and perceived. A third approach (vocoders/synthesizers) successfully incorporates strong domain knowledge of signal processing and perception, but has been less actively researched due to limited expressivity and difficulty integrating with modern auto-differentiation-based machine learning methods. In this paper, we introduce the Differentiable Digital Signal Processing (DDSP) library, which enables direct integration of classic signal processing elements with deep learning methods. Focusing on audio synthesis, we achieve high-fidelity generation without the need for large autoregressive models or adversarial losses, demonstrating that DDSP enables utilizing strong inductive biases without losing the expressive power of neural networks. Further, we show that combining interpretable modules permits manipulation of each separate model component, with applications such as independent control of pitch and loudness, realistic extrapolation to pitches not seen during training, blind dereverberation of room acoustics, transfer of extracted room acoustics to new environments, and transformation of timbre between disparate sources. In short, DDSP enables an interpretable and modular approach to generative modeling, without sacrificing the benefits of deep learning. The library is publicly available at https://github.com/magenta/ddsp and we welcome further contributions from the community and domain experts.
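To make the abstract's central claim concrete, the following is a minimal sketch of what a "differentiable signal processing element" means in practice: a harmonic oscillator built entirely from differentiable operations, so that gradients of an audio loss flow back to the synthesizer controls that a neural network would predict. This is an illustrative assumption written in JAX, not the DDSP library's own API; the function names, constants, and the single-resolution spectral loss are placeholders for the paper's richer components (multi-scale spectral loss, harmonic-plus-filtered-noise synthesis, reverb modules).

import jax
import jax.numpy as jnp

SAMPLE_RATE = 16000   # illustrative constants, not values fixed by the paper
N_SAMPLES = 16000     # one second of audio
N_HARMONICS = 16

def harmonic_synth(f0_hz, amplitudes):
    # Render a stack of harmonics; every operation is differentiable.
    t = jnp.arange(N_SAMPLES) / SAMPLE_RATE                  # [n_samples]
    harmonics = jnp.arange(1, N_HARMONICS + 1)               # [n_harmonics]
    phases = 2.0 * jnp.pi * f0_hz * harmonics[:, None] * t   # [n_harmonics, n_samples]
    return jnp.sum(amplitudes[:, None] * jnp.sin(phases), axis=0)

def power_spectrum(audio):
    # Smooth (everywhere-differentiable) power spectrum.
    z = jnp.fft.rfft(audio)
    return jnp.real(z) ** 2 + jnp.imag(z) ** 2

def spectral_loss(controls, target_audio):
    # Single-resolution stand-in for the multi-scale spectral loss used in the paper.
    f0_hz, amplitudes = controls
    audio = harmonic_synth(f0_hz, amplitudes)
    return jnp.mean((power_spectrum(audio) - power_spectrum(target_audio)) ** 2)

# Gradients flow through the synthesizer back to its controls, which is what
# allows a network that predicts those controls to be trained end to end on audio.
target = harmonic_synth(jnp.asarray(220.0), jnp.ones(N_HARMONICS) / N_HARMONICS)
controls = (jnp.asarray(200.0), jnp.full(N_HARMONICS, 1.0 / N_HARMONICS))
grads = jax.grad(spectral_loss)(controls, target)

In this hypothetical setup, replacing the hand-set controls with the outputs of a neural network yields the kind of end-to-end trainable, interpretable synthesis pipeline the abstract describes.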