DDX7: Differentiable FM Synthesis of Musical Instrument Sounds

Paper Title

DDX7: Differentiable FM Synthesis of Musical Instrument Sounds

Authors

Franco Caspe, Andrew McPherson, Mark Sandler

Abstract

FM synthesis is a well-known algorithm used to generate complex timbres from a compact set of design primitives. Because it typically features a MIDI interface, it is usually impractical to control from an audio source. Differentiable Digital Signal Processing (DDSP), on the other hand, has enabled nuanced audio rendering by Deep Neural Networks (DNNs) that learn to control differentiable synthesis layers from arbitrary sound inputs. Training uses a corpus of audio for supervision together with spectral reconstruction loss functions. While such functions are well suited to matching spectral amplitudes, they lack pitch direction, which can hinder the joint optimization of an FM synthesizer's parameters. In this paper, we take steps towards enabling continuous control of a well-established FM synthesis architecture from an audio input. First, we discuss a set of design constraints that ease the spectral optimization of a differentiable FM synthesizer via a standard reconstruction loss. Next, we present Differentiable DX7 (DDX7), a lightweight architecture for neural FM resynthesis of musical instrument sounds in terms of a compact set of parameters. We train the model on instrument samples extracted from the URMP dataset and quantitatively demonstrate that its audio quality is comparable to that of selected benchmarks.
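A toy model can make concrete the point that spectral losses match amplitudes but "lack pitch direction." The sketch below is minimal, pure Python, and not the DDX7 implementation. `fm_pair` is a hypothetical two-operator phase-modulation voice; DX7-style FM is implemented as phase modulation. `spectral_l1` is a simplified single-frame magnitude loss. It stands in for the multi-scale spectral losses commonly used with DDSP. All names, the sample rate, and the parameter values are assumptions. They were chosen so that every signal in the sweep is exactly periodic within the analysis frame.

```python
import math
from typing import List


def fm_pair(f0: float, ratio: float, index: float, sr: int, n: int) -> List[float]:
    """Two-operator phase modulation (hypothetical helper).

    A sine modulator at ratio * f0, scaled by the modulation index (radians),
    is added to the phase of a sine carrier at f0.
    """
    out = []
    for i in range(n):
        t = i / sr
        modulator = index * math.sin(2 * math.pi * ratio * f0 * t)
        out.append(math.sin(2 * math.pi * f0 * t + modulator))
    return out


def magnitude_spectrum(x: List[float]) -> List[float]:
    """Naive O(n^2) DFT magnitudes for bins 0..n//2 (illustration only)."""
    n = len(x)
    mags = []
    for k in range(n // 2 + 1):
        re = sum(x[i] * math.cos(2 * math.pi * k * i / n) for i in range(n))
        im = -sum(x[i] * math.sin(2 * math.pi * k * i / n) for i in range(n))
        mags.append(math.hypot(re, im))
    return mags


def spectral_l1(a: List[float], b: List[float]) -> float:
    """Mean absolute difference between two magnitude spectra.

    Phase is discarded, and the value measures how much energy is misplaced,
    not how far a misplaced partial is from its target bin.
    """
    sa, sb = magnitude_spectrum(a), magnitude_spectrum(b)
    return sum(abs(p - q) for p, q in zip(sa, sb)) / len(sa)


if __name__ == "__main__":
    # 250 Hz at 8 kHz is exactly 8 cycles in 256 samples (DFT bin 8).
    SR, N, F0, INDEX = 8000, 256, 250.0, 1.5
    target = fm_pair(F0, 2.0, INDEX, SR, N)
    # Sweep the modulator ratio and print the loss against the ratio-2 target.
    for r in (1.0, 1.5, 2.0, 2.5, 3.0):
        loss = spectral_l1(target, fm_pair(F0, r, INDEX, SR, N))
        print(f"ratio={r:.1f}  loss={loss:.3f}")
```

In this toy setting, sidebands that land in different bins do not overlap. The loss for a wrong ratio therefore need not shrink smoothly as the ratio approaches the target. This is one way to read the abstract's remark about missing pitch direction. It is an illustrative assumption about the mechanism, not a result reported in the paper.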
