Paper Title

HPPNet: Modeling the Harmonic Structure and Pitch Invariance in Piano Transcription

Authors

Weixing Wei, Peilin Li, Yi Yu, Wei Li

Abstract

While neural network models are making significant progress in piano transcription, they are becoming more resource-intensive, requiring larger model sizes and more computing power. In this paper, we attempt to incorporate more prior knowledge about the piano to reduce model size and improve transcription performance. The sound of a piano note contains various overtones, and the pitch of a key does not change over time. To make full use of this latent information, we propose HPPNet, which uses the Harmonic Dilated Convolution to capture harmonic structures and the Frequency Grouped Recurrent Neural Network to model pitch invariance over time. Experimental results on the MAESTRO dataset show that our piano transcription system achieves state-of-the-art performance in both frame and note scores (frame F1 93.15%, note F1 97.18%). Moreover, the model is much smaller than previous state-of-the-art deep learning models.
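
The two priors named in the abstract can be illustrated with a minimal pure-Python sketch. It assumes a log-frequency input (for example a constant-Q spectrogram) with 48 bins per octave; that resolution is an assumption for illustration, not a figure stated above. On such an axis the k-th harmonic lies a fixed 48 * log2(k) bins above its fundamental, whatever the fundamental is, so a convolution whose dilations equal these offsets gathers a note's overtones into a single output bin. The frequency-grouped part uses a group size of one bin and a plain tanh recurrence as a stand-in for whatever recurrent cell the paper actually uses. The function names `harmonic_offsets`, `harmonic_dilated_conv`, and `frequency_grouped_rnn` are hypothetical, and this is not the authors' implementation.

```python
import math

# Assumption: log-frequency input with 48 bins per octave (4 bins per semitone).
BINS_PER_OCTAVE = 48


def harmonic_offsets(n_harmonics, bins_per_octave=BINS_PER_OCTAVE):
    """Bin offsets of harmonics 1..n_harmonics above a fundamental on a log-frequency axis."""
    return [round(bins_per_octave * math.log2(k)) for k in range(1, n_harmonics + 1)]


def harmonic_dilated_conv(frame, weights, offsets):
    """Simplified single-channel harmonic dilated convolution along frequency.

    output[f] = sum_k weights[k] * frame[f + offsets[k]], with zero padding
    above the top bin. Each term is one dilated tap whose dilation equals
    a harmonic offset, so bin f collects the energy of its own harmonics.
    """
    n = len(frame)
    out = []
    for f in range(n):
        acc = 0.0
        for w, d in zip(weights, offsets):
            if f + d < n:
                acc += w * frame[f + d]
        out.append(acc)
    return out


def frequency_grouped_rnn(spectrogram, w_in, w_rec):
    """Toy frequency-grouped recurrence (group size = 1 bin).

    One scalar tanh RNN cell, with the same weights for every bin, runs
    along time independently for each frequency bin.
    spectrogram: list of frames (time-major), each a list of per-bin values.
    Returns hidden states with the same shape as the input.
    """
    n_bins = len(spectrogram[0])
    h = [0.0] * n_bins  # one hidden state per frequency bin
    outputs = []
    for frame in spectrogram:
        # Shared (w_in, w_rec) for all bins: the update does not depend on pitch.
        h = [math.tanh(w_in * x + w_rec * hb) for x, hb in zip(frame, h)]
        outputs.append(h)
    return outputs
```

For example, `harmonic_offsets(4)` gives `[0, 48, 76, 96]`; placing energy at bins 10, 58, 86, and 106 and applying `harmonic_dilated_conv` with unit weights produces its largest response at bin 10, the fundamental. Sharing one set of recurrent weights across all bins is what makes the temporal model pitch-invariant: a note's activation evolves the same way whichever bin it occupies.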
