Polyphonic pitch detection with convolutional recurrent neural networks

Paper title

Polyphonic pitch detection with convolutional recurrent neural networks

Authors

Carl Thomé, Sven Ahlbäck

Abstract

Recent directions in automatic speech recognition (ASR) research have shown that applying deep learning models from image recognition challenges in computer vision is beneficial. As automatic music transcription (AMT) is superficially similar to ASR, in the sense that methods often rely on transforming spectrograms to symbolic sequences of events (e.g. words or notes), deep learning should benefit AMT as well. In this work, we outline an online polyphonic pitch detection system that streams audio to MIDI by ConvLSTMs. Our system achieves state-of-the-art results on the 2007 MIREX multi-F0 development set, with an F-measure of 83% on the bassoon, clarinet, flute, horn and oboe ensemble recording without requiring any musical language modelling or assumptions of instrument timbre.
