Title
Volume-Independent Music Matching by Frequency Spectrum Comparison
Authors
Abstract
Often, I hear a piece of music and wonder what it is called. Applications such as Shazam do provide music matching. However, their limitation is that a piece cannot be identified unless the sample comes from the same recording, even when it is performed by the same musician. Shazam identifies the recording, not the music. This is because Shazam matches the variation in volume, not the frequencies of the sound. This research attempts to match music the way humans understand it: by its frequency spectrum, not its volume variation. Essentially, the idea is to precompute the frequency spectra of all the music in the database, then take the unknown piece and try to match its frequency spectrum against every segment of every piece in the database. I did this by sliding a window across each piece in the database in 0.1-second steps and calculating an error at each position: taking the absolute value, normalizing the audio, subtracting the normalized arrays, and summing the absolute differences. The segment with the least error is considered the candidate match. Matching performance proved to depend on the complexity of the music. Matching simple music, such as single-note pieces, was successful. However, matching more complex pieces, such as Chopin's Ballade No. 4, was not: the algorithm could not produce a low error value against any of the music in the database. I suspect this is due to having too many notes: mismatches in the higher harmonics add up to a significant amount of error, which swamps the calculation.
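The sliding-window comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the paper's actual implementation. The function names (`normalized_spectrum`, `best_match`), the use of NumPy's real FFT to obtain the spectrum, and the choice to normalize each magnitude spectrum to unit sum are all assumptions made for the example. Only the 0.1-second step and the sum of absolute differences come from the abstract.

```python
import numpy as np

def normalized_spectrum(segment):
    """Magnitude spectrum scaled to unit sum (assumed normalization),
    so that overall playback volume cancels out."""
    magnitude = np.abs(np.fft.rfft(segment))
    total = magnitude.sum()
    return magnitude / total if total > 0 else magnitude

def best_match(query, track, sample_rate, hop_seconds=0.1):
    """Slide a query-sized window over `track` in `hop_seconds` steps and
    return (start_sample, error) for the window with the smallest sum of
    absolute spectral differences."""
    hop = max(1, int(round(hop_seconds * sample_rate)))
    window = len(query)
    query_spec = normalized_spectrum(query)
    best_offset, best_error = None, np.inf
    for start in range(0, len(track) - window + 1, hop):
        candidate = normalized_spectrum(track[start:start + window])
        error = np.abs(query_spec - candidate).sum()
        if error < best_error:
            best_offset, best_error = start, error
    return best_offset, best_error

# Hypothetical test signal: five single notes of 0.5 s each at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate // 2) / sample_rate
notes_hz = [262, 294, 330, 349, 392]
track = np.concatenate([np.sin(2 * np.pi * f * t) for f in notes_hz])

# The query is the third note, played back at 30% of the original volume.
query = 0.3 * track[8000:12000]
offset, error = best_match(query, track, sample_rate)
print(offset / sample_rate, error)
```

In this synthetic single-note case the quieter query should still line up with the start of the third note, because scaling the audio scales every spectral bin equally and the normalization removes that factor. With dense polyphonic material, every harmonic contributes to the summed difference, which is consistent with the abstract's explanation for why complex pieces failed to match.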