Paper title
Regression-based music emotion prediction using triplet neural networks
Paper authors
Paper abstract
In this paper, we adapt triplet neural networks (TNNs) to a regression task: music emotion prediction. Since TNNs were originally introduced for classification rather than regression, we propose a mechanism that allows them to provide meaningful low-dimensional representations for regression tasks. We then use these new representations as input to regression algorithms such as support vector machines and gradient boosting machines. To demonstrate the TNNs' effectiveness at creating meaningful representations, we compare them to other dimensionality reduction methods on music emotion prediction, i.e., predicting valence and arousal values from musical audio signals. Our results on the DEAM dataset show that by using TNNs we achieve a 90% reduction in feature dimensionality, with a 9% improvement in valence prediction and a 4% improvement in arousal prediction with respect to our baseline models (without TNNs). Our TNN method outperforms other dimensionality reduction methods such as principal component analysis (PCA) and autoencoders (AE). This shows that, in addition to providing a compact latent space representation of audio features, the proposed approach performs better than the baseline models.
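The abstract does not spell out the mechanism that adapts TNNs to regression, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that triplets are chosen by label distance: for a given anchor, the "positive" is the sample whose continuous label (e.g. valence) is closer to the anchor's label, and the "negative" is the one whose label is farther away. The function names `sample_triplet` and `triplet_loss`, the margin value, and the toy data are all hypothetical.

```python
import math
import random


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss on already-embedded vectors.

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    d_ap = euclidean(anchor, positive)
    d_an = euclidean(anchor, negative)
    return max(0.0, d_ap - d_an + margin)


def sample_triplet(labels, rng, max_tries=1000):
    """Return indices (a, p, n) for continuous labels.

    ASSUMPTION (not from the paper): the sample whose label is closer
    to the anchor's label acts as the positive, the other as the negative.
    """
    if len(labels) < 3:
        raise ValueError("need at least three samples to form a triplet")
    for _ in range(max_tries):
        a, i, j = rng.sample(range(len(labels)), 3)
        d_i = abs(labels[a] - labels[i])
        d_j = abs(labels[a] - labels[j])
        if d_i < d_j:
            return a, i, j
        if d_j < d_i:
            return a, j, i
        # Equal label distances give no ordering; draw again.
    raise ValueError("could not find a triplet with distinct label distances")


# Toy data: four clips with hypothetical valence labels and 2-D embeddings.
rng = random.Random(0)
valence = [0.10, 0.15, 0.80, 0.90]
embeddings = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.2, 0.9]]

a, p, n = sample_triplet(valence, rng)
loss = triplet_loss(embeddings[a], embeddings[p], embeddings[n])
print(f"anchor={a} positive={p} negative={n} loss={loss:.3f}")
```

In a full pipeline, the embedding network would be trained to minimise this loss over many sampled triplets, and the resulting low-dimensional embeddings would then be passed to a regressor such as a support vector machine or gradient boosting machine, as the abstract describes.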