Paper Title
Power Pooling Operators and Confidence Learning for Semi-Supervised Sound Event Detection
Authors
Abstract
In recent years, the use of synthetic strongly labeled data, weakly labeled data, and unlabeled data has drawn much research attention in semi-supervised sound event detection (SSED). Self-training models make predictions without strong annotations and then take the predictions with high probabilities as pseudo-labels for retraining. Such models have shown their effectiveness in SSED. However, probabilities are poorly calibrated confidence estimates, and samples with low probabilities are ignored. Hence, we introduce a method that deliberately learns confidence and explicitly retains all data by applying confidence as weights. Additionally, linear pooling has been considered a state-of-the-art aggregation function for SSED with weak labeling. In this paper, we propose a power pooling function whose coefficient can be trained automatically to achieve nonlinearity. A confidence-based semi-supervised sound event detection (C-SSED) framework is designed to combine confidence and power pooling. The experimental results demonstrate that confidence is proportional to the accuracy of the predictions. The power pooling function outperforms linear pooling in both error rate and F1 score. In addition, the C-SSED framework achieves a relative error rate reduction of 34% compared with the baseline model.
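
The abstract does not state the pooling formula, so the following is only a minimal sketch under an assumed form: each frame-level probability p is weighted by p raised to an exponent n, giving a clip-level probability of sum(p * p^n) / sum(p^n). Under this assumption, n = 0 reduces to mean pooling, n = 1 reduces to linear pooling, and large n approaches max pooling. The function name `power_pool` and the sample values are hypothetical, n is assumed to be non-negative, and n is held fixed here, whereas the paper describes the coefficient as trained automatically.

```python
from typing import Sequence


def power_pool(frame_probs: Sequence[float], n: float) -> float:
    """Aggregate frame-level probabilities into one clip-level probability.

    Assumed form (not taken from the abstract): sum(p * p**n) / sum(p**n),
    so each frame is weighted by its own probability raised to n (n >= 0).
    """
    weights = [p ** n for p in frame_probs]
    total = sum(weights)
    if total == 0.0:
        # Empty input, or every frame is zero while n > 0.
        return 0.0
    return sum(p * w for p, w in zip(frame_probs, weights)) / total


# Hypothetical frame-level probabilities for one event class in one clip.
frames = [0.1, 0.2, 0.9, 0.8]

mean_pooled = power_pool(frames, 0.0)    # n = 0: plain average of the frames
linear_pooled = power_pool(frames, 1.0)  # n = 1: linear pooling
sharp_pooled = power_pool(frames, 20.0)  # large n: close to the largest frame
```

In a trainable setting, n would be a learnable parameter updated by gradient descent together with the network weights, which lets the model pick a point between averaging and taking the maximum.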