Unsupervised Uncertainty Measures of Automatic Speech Recognition for Non-intrusive Speech Intelligibility Prediction

Paper title

Unsupervised Uncertainty Measures of Automatic Speech Recognition for Non-intrusive Speech Intelligibility Prediction

Authors

Zehai Tu, Ning Ma, Jon Barker

Abstract

Non-intrusive intelligibility prediction is important for applications in realistic scenarios, where a clean reference signal is difficult to access. The construction of many non-intrusive predictors requires either ground truth intelligibility labels or clean reference signals for supervised learning. In this work, we leverage an unsupervised uncertainty estimation method for predicting speech intelligibility, which does not require intelligibility labels or reference signals to train the predictor. Our experiments demonstrate that the uncertainty from state-of-the-art end-to-end automatic speech recognition (ASR) models is highly correlated with speech intelligibility. The proposed method is evaluated on two databases, and the results show that the unsupervised uncertainty measures of ASR models are more correlated with speech intelligibility from listening results than the predictions made by widely used intrusive methods.
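
The abstract does not say which uncertainty measures are used, so the following is only an illustrative sketch and not the authors' method. It assumes that an ASR decoder exposes a posterior distribution over output tokens at each decoding step, and it takes the mean Shannon entropy of those distributions as one plausible uncertainty measure. The function names and the toy posteriors are hypothetical.

```python
import math
from typing import Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one posterior distribution over tokens."""
    # Zero-probability entries contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy(posteriors: Sequence[Sequence[float]]) -> float:
    """Average entropy over decoding steps; higher means a less certain ASR."""
    if not posteriors:
        raise ValueError("need at least one decoding step")
    return sum(token_entropy(p) for p in posteriors) / len(posteriors)


# Hypothetical toy posteriors (assumption: 4-token vocabulary, 2 decoding steps).
clean_speech = [[0.97, 0.01, 0.01, 0.01], [0.94, 0.02, 0.02, 0.02]]
noisy_speech = [[0.40, 0.30, 0.20, 0.10], [0.25, 0.25, 0.25, 0.25]]

# Negated uncertainty serves as an unsupervised intelligibility proxy:
# the more peaked the posteriors, the higher the predicted intelligibility.
clean_score = -mean_entropy(clean_speech)
noisy_score = -mean_entropy(noisy_speech)
```

Under these assumptions the score needs neither intelligibility labels nor a clean reference signal, which is the property the abstract emphasizes; how well such a score tracks listening-test results would have to be checked against real data.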
