Improving Semi-supervised End-to-end Automatic Speech Recognition using CycleGAN and Inter-domain Losses

Title

Improving Semi-supervised End-to-end Automatic Speech Recognition using CycleGAN and Inter-domain Losses

Authors

Li, Chia-Yu; Vu, Ngoc Thang

Abstract

We propose a novel method that combines CycleGAN and inter-domain losses for semi-supervised end-to-end automatic speech recognition. The inter-domain loss targets the extraction of an intermediate shared representation of speech and text inputs using a shared network. CycleGAN uses a cycle-consistency loss and an identity-mapping loss to preserve relevant characteristics of the input features after converting them from one domain to the other. As such, both approaches are suitable for training end-to-end models on unpaired speech and text inputs. In this paper, we exploit the advantages of both the inter-domain loss and CycleGAN to achieve a better shared representation of unpaired speech and text inputs, and thus improve the speech-to-text mapping. Our experimental results on WSJ eval92 and Voxforge (non-English) show a character error rate reduction of 8 to 8.5% over the baseline, and the results on LibriSpeech test_clean also show a noticeable improvement.
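The abstract names three losses that can be computed without paired speech-text data. The following is a minimal, stdlib-only sketch of their general shape, not the authors' implementation. Assumptions: representations are plain lists of floats; `toy_g` and `toy_f` are hypothetical stand-ins for the two domain mappings; and `inter_domain_loss` is approximated here by the squared distance between batch-mean encodings (a simple moment-matching stand-in; the paper may use a different distance). The cycle-consistency and identity-mapping terms follow the usual L1 formulation from the original CycleGAN work.

```python
from typing import Callable, List

Vec = List[float]
Mapping = Callable[[Vec], Vec]


def l1(a: Vec, b: Vec) -> float:
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def mean_vec(batch: List[Vec]) -> Vec:
    """Element-wise mean over a batch of vectors."""
    return [sum(col) / len(batch) for col in zip(*batch)]


def inter_domain_loss(speech_enc: List[Vec], text_enc: List[Vec]) -> float:
    """ASSUMED form: squared distance between the batch-mean shared-network
    encodings of unpaired speech and unpaired text."""
    ms, mt = mean_vec(speech_enc), mean_vec(text_enc)
    return sum((a - b) ** 2 for a, b in zip(ms, mt))


def cycle_consistency_loss(xs: List[Vec], ys: List[Vec],
                           g: Mapping, f: Mapping) -> float:
    """X -> Y -> X and Y -> X -> Y should reconstruct the input (L1)."""
    forward = sum(l1(f(g(x)), x) for x in xs) / len(xs)
    backward = sum(l1(g(f(y)), y) for y in ys) / len(ys)
    return forward + backward


def identity_mapping_loss(xs: List[Vec], ys: List[Vec],
                          g: Mapping, f: Mapping) -> float:
    """A mapping fed a sample already in its target domain should leave it
    unchanged: G(y) ~ y and F(x) ~ x (L1)."""
    g_term = sum(l1(g(y), y) for y in ys) / len(ys)
    f_term = sum(l1(f(x), x) for x in xs) / len(xs)
    return g_term + f_term


# Hypothetical toy mappings: G doubles, F halves, so F(G(x)) == x exactly.
def toy_g(v: Vec) -> Vec:
    return [2.0 * t for t in v]


def toy_f(v: Vec) -> Vec:
    return [0.5 * t for t in v]


if __name__ == "__main__":
    speech = [[1.0, 2.0], [3.0, 4.0]]  # hypothetical speech encodings
    text = [[2.0, 2.0], [4.0, 4.0]]    # hypothetical text encodings
    print(inter_domain_loss(speech, text))                    # 1.0
    print(cycle_consistency_loss(speech, text, toy_g, toy_f))  # 0.0
    print(identity_mapping_loss(speech, text, toy_g, toy_f))   # 4.25
```

The toy mappings are exact inverses, so the cycle term is zero while the identity term is not. This shows that the two CycleGAN losses constrain different things. In a training setup along these lines, the terms would presumably be weighted and added to the supervised loss on the available paired data; the abstract does not give those weights.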
