Data Augmenting Contrastive Learning of Speech Representations in the Time Domain

Paper Title

Data Augmenting Contrastive Learning of Speech Representations in the Time Domain

Authors

Eugene Kharitonov, Morgane Rivière, Gabriel Synnaeve, Lior Wolf, Pierre-Emmanuel Mazaré, Matthijs Douze, Emmanuel Dupoux

Abstract

Contrastive Predictive Coding (CPC), which predicts future segments of speech from past segments, is emerging as a powerful algorithm for representation learning of speech signals. However, it still underperforms other methods on unsupervised evaluation benchmarks. Here, we introduce WavAugment, a time-domain data augmentation library, and find that applying augmentation to the past segments is generally more efficient and yields better performance than other methods. We find that a combination of pitch modification, additive noise, and reverberation substantially increases the performance of CPC (relative improvement of 18-22%), beating the reference Libri-light results with 600 times less data. Using an out-of-domain dataset, time-domain data augmentation can push CPC to be on par with the state of the art on the Zero Speech Benchmark 2017. We also show that time-domain data augmentation consistently improves downstream limited-supervision phoneme classification tasks by 12-15% relative.
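
As an illustration of one of the augmentations named in the abstract, the sketch below mixes additive noise into a waveform at a chosen signal-to-noise ratio, directly in the time domain. It is a minimal standard-library example and not the WavAugment API; the function names `mean_power` and `add_noise_at_snr`, the 16 kHz sample rate, and the 10 dB target are assumptions made for the example.

```python
import math
import random


def mean_power(samples):
    """Average power (mean of squared samples) of a waveform."""
    return sum(v * v for v in samples) / len(samples)


def add_noise_at_snr(signal, noise, snr_db):
    """Return signal + scaled noise so that the mix has the requested SNR in dB.

    Hypothetical helper for illustration; not part of WavAugment.
    """
    if not signal or not noise:
        raise ValueError("signal and noise must be non-empty")
    # Repeat the noise cyclically so it covers the whole signal.
    noise = [noise[i % len(noise)] for i in range(len(signal))]
    p_signal = mean_power(signal)
    p_noise = mean_power(noise)
    if p_signal == 0.0 or p_noise == 0.0:
        return list(signal)  # nothing sensible to scale against
    # Choose the scale so that 10 * log10(p_signal / (scale^2 * p_noise)) == snr_db.
    scale = math.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(signal, noise)]


# Toy input: one second of a 220 Hz sine at an assumed 16 kHz sample rate.
rng = random.Random(0)
sample_rate = 16000
clean = [math.sin(2 * math.pi * 220.0 * i / sample_rate) for i in range(sample_rate)]
noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]
noisy = add_noise_at_snr(clean, noise, snr_db=10.0)
```

In a CPC-style setup, a transform like this would be applied to the waveform before encoding; the abstract reports that augmenting the past segments works best, but the details of how the authors combine it with pitch modification and reverberation are not given here.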
