Paper Title

Data Augmenting Contrastive Learning of Speech Representations in the Time Domain

Authors

Eugene Kharitonov, Morgane Rivière, Gabriel Synnaeve, Lior Wolf, Pierre-Emmanuel Mazaré, Matthijs Douze, Emmanuel Dupoux

Abstract

Contrastive Predictive Coding (CPC), which predicts future segments of speech from past segments, is emerging as a powerful algorithm for representation learning of speech signals. However, it still underperforms other methods on unsupervised evaluation benchmarks. Here, we introduce WavAugment, a time-domain data augmentation library, and find that applying augmentation to the past segments is generally more efficient and yields better performance than other methods. We find that a combination of pitch modification, additive noise, and reverberation substantially increases the performance of CPC (relative improvement of 18-22%), beating the reference Libri-light results with 600 times less data. Using an out-of-domain dataset, time-domain data augmentation can push CPC to be on par with the state of the art on the Zero Speech Benchmark 2017. We also show that time-domain data augmentation consistently improves downstream limited-supervision phoneme classification tasks by 12-15% relative.
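To make the idea of past-only, time-domain augmentation concrete, here is a minimal NumPy sketch of one of the three effects named in the abstract, additive noise. It is not the WavAugment API and not the authors' implementation: the function names `add_noise_at_snr` and `augment_past_only`, the Gaussian noise source, the 10 dB SNR, and the fixed split point are all assumptions chosen for illustration.

```python
import numpy as np


def add_noise_at_snr(signal, noise, snr_db):
    """Mix `noise` into `signal` at the requested signal-to-noise ratio (dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that signal_power / scaled_noise_power == 10^(snr_db / 10).
    scale = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return signal + scale * noise


def augment_past_only(waveform, split, snr_db, rng):
    """Add noise to waveform[:split] (the "past"); leave waveform[split:] clean."""
    past, future = waveform[:split], waveform[split:]
    # Hypothetical noise source: white Gaussian noise of the same length as the past.
    noise = rng.standard_normal(past.shape[0])
    return np.concatenate([add_noise_at_snr(past, noise, snr_db), future])


rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
waveform = np.sin(2 * np.pi * 220.0 * t)  # 1 s synthetic tone sampled at 16 kHz
augmented = augment_past_only(waveform, split=8000, snr_db=10.0, rng=rng)
```

In this sketch the first half of the waveform plays the role of the context the model sees, and the untouched second half plays the role of the future segments it must predict. Pitch modification and reverberation would slot into the same place as `add_noise_at_snr`, but they are not shown here.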
