Robust Speech Recognition via Large-Scale Weak Supervision

Paper Title

Robust Speech Recognition via Large-Scale Weak Supervision

Authors

Radford, Alec; Kim, Jong Wook; Xu, Tao; Brockman, Greg; McLeavey, Christine; Sutskever, Ilya

Abstract

We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual and multitask supervision, the resulting models generalize well to standard benchmarks and are often competitive with prior fully supervised results but in a zero-shot transfer setting without the need for any fine-tuning. When compared to humans, the models approach their accuracy and robustness. We are releasing models and inference code to serve as a foundation for further work on robust speech processing.
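The abstract compares the models with standard benchmarks and with humans on accuracy. Speech recognition accuracy is conventionally reported as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the model's transcript into the reference transcript, divided by the number of words in the reference. The following is a minimal Python sketch of that metric, not the paper's evaluation code. The function name word_error_rate is hypothetical, and the sketch assumes plain whitespace tokenization with no text normalization, so differences in casing or punctuation count as errors.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Computed as the word-level Levenshtein distance between the transcripts.
    Assumption: whitespace tokenization only, no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the first i-1 reference words and
    # the first j hypothesis words; rows are rolled to keep memory O(len(hyp)).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the cat sat on mat"  # one deleted word
    print(f"WER = {word_error_rate(reference, hypothesis):.3f}")
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis contains many more words than the reference.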
