Paper Title

CAT: A CTC-CRF based ASR Toolkit Bridging the Hybrid and the End-to-end Approaches towards Data Efficiency and Low Latency

Authors

Keyu An, Hongyu Xiang, Zhijian Ou

Abstract

In this paper, we present a new open source toolkit for speech recognition, named CAT (CTC-CRF based ASR Toolkit). CAT inherits the data-efficiency of the hybrid approach and the simplicity of the E2E approach, providing a full-fledged implementation of CTC-CRFs and complete training and testing scripts for a number of English and Chinese benchmarks. Experiments show CAT obtains state-of-the-art results, which are comparable to the fine-tuned hybrid models in Kaldi but with a much simpler training pipeline. Compared to existing non-modularized E2E models, CAT performs better on limited-scale datasets, demonstrating its data efficiency. Furthermore, we propose a new method called contextualized soft forgetting, which enables CAT to do streaming ASR without accuracy degradation. We hope CAT, especially the CTC-CRF based framework and software, will be of broad interest to the community, and can be further explored and improved.
