CAT: A CTC-CRF based ASR Toolkit Bridging the Hybrid and the End-to-end Approaches towards Data Efficiency and Low Latency

Paper title

CAT: A CTC-CRF based ASR Toolkit Bridging the Hybrid and the End-to-end Approaches towards Data Efficiency and Low Latency

Authors

An, Keyu; Xiang, Hongyu; Ou, Zhijian

Abstract

In this paper, we present a new open source toolkit for speech recognition, named CAT (CTC-CRF based ASR Toolkit). CAT inherits the data-efficiency of the hybrid approach and the simplicity of the E2E approach, providing a full-fledged implementation of CTC-CRFs and complete training and testing scripts for a number of English and Chinese benchmarks. Experiments show CAT obtains state-of-the-art results, which are comparable to the fine-tuned hybrid models in Kaldi but with a much simpler training pipeline. Compared to existing non-modularized E2E models, CAT performs better on limited-scale datasets, demonstrating its data efficiency. Furthermore, we propose a new method called contextualized soft forgetting, which enables CAT to do streaming ASR without accuracy degradation. We hope CAT, especially the CTC-CRF based framework and software, will be of broad interest to the community, and can be further explored and improved.
