KoSpeech: Open-Source Toolkit for End-to-End Korean Speech Recognition

Title

KoSpeech: Open-Source Toolkit for End-to-End Korean Speech Recognition

Authors

Soohwan Kim, Seyoung Bae, Cheolhwang Won

Abstract

We present KoSpeech, an open-source, modular, and extensible end-to-end Korean automatic speech recognition (ASR) toolkit based on the deep learning library PyTorch. Several open-source ASR toolkits have been released, but all of them target languages other than Korean, such as English (e.g., ESPnet, Espresso). Although AI Hub has released KsponSpeech, a 1,000-hour Korean speech corpus, there is no established preprocessing method or baseline model for comparing model performance. Therefore, we propose preprocessing methods for the KsponSpeech corpus and a baseline model for benchmarking. Our baseline model is based on the Listen, Attend and Spell (LAS) architecture and allows various training hyperparameters to be customized conveniently. We hope KoSpeech can serve as a guideline for researchers working on Korean speech recognition. Our baseline model achieved a 10.31% character error rate (CER) on the KsponSpeech corpus using only the acoustic model. Our source code is available here.
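The 10.31% figure above is a character error rate. As a minimal sketch (not KoSpeech's own implementation), CER is commonly computed as the character-level Levenshtein distance between each hypothesis and its reference, summed over the corpus and divided by the total number of reference characters. Whether spaces are stripped before scoring is an assumption here, exposed as the hypothetical `ignore_spaces` flag; the function names `levenshtein` and `cer` are illustrative. Each precomposed Hangul syllable counts as one character, assuming NFC-normalized text.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h into the reference
                prev[j - 1] + (r != h),  # substitute (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def cer(refs: list[str], hyps: list[str], ignore_spaces: bool = True) -> float:
    """Corpus-level CER: total edit distance / total reference characters.

    ignore_spaces is a hypothetical option: it drops spaces before scoring,
    which is one common convention for Korean CER (an assumption here).
    """
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    errors, total = 0, 0
    for ref, hyp in zip(refs, hyps):
        if ignore_spaces:
            ref, hyp = ref.replace(" ", ""), hyp.replace(" ", "")
        errors += levenshtein(ref, hyp)
        total += len(ref)
    if total == 0:
        raise ValueError("references contain no characters")
    return errors / total


if __name__ == "__main__":
    # One substituted syllable out of five reference syllables.
    print(cer(["안녕하세요"], ["안녕하새요"]))  # → 0.2
```

Pooling errors and lengths over the whole corpus, rather than averaging per-utterance rates, keeps short utterances from dominating the score.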