Paper Title

Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

Authors

Nikolay Malkovsky, Vladimir Bataev, Dmitrii Sviridkin, Natalia Kizhaeva, Aleksandr Laptev, Ildar Valiev, Oleg Petrov

Abstract

The problem of out-of-vocabulary (OOV) words is typical for any speech recognition system. Hybrid systems are usually constructed to recognize a fixed set of words and can rarely include all the words that will be encountered while the system is in use. One popular approach to covering OOVs is to use subword units rather than words. Such a system can potentially recognize any previously unseen word if the word can be constructed from the available subword units, but it can also recognize non-existent words. The other popular approach is to modify the HMM part of the system so that it can be easily and effectively expanded with a custom set of words that we want to add to the system. In this paper, we explore existing methods for this solution at both the graph construction and search method levels. We also present novel vocabulary expansion techniques that solve some common internal subroutine problems in recognition graph processing.
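
The trade-off of the subword approach described in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the function name `segment` and the tiny inventory `UNITS` are hypothetical and chosen only for illustration. The sketch checks whether a word can be assembled from a fixed set of subword units, which is why an unseen word becomes recognizable and also why a non-existent word does.

```python
from typing import List, Optional, Set


def segment(word: str, units: Set[str]) -> Optional[List[str]]:
    """Return one segmentation of `word` into `units`, or None if impossible."""
    # best[i] holds a segmentation of word[:i], or None if word[:i] is unreachable.
    best: List[Optional[List[str]]] = [None] * (len(word) + 1)
    best[0] = []
    for end in range(1, len(word) + 1):
        for start in range(end):
            prefix = best[start]
            if prefix is not None and word[start:end] in units:
                best[end] = prefix + [word[start:end]]
                break
    return best[-1]


# Hypothetical subword inventory (illustrative only, not taken from the paper).
UNITS = {"re", "cogn", "iz", "er", "un", "s"}

print(segment("recognizers", UNITS))  # a real word that a word-level lexicon might lack
print(segment("unrecogniz", UNITS))   # a non-existent word that is still constructible
print(segment("speech", UNITS))       # cannot be built from these units
```

In a real system the units would come from a trained subword model and the search would run over acoustic scores, so this only shows the coverage property, not recognition itself.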
