Paper Title

Speech-enhanced and Noise-aware Networks for Robust Speech Recognition

Authors

Hung-Shin Lee, Pin-Yuan Chen, Yao-Fei Cheng, Yu Tsao, Hsin-Min Wang

Abstract

Compensation for channel mismatch and noise interference is essential for robust automatic speech recognition. Enhanced speech has been introduced into the multi-condition training of acoustic models to improve their generalization ability. In this paper, a noise-aware training framework based on two cascaded neural structures is proposed to jointly optimize speech enhancement and speech recognition. The feature enhancement module is composed of a multi-task autoencoder, where noisy speech is decomposed into clean speech and noise. By concatenating its enhanced, noise-aware, and noisy features for each frame, the acoustic-modeling module maps each feature-augmented frame into a triphone state by optimizing the lattice-free maximum mutual information and cross entropy between the predicted and actual state sequences. On top of the factorized time delay neural network (TDNN-F) and its convolutional variant (CNN-TDNNF), both with SpecAug, the two proposed systems achieve word error rates (WERs) of 3.90% and 3.55%, respectively, on the Aurora-4 task. Compared with the best existing systems that use bigram and trigram language models for decoding, the proposed CNN-TDNNF-based system achieves relative WER reductions of 15.20% and 33.53%, respectively. In addition, the proposed CNN-TDNNF-based system also outperforms the baseline CNN-TDNNF system on the AMI task.
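
To make the described cascade more concrete, the PyTorch-style sketch below mirrors the data flow only: a multi-task autoencoder splits noisy features into clean-speech and noise estimates, and an acoustic model consumes the frame-wise concatenation of enhanced, noise-aware, and noisy features. The feature dimension, hidden sizes, and the plain feed-forward acoustic model are illustrative assumptions; the paper's TDNN-F/CNN-TDNNF models and their LF-MMI plus cross-entropy training (in the Kaldi framework) are not reproduced here.

```python
# Minimal sketch of the cascaded "enhancement + noise-aware acoustic model" idea.
# All dimensions and layer choices are assumptions for illustration; the actual
# systems are TDNN-F / CNN-TDNNF trained with LF-MMI and cross entropy.
import torch
import torch.nn as nn


class MultiTaskAutoencoder(nn.Module):
    """Feature-enhancement module: one encoder, two decoders that
    decompose noisy features into clean-speech and noise estimates."""

    def __init__(self, feat_dim=80, hidden_dim=512):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.speech_decoder = nn.Linear(hidden_dim, feat_dim)  # enhanced speech
        self.noise_decoder = nn.Linear(hidden_dim, feat_dim)   # noise estimate

    def forward(self, noisy):                 # noisy: (batch, time, feat_dim)
        h = self.encoder(noisy)
        return self.speech_decoder(h), self.noise_decoder(h)


class NoiseAwareAcousticModel(nn.Module):
    """Acoustic-modeling module: consumes [enhanced; noise-aware; noisy]
    features per frame and predicts triphone-state scores."""

    def __init__(self, feat_dim=80, hidden_dim=1024, num_states=2000):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 * feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_states),
        )

    def forward(self, enhanced, noise, noisy):
        x = torch.cat([enhanced, noise, noisy], dim=-1)  # frame-wise concatenation
        return self.net(x)                               # (batch, time, num_states)


if __name__ == "__main__":
    enhancer = MultiTaskAutoencoder()
    acoustic_model = NoiseAwareAcousticModel()
    noisy = torch.randn(4, 200, 80)                  # 4 utterances, 200 frames each
    enhanced, noise = enhancer(noisy)
    logits = acoustic_model(enhanced, noise, noisy)  # triphone-state scores
    print(logits.shape)                              # torch.Size([4, 200, 2000])
```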
