Paper Title
Neural Architecture Search For LF-MMI Trained Time Delay Neural Networks
Paper Authors
Abstract
Deep neural network (DNN) based automatic speech recognition (ASR) systems are often designed using expert knowledge and empirical evaluation. In this paper, a range of neural architecture search (NAS) techniques are used to automatically learn two types of hyper-parameters of state-of-the-art factored time delay neural networks (TDNNs): i) the left and right splicing context offsets; and ii) the dimensionality of the bottleneck linear projection at each hidden layer. These include the DARTS method, integrating architecture selection with lattice-free MMI (LF-MMI) TDNN training; Gumbel-Softmax and pipelined DARTS, reducing the confusion over candidate architectures and improving the generalization of architecture selection; and penalized DARTS, incorporating resource constraints to adjust the trade-off between performance and system complexity. Parameter sharing among candidate architectures allows efficient search over up to $7^{28}$ different TDNN systems. Experiments conducted on the 300-hour Switchboard corpus suggest that the auto-configured systems consistently outperform baseline LF-MMI TDNN systems using manual network design or random architecture search, after LHUC speaker adaptation and RNNLM rescoring. Absolute word error rate (WER) reductions of up to 1.0\% and a relative model size reduction of 28\% were obtained. Consistent performance improvements were also obtained on a UASpeech disordered speech recognition task using the proposed NAS approaches.
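To make the search mechanism the abstract describes more concrete, the following is a minimal NumPy sketch of DARTS-style architecture selection over candidate bottleneck dimensions at one hidden layer, using a Gumbel-Softmax relaxation and a penalized-DARTS-style resource term. The candidate dimensions, the temperature, and the penalty coefficient are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical candidate bottleneck projection dimensions for one layer;
# the paper searches one such choice per hidden layer.
candidates = np.array([25, 50, 80, 100, 120, 160, 200])

# Learnable architecture logits (one per candidate), as in DARTS.
alpha = rng.normal(size=len(candidates))

def gumbel_softmax(logits, tau=1.0):
    """Relaxed one-hot sample over candidates via the Gumbel-Softmax trick."""
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    g = -np.log(-np.log(u))          # Gumbel(0, 1) noise
    y = (logits + g) / tau           # lower tau -> closer to one-hot
    y = np.exp(y - y.max())          # numerically stable softmax
    return y / y.sum()

# Relaxed selection weights used to mix the candidate projections.
w = gumbel_softmax(alpha, tau=0.5)

# Penalized-DARTS-style term: expected model cost under w, scaled by
# a resource penalty coefficient (hypothetical value), added to the
# training loss to trade accuracy against system complexity.
penalty_coeff = 1e-3
resource_penalty = penalty_coeff * float(np.dot(w, candidates))
```

During search, the architecture logits `alpha` would be updated jointly with the shared network parameters; annealing `tau` toward zero pushes `w` toward a discrete choice of one bottleneck dimension per layer.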