Paper Title

Learned Transferable Architectures Can Surpass Hand-Designed Architectures for Large Scale Speech Recognition

Authors

Liqiang He, Dan Su, Dong Yu

Abstract

In this paper, we explore neural architecture search (NAS) for automatic speech recognition (ASR) systems. Following previous work in the computer vision field, the transferability of the searched architecture is the main focus of our work. The architecture search is conducted on a small proxy dataset, and the evaluation network, constructed from the searched architecture, is then evaluated on a large dataset. In particular, we propose a revised search space for speech recognition tasks that, in theory, encourages the search algorithm to explore architectures with low complexity. Extensive experiments show that: (i) an architecture searched on the small proxy dataset can be transferred to the large dataset for speech recognition tasks; (ii) an architecture learned in the revised search space can greatly reduce computational overhead and GPU memory usage with only mild performance degradation; (iii) the searched architecture achieves relative improvements of more than 20% and 15% (averaged over four test sets) on the AISHELL-2 dataset and the large (10k-hour) dataset, respectively, compared with our best hand-designed DFSMN-SAN architecture. To the best of our knowledge, this is the first report of NAS results on a large-scale dataset (up to 10k hours), indicating the promising application of NAS to industrial ASR systems.
