Paper Title

Compute Cost Amortized Transformer for Streaming ASR

Paper Authors

Yi Xie, Jonathan Macoskey, Martin Radfar, Feng-Ju Chang, Brian King, Ariya Rastrow, Athanasios Mouchtaris, Grant P. Strimel

Paper Abstract

We present a streaming, Transformer-based end-to-end automatic speech recognition (ASR) architecture which achieves efficient neural inference through compute cost amortization. Our architecture creates sparse computation pathways dynamically at inference time, resulting in selective use of compute resources throughout decoding, enabling significant reductions in compute with minimal impact on accuracy. The fully differentiable architecture is trained end-to-end with an accompanying lightweight arbitrator mechanism operating at the frame-level to make dynamic decisions on each input while a tunable loss function is used to regularize the overall level of compute against predictive performance. We report empirical results from experiments using the compute amortized Transformer-Transducer (T-T) model conducted on LibriSpeech data. Our best model can achieve a 60% compute cost reduction with only a 3% relative word error rate (WER) increase.
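To make the idea of compute cost amortization concrete, below is a minimal, illustrative sketch (not the paper's actual code) of frame-level gating in a Transformer encoder block: a lightweight arbitrator scores each frame and decides whether it runs through the full block or passes along a cheap skip path, while a tunable budget term penalizes the fraction of frames given full compute. Class and variable names (FrameArbitrator, AmortizedBlock, budget_weight) and the straight-through gating trick are assumptions for illustration; the actual T-T architecture and loss are described in the paper.

```python
# Illustrative sketch of frame-level compute amortization (assumed PyTorch-style
# implementation; names and details are not taken from the paper).
import torch
import torch.nn as nn


class FrameArbitrator(nn.Module):
    """Tiny per-frame gate producing a differentiable keep/skip decision."""

    def __init__(self, d_model: int):
        super().__init__()
        self.scorer = nn.Linear(d_model, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model) -> gate value in {0, 1} per frame.
        logits = self.scorer(x).squeeze(-1)          # (batch, time)
        soft = torch.sigmoid(logits)
        # Straight-through estimator: hard 0/1 forward pass, soft gradient.
        hard = (soft > 0.5).float()
        return hard + soft - soft.detach()


class AmortizedBlock(nn.Module):
    """Transformer encoder block wrapped with a frame-level compute gate."""

    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.arbitrator = FrameArbitrator(d_model)

    def forward(self, x: torch.Tensor):
        gate = self.arbitrator(x)                    # (batch, time)
        heavy = self.block(x)                        # full-compute path
        # Gated mix: skipped frames pass through unchanged (cheap path).
        # During training both paths are evaluated for differentiability;
        # the real savings come from actually skipping gated-out frames
        # at inference time.
        g = gate.unsqueeze(-1)
        return g * heavy + (1.0 - g) * x, gate


def total_loss(asr_loss: torch.Tensor, gates: torch.Tensor,
               budget_weight: float = 0.1) -> torch.Tensor:
    """ASR loss plus a tunable penalty on the fraction of frames computed."""
    return asr_loss + budget_weight * gates.mean()


if __name__ == "__main__":
    x = torch.randn(2, 50, 256)                      # (batch, frames, d_model)
    block = AmortizedBlock()
    y, gates = block(x)
    print(y.shape, "fraction of frames given full compute:", gates.mean().item())
```

Raising budget_weight trades accuracy for compute: a larger penalty pushes the arbitrator to skip more frames, which is the knob the abstract describes as regularizing the overall level of compute against predictive performance.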
