Paper Title
LightSeq: A High Performance Inference Library for Transformers
Paper Authors
Paper Abstract
Transformer, BERT and their variants have achieved great success in natural language processing. Since Transformer models are huge in size, serving these models is a challenge for real industrial applications. In this paper, we propose LightSeq, a highly efficient inference library for models in the Transformer family. LightSeq includes a series of GPU optimization techniques to streamline the computation of neural layers and to reduce memory footprint. LightSeq can easily import models trained using PyTorch and TensorFlow. Experimental results on machine translation benchmarks show that LightSeq achieves up to 14x speedup compared with TensorFlow and 1.4x speedup compared with FasterTransformer, a concurrent CUDA implementation. The code is available at https://github.com/bytedance/lightseq.
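For illustration, below is a minimal sketch of what inference with LightSeq's Python bindings might look like. The model file name and token IDs are placeholders, and the exact API surface (the `lightseq.inference` module, the `Transformer` class, and the `infer` return values) is an assumption that should be checked against the repository linked above.

```python
# A minimal sketch of running inference with LightSeq's Python bindings.
# Assumptions: the trained model has already been exported to a protobuf
# file ("transformer.pb" is a placeholder name), and the API mirrors the
# lightseq.inference module from the repository linked above.
import lightseq.inference as lsi

# Load the exported Transformer model; the second argument caps the
# batch size so that GPU buffers can be pre-allocated.
model = lsi.Transformer("transformer.pb", 8)

# Source token IDs (placeholder values); one row per input sentence.
src_tokens = [[63, 47, 65, 1507, 88, 74, 10, 2057, 362, 9, 284, 6]]

# Run decoding on the GPU; the (output_ids, scores) return shape is
# an assumption based on the library's published examples.
output_ids, scores = model.infer(src_tokens)
print(output_ids)
```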