PP-OCRv3: More Attempts for the Improvement of Ultra Lightweight OCR System

Paper Title

PP-OCRv3: More Attempts for the Improvement of Ultra Lightweight OCR System

Authors

Li, Chenxia; Liu, Weiwei; Guo, Ruoyu; Yin, Xiaoting; Jiang, Kaitao; Du, Yongkun; Du, Yuning; Zhu, Lingfeng; Lai, Baohua; Hu, Xiaoguang; Yu, Dianhai; Ma, Yanjun

Abstract

Optical character recognition (OCR) technology has been widely used in various scenarios, as shown in Figure 1. Designing a practical OCR system remains a meaningful but challenging task. In previous work, balancing efficiency and accuracy, we proposed a practical ultra lightweight OCR system (PP-OCR) and an optimized version, PP-OCRv2. To further improve the performance of PP-OCRv2, this paper proposes a more robust OCR system, PP-OCRv3. Building on PP-OCRv2, PP-OCRv3 upgrades the text detection model and the text recognition model in 9 aspects. For the text detector, we introduce a PAN module with a large receptive field named LK-PAN, an FPN module with a residual attention mechanism named RSE-FPN, and a DML distillation strategy. For the text recognizer, the base model is changed from CRNN to SVTR, and we introduce the lightweight text recognition network SVTR LCNet, attention-guided training of CTC, the data augmentation strategy TextConAug, a better pre-trained model obtained with self-supervised TextRotNet, UDML, and UIM to accelerate the model and improve its performance. Experiments on real data show that the Hmean of PP-OCRv3 is 5% higher than that of PP-OCRv2 at comparable inference speed. All the above-mentioned models are open-sourced, and the code is available in the GitHub repository PaddleOCR, which is powered by PaddlePaddle.
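The abstract names DML (deep mutual learning) as a distillation strategy for the text detector but does not describe it. A minimal sketch of the generic mutual-learning objective follows, assuming a toy single-example classification setting with two peer models; the function names are hypothetical, and this is not PaddleOCR's implementation. Each peer is trained with its own cross-entropy loss plus the KL divergence from its peer's predicted distribution to its own.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-likelihood of the ground-truth class."""
    return -math.log(probs[label])


def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def dml_losses(logits_a, logits_b, label):
    """Hypothetical helper: mutual-learning loss for two peer models.

    Each model gets its supervised loss plus a KL term that pulls its
    distribution toward the peer's (toy classification assumption).
    """
    p_a = softmax(logits_a)
    p_b = softmax(logits_b)
    loss_a = cross_entropy(p_a, label) + kl_divergence(p_b, p_a)
    loss_b = cross_entropy(p_b, label) + kl_divergence(p_a, p_b)
    return loss_a, loss_b


if __name__ == "__main__":
    # Made-up logits for a 3-class toy example; ground-truth class is 1
    print(dml_losses([0.2, 1.5, -0.3], [0.9, 0.4, 0.1], label=1))
```

In real training, the peer's distribution is treated as a fixed target when each model is updated; this pure-Python sketch only computes loss values, not gradients, and how PP-OCRv3 applies the idea to its detector's outputs is not specified in the abstract.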
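The headline result is reported in Hmean, the harmonic mean of precision and recall (the F1-score). A minimal sketch of the metric follows, assuming precision and recall come from counting matched predictions; the helper names are hypothetical, and the counts in the usage line are made up for illustration, not results from the paper.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean (F1-score) of precision and recall, both in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # nothing matched: avoid division by zero
    return 2 * precision * recall / (precision + recall)


def hmean_from_counts(num_correct: int, num_predicted: int, num_ground_truth: int) -> float:
    """Hypothetical helper: precision = correct / predicted, recall = correct / ground truth."""
    precision = num_correct / num_predicted if num_predicted else 0.0
    recall = num_correct / num_ground_truth if num_ground_truth else 0.0
    return hmean(precision, recall)


if __name__ == "__main__":
    # Made-up counts: 80 correct out of 100 predictions and 120 ground-truth items
    print(round(hmean_from_counts(80, 100, 120), 4))  # prints 0.7273
```

Computed from counts, Hmean simplifies to 2C / (N_pred + N_gt), here 160 / 220, so a system cannot raise it by trading recall for precision (or vice versa) without losing on the other side.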
