Runner-Up Solution to ECCV 2022 Challenge on Out of Vocabulary Scene Text Understanding: Cropped Word Recognition

Paper Title

Runner-Up Solution to ECCV 2022 Challenge on Out of Vocabulary Scene Text Understanding: Cropped Word Recognition

Authors

Zhangzi Zhu, Yu Hao, Wenqing Zhang, Chuhui Xue, Song Bai

Abstract

This report presents our 2nd-place solution to the ECCV 2022 challenge on Out-of-Vocabulary Scene Text Understanding (OOV-ST): Cropped Word Recognition. The challenge was held as part of the ECCV 2022 workshop on Text in Everything (TiE) and targets the extraction of out-of-vocabulary words from natural scene images. In the competition, we first pre-train SCATTER on synthetic datasets and then fine-tune the model on the training set with data augmentation. In addition, two models are trained specifically for long texts and vertical texts, respectively. Finally, we combine the outputs of models that differ in layers, backbones, and random seeds to produce the final results. Our solution achieves a word accuracy of 59.45% when considering out-of-vocabulary words only.
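The final step above, combining the outputs of several recognizers into one answer, can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so everything here is an assumption: each model is taken to emit a hypothetical `(text, confidence)` pair, word confidence is assumed to be the product of per-character probabilities, and the hypothetical helpers `sequence_confidence` and `ensemble_predictions` implement plain confidence-weighted voting over whole strings. This is not presented as the authors' method.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical output format (assumption): (decoded string, word-level confidence).
Prediction = Tuple[str, float]


def sequence_confidence(char_probs: Sequence[float]) -> float:
    """Word-level confidence as the product of per-character probabilities.

    This scoring rule is an assumption, not taken from the report.
    """
    return math.prod(char_probs)


def ensemble_predictions(preds: List[Prediction]) -> str:
    """Confidence-weighted voting over the outputs of several recognizers.

    Identical strings pool their confidences; the string with the highest
    total wins, with the number of supporting models as a tie-breaker.
    Any remaining tie falls back to the earliest model's prediction.
    """
    if not preds:
        raise ValueError("need at least one prediction")
    total: Dict[str, float] = defaultdict(float)
    votes: Dict[str, int] = defaultdict(int)
    for text, conf in preds:
        total[text] += conf
        votes[text] += 1
    # max() returns the first maximal key, and dicts keep insertion order,
    # so exact ties resolve to the string that appeared first.
    return max(total, key=lambda t: (total[t], votes[t]))


if __name__ == "__main__":
    outputs = [
        ("hello", sequence_confidence([0.9, 0.9, 0.9, 0.9, 0.9])),
        ("hallo", 0.8),
        ("hello", 0.7),
    ]
    print(ensemble_predictions(outputs))  # prints "hello"
```

Pooling confidences over identical strings lets several moderately confident models outvote a single overconfident one. Character-level voting, or weighting the long-text and vertical-text models differently, would be equally plausible alternatives.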
