TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text

Cited by: 32

Authors:

Singh, Amanpreet [1]

Peng, Guan [1]

Toh, Mandy [1]

Huang, Jing [1]

Galuba, Wojciech [1]

Hassner, Tal [1]

Affiliations:

[1] Facebook AI Res, Menlo Pk, CA 94025 USA

Source:

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 | 2021

DOI:

10.1109/CVPR46437.2021.00869

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

A crucial component of the scene-text-based reasoning required for the TextVQA and TextCaps datasets involves detecting and recognizing text present in the images using an optical character recognition (OCR) system. Current systems are crippled by the unavailability of ground-truth text annotations for these datasets, as well as by the lack of scene text detection and recognition datasets on real images, which hinders progress in the field of OCR and prevents evaluation of scene-text-based reasoning in isolation from OCR systems. In this work, we propose TextOCR, an arbitrary-shaped scene text detection and recognition dataset with 900k annotated words collected on real images from the TextVQA dataset. We show that current state-of-the-art text-recognition (OCR) models fail to perform well on TextOCR and that training on TextOCR helps achieve state-of-the-art performance on multiple other OCR datasets as well. We use a TextOCR-trained OCR model to create the PixelM4C model, which can perform scene-text-based reasoning on an image in an end-to-end fashion, allowing us to revisit several design choices and achieve new state-of-the-art performance on the TextVQA dataset.

Pages: 8798-8808

Page count: 11
