TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text

Cited by: 32
Authors
Singh, Amanpreet [1]
Peng, Guan [1]
Toh, Mandy [1]
Huang, Jing [1]
Galuba, Wojciech [1]
Hassner, Tal [1]
Affiliations
[1] Facebook AI Research, Menlo Park, CA 94025, USA
DOI
10.1109/CVPR46437.2021.00869
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A crucial component of the scene-text-based reasoning required for the TextVQA and TextCaps datasets involves detecting and recognizing text present in the images using an optical character recognition (OCR) system. Current systems are crippled by the unavailability of ground-truth text annotations for these datasets, as well as by the lack of scene text detection and recognition datasets on real images, which hinders progress in OCR and prevents evaluating scene-text-based reasoning in isolation from OCR systems. In this work, we propose TextOCR, an arbitrary-shaped scene text detection and recognition dataset with 900k annotated words collected on real images from the TextVQA dataset. We show that current state-of-the-art text-recognition (OCR) models fail to perform well on TextOCR and that training on TextOCR helps achieve state-of-the-art performance on multiple other OCR datasets as well. We use a TextOCR-trained OCR model to create the PixelM4C model, which can do scene-text-based reasoning on an image in an end-to-end fashion, allowing us to revisit several design choices to achieve new state-of-the-art performance on the TextVQA dataset.
Pages: 8798-8808
Number of pages: 11