TIRec: Transformer-based Invoice Text Recognition

被引:0
|
作者
Chen, Yanlan [1 ]
机构
[1] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing, Peoples R China
关键词
Text recognition; Invoice; Convolutional Vision Transformer;
D O I
10.1145/3590003.3590034
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
A novel invoice text recognition model is proposed. In the past few years, researchers have explored text recognition methods with RNN-like structures to model semantic information. However, RNN-based approaches have some obvious drawbacks, such as the level-by-level decoding approach and the one-way serial transmission of semantic information, which greatly limit semantic information's effectiveness and computational efficiency. In contrast, invoice text has obvious contextual relationships due to its fixed text pattern, the text font in the invoice is more fixed and the complexity of the background is much lower than that of natural scenes. To further exploit these contextual relationships and adapt to the characteristics of invoice text, we propose a new text recognition framework inspired by Transformer [1]. Self-attention-based architectures, in particular Transformer, have been successful in natural language processing (NLP). It has demonstrated powerful semantic information modeling capabilities in NLP. Inspired by its success, we try to apply Transformer to invoice text recognition. Unlike the RNN-based approach, we reduce the parameters of the vision network used to extract image features, use the Convolutional Vision Transformer Attention module to capture the semantic information, and use the Transformer decoding module to decode all characters in parallel. We hope that this Transformer-based architecture can better model the semantic information in invoices while remaining lightweight. Meanwhile, we collected text images of more than 40,000 train invoices, VAT invoices, rolled invoices, and cab invoices. Experiments on the collected invoice text recognition dataset show that our approach outperforms previous methods in terms of accuracy and speed.
引用
收藏
页码:175 / 180
页数:6
相关论文
共 50 条
  • [1] A Transformer-Based Framework for Scene Text Recognition
    Selvam, Prabu
    Koilraj, Joseph Abraham Sundar
    Tavera Romero, Carlos Andres
    Alharbi, Meshal
    Mehbodniya, Abolfazl
    Webber, Julian L.
    Sengan, Sudhakar
    IEEE ACCESS, 2022, 10 : 100895 - 100910
  • [2] A Light Transformer-Based Architecture for Handwritten Text Recognition
    Barrere, Killian
    Soullard, Yann
    Lemaitre, Aurelie
    Couasnon, Bertrand
    DOCUMENT ANALYSIS SYSTEMS, DAS 2022, 2022, 13237 : 275 - 290
  • [3] A transformer-based approach for Arabic offline handwritten text recognition
    Momeni, Saleh
    Babaali, Bagher
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (04) : 3053 - 3062
  • [4] Transformer-based end-to-end scene text recognition
    Zhu, Xinghao
    Zhang, Zhi
    PROCEEDINGS OF THE 2021 IEEE 16TH CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS (ICIEA 2021), 2021, : 1691 - 1695
  • [5] A transformer-based approach for Arabic offline handwritten text recognition
    Saleh Momeni
    Bagher BabaAli
    Signal, Image and Video Processing, 2024, 18 : 3053 - 3062
  • [6] Transformer-based Text Detection in the Wild
    Raisi, Zobeir
    Naiel, Mohamed A.
    Younes, Georges
    Wardell, Steven
    Zelek, John S.
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 3156 - 3165
  • [7] PETR: Rethinking the Capability of Transformer-Based Language Model in Scene Text Recognition
    Wang, Yuxin
    Xie, Hongtao
    Fang, Shancheng
    Xing, Mengting
    Wang, Jing
    Zhu, Shenggao
    Zhang, Yongdong
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 5585 - 5598
  • [8] A transformer-based network for speech recognition
    Tang L.
    International Journal of Speech Technology, 2023, 26 (02) : 531 - 539
  • [9] Practical Transformer-based Multilingual Text Classification
    Wang, Cindy
    Banko, Michele
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, NAACL-HLT 2021, 2021, : 121 - 129
  • [10] Assessing the Effectiveness of Multilingual Transformer-based Text Embeddings for Named Entity Recognition in Portuguese
    de Lima Santos, Diego Bernardes
    de Carvalho Dutra, Frederico Giffoni
    Parreiras, Fernando Silva
    Brandao, Wladmir Cardoso
    PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS (ICEIS 2021), VOL 1, 2021, : 473 - 483