TIRec: Transformer-based Invoice Text Recognition

Cited by: 0
Author
Chen, Yanlan [1]
Affiliation
[1] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing, Peoples R China
Keywords
Text recognition; Invoice; Convolutional Vision Transformer
DOI
10.1145/3590003.3590034
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
A novel invoice text recognition model is proposed. In the past few years, researchers have explored text recognition methods that use RNN-like structures to model semantic information. However, RNN-based approaches have obvious drawbacks, such as step-by-step decoding and one-way serial transmission of semantic information, which greatly limit both the usefulness of the semantic information and computational efficiency. Invoice text, by contrast, has strong contextual relationships because of its fixed text patterns; its fonts are more uniform and its backgrounds far less complex than those of natural scenes. To further exploit these contextual relationships and adapt to the characteristics of invoice text, we propose a new text recognition framework inspired by the Transformer [1]. Self-attention-based architectures, in particular the Transformer, have been successful in natural language processing (NLP), where they have demonstrated powerful semantic modeling capabilities. Inspired by this success, we apply the Transformer to invoice text recognition. Unlike RNN-based approaches, we reduce the parameters of the vision network used to extract image features, use the Convolutional Vision Transformer attention module to capture semantic information, and use the Transformer decoding module to decode all characters in parallel. We hope that this Transformer-based architecture can better model the semantic information in invoices while remaining lightweight. In addition, we collected text images of more than 40,000 train invoices, VAT invoices, rolled invoices, and cab invoices. Experiments on the collected invoice text recognition dataset show that our approach outperforms previous methods in both accuracy and speed.
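The parallel decoding idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the TIRec architecture, whose details the abstract does not give; it only shows the general technique of letting one query per character position attend over all image features independently, so no position waits on the previous one as it would in an RNN decoder. The function name `parallel_decode`, the toy feature vectors, the identity classifier, and the three-symbol charset are all hypothetical and chosen for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def parallel_decode(features, queries, classifier, charset):
    """Decode one character per position query (hypothetical helper).

    Each query attends over every feature vector on its own, so the loop
    iterations share no state and could run simultaneously.
    """
    chars = []
    for q in queries:
        scale = math.sqrt(len(q))
        # Scaled dot-product attention weights over the image features.
        weights = softmax([dot(q, f) / scale for f in features])
        # Attention output: weighted sum of the feature vectors.
        context = [
            sum(w * f[d] for w, f in zip(weights, features))
            for d in range(len(q))
        ]
        # Linear classifier over the charset, then arg-max.
        logits = [dot(row, context) for row in classifier]
        best = max(range(len(logits)), key=logits.__getitem__)
        chars.append(charset[best])
    return "".join(chars)


# Toy data (assumed): three feature vectors, one per visible glyph.
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# One query per output position, each pointing strongly at one feature.
queries = [[0.0, 0.0, 20.0], [20.0, 0.0, 0.0], [0.0, 20.0, 0.0]]
classifier = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
charset = "AB7"

print(parallel_decode(features, queries, classifier, charset))  # prints 7AB
```

In a trained model the queries, features, and classifier weights would be learned, and the per-position loop would be a single batched matrix operation.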
Pages: 175-180
Number of pages: 6
Related papers
50 in total
  • [21] Transformer-based Models for Arabic Online Handwriting Recognition
    Alwajih, Fakhraddin
    Badr, Eman
    Abdou, Sherif
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (05) : 898 - 905
  • [22] A Transformer-Based Network for Dynamic Hand Gesture Recognition
    D'Eusanio, Andrea
    Simoni, Alessandro
    Pini, Stefano
    Borghi, Guido
    Vezzani, Roberto
    Cucchiara, Rita
    2020 INTERNATIONAL CONFERENCE ON 3D VISION (3DV 2020), 2020, : 623 - 632
  • [23] TRANSFORMER-BASED ACOUSTIC MODELING FOR HYBRID SPEECH RECOGNITION
    Wang, Yongqiang
    Mohamed, Abdelrahman
    Le, Duc
    Liu, Chunxi
    Xiao, Alex
    Mahadeokar, Jay
    Huang, Hongzhao
    Tjandra, Andros
    Zhang, Xiaohui
    Zhang, Frank
    Fuegen, Christian
    Zweig, Geoffrey
    Seltzer, Michael L.
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6874 - 6878
  • [24] Vision Transformer-based recognition of diabetic retinopathy grade
    Wu, Jianfang
    Hu, Ruo
    Xiao, Zhenghong
    Chen, Jiaxu
    Liu, Jingwei
    MEDICAL PHYSICS, 2021, 48 (12) : 7850 - 7863
  • [25] Transformer-based approach for symptom recognition and multilingual linking
    Vassileva, Sylvia
    Grazhdanski, Georgi
    Koychev, Ivan
    Boytcheva, Svetla
    DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION, 2024, 2024
  • [26] TRIG: Transformer-Based Text Recognizer with Initial Embedding Guidance
    Tao, Yue
    Jia, Zhiwei
    Ma, Runze
    Xu, Shugong
    ELECTRONICS, 2021, 10 (22)
  • [27] TRANSFORMER-BASED TEXT-TO-SPEECH WITH WEIGHTED FORCED ATTENTION
    Okamoto, Takuma
    Toda, Tomoki
    Shiga, Yoshinori
    Kawai, Hisashi
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6729 - 6733
  • [28] Transformer-Based Composite Language Models for Text Evaluation and Classification
    Skoric, Mihailo
    Utvic, Milos
    Stankovic, Ranka
    MATHEMATICS, 2023, 11 (22)
  • [29] Automatic text summarization using transformer-based language models
    Rao, Ritika
    Sharma, Sourabh
    Malik, Nitin
    INTERNATIONAL JOURNAL OF SYSTEM ASSURANCE ENGINEERING AND MANAGEMENT, 2024, 15 (06) : 2599 - 2605
  • [30] RobuTrans: A Robust Transformer-Based Text-to-Speech Model
    Li, Naihan
    Liu, Yanqing
    Wu, Yu
    Liu, Shujie
    Zhao, Sheng
    Liu, Ming
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 8228 - 8235