TIRec: Transformer-based Invoice Text Recognition

Cited by: 0

Authors:

Chen, Yanlan [1]

Affiliations:

[1] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing, Peoples R China

Source:

2023 2ND ASIA CONFERENCE ON ALGORITHMS, COMPUTING AND MACHINE LEARNING, CACML 2023 | 2023

Keywords:

Text recognition; Invoice; Convolutional Vision Transformer

DOI:

10.1145/3590003.3590034

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

A novel invoice text recognition model is proposed. In the past few years, researchers have explored text recognition methods that use RNN-like structures to model semantic information. However, RNN-based approaches have obvious drawbacks, such as step-by-step decoding and one-way serial transmission of semantic information, which greatly limit both the effectiveness of the semantic information and computational efficiency. Invoice text, in contrast, has obvious contextual relationships due to its fixed text pattern; its fonts are more uniform and its backgrounds are much less complex than those of natural scenes. To further exploit these contextual relationships and adapt to the characteristics of invoice text, we propose a new text recognition framework inspired by the Transformer [1]. Self-attention-based architectures, in particular the Transformer, have been successful in natural language processing (NLP), where they have demonstrated powerful semantic modeling capabilities. Inspired by this success, we apply the Transformer to invoice text recognition. Unlike RNN-based approaches, we reduce the parameters of the vision network used to extract image features, use the Convolutional Vision Transformer attention module to capture semantic information, and use a Transformer decoding module to decode all characters in parallel. We hope that this Transformer-based architecture can better model the semantic information in invoices while remaining lightweight. We also collected text images of more than 40,000 train invoices, VAT invoices, rolled invoices, and cab invoices. Experiments on the collected invoice text recognition dataset show that our approach outperforms previous methods in terms of accuracy and speed.

Pages: 175-180

Page count: 6
