TIRec: Transformer-based Invoice Text Recognition

Times cited: 0

Author:

Chen, Yanlan [1]

Affiliation:

[1] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing, Peoples R China

Source:

2023 2ND ASIA CONFERENCE ON ALGORITHMS, COMPUTING AND MACHINE LEARNING, CACML 2023 | 2023

Keywords:

Text recognition; Invoice; Convolutional Vision Transformer;

DOI:

10.1145/3590003.3590034

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

A novel invoice text recognition model is proposed. In the past few years, researchers have explored text recognition methods that use RNN-like structures to model semantic information. However, RNN-based approaches have obvious drawbacks, such as sequential, step-by-step decoding and the one-way serial transmission of semantic information, which greatly limit both the effectiveness of the semantic information and computational efficiency. Invoice text, by contrast, has strong contextual relationships because it follows fixed text patterns; its fonts are also more uniform, and its backgrounds are much less complex than those of natural scenes. To further exploit these contextual relationships and adapt to the characteristics of invoice text, we propose a new text recognition framework inspired by the Transformer [1]. Self-attention-based architectures, in particular the Transformer, have been successful in natural language processing (NLP), where they have demonstrated powerful semantic modeling capabilities. Inspired by this success, we apply the Transformer to invoice text recognition. Unlike RNN-based approaches, our model reduces the parameters of the vision network used to extract image features, uses a Convolutional Vision Transformer Attention module to capture semantic information, and uses a Transformer decoding module to decode all characters in parallel. We hope that this Transformer-based architecture can model the semantic information in invoices better while remaining lightweight. In addition, we collected text images of more than 40,000 train invoices, VAT invoices, roll invoices, and cab invoices. Experiments on the collected invoice text recognition dataset show that our approach outperforms previous methods in both accuracy and speed.
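The abstract says the Transformer decoding module decodes all characters in parallel but does not say how. The sketch below is a minimal, hypothetical illustration of one common way to do this, not the TIRec implementation: it assumes a fixed number of reading-order slots, each with its own query vector (learned in a real model, hand-set here), that attend over the visual feature sequence with scaled dot-product attention, after which a linear classifier scores every slot independently. Because no slot waits for another slot's prediction, all characters come out of one pass, whereas an RNN decoder emits one character per step. The names `decode_parallel`, `position_queries`, and `classifier` are illustrative assumptions, and the sketch omits learned projections, multi-head attention, and feed-forward layers.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: Vector) -> Vector:
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(queries[0])
    scores = matmul(queries, transpose(keys))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, values)


def decode_parallel(position_queries: Matrix, visual_features: Matrix,
                    classifier: Matrix, charset: str) -> str:
    """Hypothetical parallel decoder: every character slot is computed at once.

    Each row of position_queries is the query for one reading-order slot.
    No slot depends on another slot's output, so there is no step-by-step loop.
    """
    context = attention(position_queries, visual_features, visual_features)
    logits = matmul(context, classifier)  # one row of class scores per slot
    # Independent argmax per slot picks that slot's character.
    return "".join(charset[max(range(len(row)), key=row.__getitem__)] for row in logits)


if __name__ == "__main__":
    # Three toy visual feature columns (one-hot, d = 3).
    features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # Hand-set slot queries: slot 0 looks at column 2, slot 1 at column 0, slot 2 at column 1.
    queries = [[0.0, 0.0, 10.0], [10.0, 0.0, 0.0], [0.0, 10.0, 0.0]]
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(decode_parallel(queries, features, identity, "ABC"))  # → CAB
```

With toy one-hot features and queries aligned to columns 2, 0, and 1, all three slots are read in a single attention pass, giving "CAB"; an autoregressive decoder would instead need three dependent steps.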


Pages: 175-180

Number of pages: 6
