TIRec: Transformer-based Invoice Text Recognition

Times cited: 0

Authors:

Chen, Yanlan [1]

Affiliation:

[1] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing, Peoples R China

Source:

2023 2ND ASIA CONFERENCE ON ALGORITHMS, COMPUTING AND MACHINE LEARNING, CACML 2023 | 2023

Keywords:

Text recognition; Invoice; Convolutional Vision Transformer

DOI:

10.1145/3590003.3590034

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

A novel invoice text recognition model is proposed. In the past few years, researchers have explored text recognition methods that use RNN-like structures to model semantic information. However, RNN-based approaches have obvious drawbacks, such as sequential, step-by-step decoding and the one-way serial transmission of semantic information, which greatly limit both the effectiveness of the semantic information and computational efficiency. Invoice text, by contrast, has strong contextual relationships because it follows fixed text patterns; in addition, invoice fonts are more uniform and invoice backgrounds are far less complex than natural scenes. To exploit these contextual relationships and adapt to the characteristics of invoice text, we propose a new text recognition framework inspired by Transformer [1]. Self-attention-based architectures, in particular Transformer, have been successful in natural language processing (NLP), where they have demonstrated powerful semantic modeling capabilities. Inspired by this success, we apply Transformer to invoice text recognition. Unlike RNN-based approaches, our model reduces the parameters of the vision network used to extract image features, uses a Convolutional Vision Transformer Attention module to capture semantic information, and uses a Transformer decoding module to decode all characters in parallel. This Transformer-based architecture is intended to model the semantic information in invoices better while remaining lightweight. We also collected text images from more than 40,000 invoices, including train invoices, VAT invoices, rolled invoices, and cab invoices. Experiments on the collected invoice text recognition dataset show that our approach outperforms previous methods in both accuracy and speed.
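The abstract contrasts RNN-style serial decoding with a Transformer decoder that predicts all characters in parallel. The sketch below illustrates only that general idea, using plain scaled dot-product attention over a sequence of visual features. It is not the authors' TIRec implementation: the per-position queries, the reuse of the features as both keys and values, the linear classifier, and the helper names (`parallel_decode`, `softmax_rows`, etc.) are hypothetical simplifications. It also omits multi-head attention, the Convolutional Vision Transformer Attention module, and any training.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # (n x k) @ (k x m) -> (n x m), plain nested loops, no external dependencies
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    # Numerically stable softmax applied independently to each row
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def parallel_decode(queries: Matrix, features: Matrix, classifier: Matrix) -> List[int]:
    """Predict every character slot in one pass with scaled dot-product attention.

    Hypothetical simplification, not TIRec itself:
      queries    (max_len x d): one query per output character position
      features   (seq_len x d): visual features, used as both keys and values
      classifier (d x num_classes): linear map from context to character logits
    """
    d = len(queries[0])
    scores = matmul(queries, transpose(features))               # (max_len x seq_len)
    scores = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = softmax_rows(scores)                               # attention over feature positions
    context = matmul(weights, features)                          # (max_len x d), all slots at once
    logits = matmul(context, classifier)                         # (max_len x num_classes)
    # Greedy argmax per slot: no slot waits for another slot's prediction,
    # unlike an RNN decoder that must emit characters one step at a time.
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


if __name__ == "__main__":
    eye = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    feats = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
    print(parallel_decode(eye, feats, eye))        # prints [0, 1, 2]
    print(parallel_decode(eye[::-1], feats, eye))  # prints [2, 1, 0]
```

Because each row of `logits` depends only on its own query and the shared features, all output positions can be computed at the same time; an RNN decoder, by contrast, needs the hidden state from step t-1 before it can produce step t.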


Pages: 175-180

Page count: 6
