Text Gestalt: Stroke-Aware Scene Text Image Super-resolution

Cited by: 0

Authors:

Chen, Jingye ^{[1]}

Yu, Haiyang ^{[1]}

Ma, Jianqi ^{[2]}

Li, Bin ^{[1]}

Xue, Xiangyang ^{[1]}

Affiliations:

[1] Fudan Univ, Sch Comp Sci, Shanghai Key Lab Intelligent Informat Proc, Shanghai, Peoples R China

[2] Hong Kong Polytech Univ, Hong Kong, Peoples R China

Source:

THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2022

Funding:

National Natural Science Foundation of China;

Keywords:

RECOGNITION; NETWORK;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In the last decade, the rise of deep learning has driven rapid progress in scene text recognition. However, the recognition of low-resolution scene text images remains a challenge. Although some super-resolution methods have been proposed to tackle this problem, they usually treat text images as general images and ignore the fact that the visual quality of strokes (the atomic unit of text) plays an essential role in text recognition. According to Gestalt Psychology, humans are capable of composing parts of details into the most similar objects, guided by prior knowledge. Likewise, when humans observe a low-resolution text image, they inherently use partial stroke-level details to recover the appearance of holistic characters. Inspired by Gestalt Psychology, we put forward a Stroke-Aware Scene Text Image Super-Resolution method containing a Stroke-Focused Module (SFM) that concentrates on the stroke-level internal structures of characters in text images. Specifically, we design rules for decomposing English characters and digits at the stroke level, then pre-train a text recognizer to provide stroke-level attention maps as positional clues, with the purpose of controlling the consistency between the generated super-resolution image and the high-resolution ground truth. Extensive experimental results validate that the proposed method generates more distinguishable images on TextZoom and on Degraded-IC13, a manually constructed Chinese character dataset. Furthermore, since the proposed SFM is only used to provide stroke-level guidance during training, it does not add any time overhead in the test phase.
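
The stroke-level guidance described in the abstract can be pictured as a consistency penalty between attention maps computed on the super-resolved image and on the high-resolution ground truth. The sketch below is only an illustration under stated assumptions: the function name `attention_l1`, the nested-list map format, the choice of a mean absolute difference, and the toy numbers are all hypothetical and are not taken from the paper.

```python
from typing import List

# Assumption: one H x W attention map per decoded stroke step.
AttentionMap = List[List[float]]


def attention_l1(sr_maps: List[AttentionMap], hr_maps: List[AttentionMap]) -> float:
    """Mean absolute difference between two stacks of attention maps.

    Both stacks are assumed to have the same shape; only the number of
    maps is checked here.
    """
    if len(sr_maps) != len(hr_maps):
        raise ValueError("both stacks must contain the same number of maps")
    total, count = 0.0, 0
    for sr_map, hr_map in zip(sr_maps, hr_maps):
        for sr_row, hr_row in zip(sr_map, hr_map):
            for sr_val, hr_val in zip(sr_row, hr_row):
                total += abs(sr_val - hr_val)
                count += 1
    return total / count if count else 0.0


# Toy 2 x 2 maps for two stroke steps (made-up values, not from the paper).
hr = [[[0.9, 0.1], [0.0, 0.0]], [[0.0, 0.0], [0.2, 0.8]]]
sr = [[[0.5, 0.5], [0.0, 0.0]], [[0.0, 0.0], [0.2, 0.8]]]
loss = attention_l1(sr, hr)
```

A term of this kind would be evaluated only while training the super-resolution network, which is consistent with the abstract's statement that the SFM adds no overhead at test time.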


Pages: 285-293

Page count: 9

