Multi-orientation scene text detection with scale-guided regression

Cited by: 6
Authors
Liang, Min [1]
Hou, Jie-Bo [1]
Zhu, Xiaobin [1]
Yang, Chun [1]
Qin, Jingyan [2]
Yin, Xu-Cheng [1]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Comp & Commun Engn, Beijing 100083, Peoples R China
[2] Univ Sci & Technol Beijing, Sch Mech Engn, Beijing 100083, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Scene text detection; Classification; Regression
DOI
10.1016/j.neucom.2021.07.026
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Existing multi-orientation scene text detection methods generally contain two crucial components: regression prediction for text bounding boxes and classification prediction for text/non-text. However, these methods typically treat classification and regression as two independent procedures and do not fully exploit the relations between them. Based on this observation, we propose a Scale-Guided Regression Module (SRM) designed specifically for multi-orientation scene text detection. Equipped with width-guided and height-guided kernels of different sizes, the SRM generates a series of scale feature maps of candidate texts by capturing their shape information during classification prediction. The scale feature maps are used to predict the width and height of candidate texts, which then serve as guides for regressing bounding boxes. In this way, classification and regression are coherently integrated. In addition, we train the network with IoU loss and then combine IoU loss with smooth L1 loss for fine-tuning. Extensive experiments on publicly available datasets demonstrate the state-of-the-art performance of our method. Notably, our method achieves a significant improvement on long texts; e.g., on MSRA-TD500, it outperforms the base model by a large margin (4.86% in terms of Recall). (C) 2021 Elsevier B.V. All rights reserved.
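The abstract mentions training with IoU loss and then fine-tuning with IoU loss combined with smooth L1 loss, but it does not give the formulas. The following is a minimal Python sketch of one common way such losses are defined, under several assumptions that are not taken from the paper: boxes are axis-aligned `(x_min, y_min, x_max, y_max)` tuples (the paper handles multi-oriented boxes), the IoU loss is `-ln(IoU)`, and the function names, `eps`, `beta`, and `weight` are all hypothetical.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred, target, eps=1e-6):
    """Assumed form: negative log of IoU; eps avoids log(0) for disjoint boxes."""
    return -math.log(iou(pred, target) + eps)


def smooth_l1(pred, target, beta=1.0):
    """Standard smooth L1: quadratic below beta, linear above, averaged over coordinates."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        total += 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta
    return total / len(pred)


def fine_tune_loss(pred, target, weight=1.0):
    """Hypothetical combination for fine-tuning: IoU loss plus weighted smooth L1."""
    return iou_loss(pred, target) + weight * smooth_l1(pred, target)
```

In this sketch the first training stage would minimize `iou_loss` alone and the fine-tuning stage would minimize `fine_tune_loss`; how the two terms are actually balanced in the paper is not stated in the abstract.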
Pages: 310 - 318
Number of pages: 9
Related papers
50 in total
  • [21] Multi-Scale Scene Text Detection Based on Convolutional Neural Network
    Lu, Yan-Feng
    Zhang, Ai-Xuan
    Li, Yi
    Yu, Qian-Hui
    Qiao, Hong
    2019 CHINESE AUTOMATION CONGRESS (CAC2019), 2019, : 583 - 587
  • [22] SCENE TEXT DETECTION BASED ON MULTI-SCALE SWT AND EDGE FILTERING
    Feng, Yuanyuan
    Song, Yonghong
    Zhang, Yualin
    2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016, : 645 - 650
  • [23] Total-Text: toward orientation robustness in scene text detection
    Ch'ng, Chee-Kheng
    Chan, Chee Seng
    Liu, Cheng-Lin
    INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 2020, 23 (01) : 31 - 52
  • [24] Total-Text: toward orientation robustness in scene text detection
    Chee-Kheng Ch’ng
    Chee Seng Chan
    Cheng-Lin Liu
    International Journal on Document Analysis and Recognition (IJDAR), 2020, 23 : 31 - 52
  • [25] Direct regression scene text detection with accuracy scoring
    Cheng, Peirui
    Zhao, Yuzhong
    Cai, Yuanqiang
    Wang, Weiqiang
    NEUROCOMPUTING, 2022, 501 : 705 - 714
  • [26] Location Sensitive Regression Algorithm for Multi-Oriented Scene Text Detection with Focal Loss
    Kuang, Hailan
    Li, Zheng
    Ma, Xiaolin
    Liu, Xinhua
    2019 11TH INTERNATIONAL CONFERENCE ON MEASURING TECHNOLOGY AND MECHATRONICS AUTOMATION (ICMTMA 2019), 2019, : 462 - 466
  • [27] SCALE-INVARIANT MULTI-ORIENTED TEXT DETECTION IN WILD SCENE IMAGE
    Dasgupta, Kinjal
    Das, Sudip
    Bhattacharya, Ujjwal
    2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 2041 - 2045
  • [28] Scene Text Detection Based on Multi-Scale Pooling and Bidirectional Feature Fusion
    Wei, Zheliang
    Li, Yueyang
    Luo, Haichi
    Computer Engineering and Applications, 2024, 60 (02) : 154 - 161
  • [29] Realtime multi-scale scene text detection with scale-based region proposal network
    He, Wenhao
    Zhang, Xu-Yao
    Yin, Fei
    Luo, Zhenbo
    Ogier, Jean-Marc
    Liu, Cheng-Lin
    PATTERN RECOGNITION, 2020, 98
  • [30] Optic Nerve Head Detection via Group Correlations in Multi-orientation Transforms
    Bekkers, Erik
    Duits, Remco
    Romeny, Bart ter Haar
    IMAGE ANALYSIS AND RECOGNITION, ICIAR 2014, PT II, 2014, 8815 : 293 - 302