A scene text detection method based on dual-path feature fusion

Authors
Zhao P. [1 ,2 ]
Xu B.-P. [1 ,2 ]
Yan S. [1 ,2 ]
Liu Z.-Y. [1 ,2 ]
Affiliations
[1] Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei
[2] School of Computer Science and Technology, Anhui University, Hefei
Source
Kongzhi yu Juece/Control and Decision | 2021 / Vol. 36 / No. 09
Keywords
Attention mechanism; Deep learning; Feature fusion; Feature pyramid network; Lightweight neural network; Scene text detection
DOI
10.13195/j.kzyjc.2020.0002
Abstract
Existing deep-learning-based scene text detection methods generally use a deep neural network as the backbone for feature extraction. Although this achieves striking detection accuracy, the resulting model is very large and detection efficiency is poor. Directly replacing the large backbone with a small one, however, often fails to extract sufficient semantic features and cannot achieve ideal detection results. To reduce the size of the scene text detection model and improve detection efficiency, a dual-path feature fusion based scene text detection method (DPFFSTD) is proposed. Built on the relatively lightweight EfficientNet-b3 backbone, DPFFSTD fuses features from two branches: one uses a feature pyramid network to fuse features at different levels, while the other uses atrous spatial pyramid pooling to enlarge the receptive field and obtain features at different scales. The features from the two branches are then fused to produce richer features at only a small additional computational cost, which compensates for the shortage of features caused by the small backbone. Experimental results on three benchmark datasets show that the proposed method significantly reduces the number of model parameters and greatly improves detection efficiency while maintaining high detection accuracy. © 2021, Editorial Office of Control and Decision. All rights reserved.
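The dual-path idea in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy channel sizes, module names, and the random feature maps standing in for EfficientNet-b3 outputs are all assumptions. One path fuses multi-level features top-down (FPN style); the other applies parallel dilated convolutions (ASPP style) to enlarge the receptive field; the two outputs are concatenated and projected to a text score map.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPNBranch(nn.Module):
    """Top-down pyramid: fuse backbone features from different levels."""
    def __init__(self, in_chs, out_ch=64):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_ch, 1) for c in in_chs)

    def forward(self, feats):  # feats ordered high-resolution -> low-resolution
        laterals = [conv(f) for conv, f in zip(self.lateral, feats)]
        x = laterals[-1]
        for lat in reversed(laterals[:-1]):  # upsample coarser map, add lateral
            x = lat + F.interpolate(x, size=lat.shape[-2:], mode="nearest")
        return x

class ASPPBranch(nn.Module):
    """Parallel dilated convs enlarge the receptive field at one scale."""
    def __init__(self, in_ch, out_ch=64, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates)
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

class DualPathFusion(nn.Module):
    """Fuse the two branches into a single text/non-text score map."""
    def __init__(self, in_chs=(16, 32, 64), out_ch=64):
        super().__init__()
        self.fpn = FPNBranch(in_chs, out_ch)
        self.aspp = ASPPBranch(in_chs[0], out_ch)
        self.head = nn.Conv2d(out_ch * 2, 1, 1)

    def forward(self, feats):
        f = self.fpn(feats)      # multi-level fusion path
        a = self.aspp(feats[0])  # receptive-field enlargement path
        a = F.interpolate(a, size=f.shape[-2:], mode="nearest")
        return self.head(torch.cat([f, a], dim=1))

# Toy multi-scale features standing in for EfficientNet-b3 stage outputs
feats = [torch.randn(1, 16, 64, 64),
         torch.randn(1, 32, 32, 32),
         torch.randn(1, 64, 16, 16)]
score = DualPathFusion()(feats)
print(score.shape)  # torch.Size([1, 1, 64, 64])
```

Because both branches are built from 1x1 and 3x3 convolutions on already-extracted features, the extra computation over the backbone is small, which matches the efficiency claim in the abstract.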
Pages: 2179-2186 (7 pages)