Object detection enhanced context model

Cited by: 1
Authors
Zheng C.-B. [1]
Zhang Y. [1]
Hu H. [2]
Wu Y.-R. [1]
Huang G.-J. [3]
Affiliations
[1] School of Instrumentation and Optoelectronic Engineering, Beihang University, Beijing
[2] Unit 66133 of PLA, Beijing
[3] School of Aeronautic Science and Engineering, Beihang University, Beijing
Keywords
Context information; Effective receptive field; Enhanced context module (ECM); Object detection; One-stage object detector
DOI
10.3785/j.issn.1008-973X.2020.03.013
Abstract
The enhanced context module (ECM) of the enhanced context model uses a double-atrous convolution structure to reduce parameters while expanding the effective receptive field, thereby enhancing the context information of shallow layers. ECM acts flexibly on the middle and shallow prediction layers with little disruption to the original SSD, forming the enhanced context model net (ECMNet). With 300×300 input images, ECMNet obtained a mean average precision of 80.52% on the PASCAL VOC2007 test set and ran at 73.5 frames per second on a 1080Ti. The experimental results show that ECMNet effectively enhances context information and achieves a better trade-off among parameter count, speed, and accuracy, and that it is superior to many state-of-the-art object detectors. © 2020, Zhejiang University Press. All rights reserved.
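The abstract's claim that stacked atrous convolutions enlarge the receptive field while saving parameters can be illustrated with simple arithmetic. The sketch below is not the authors' ECM: the dilation rates (2 and 4), the 3×3 kernels, and the 256-channel width are hypothetical values chosen only for illustration. It computes the theoretical receptive field, which is an upper bound on the effective receptive field that the paper targets.

```python
def effective_kernel(kernel: int, dilation: int) -> int:
    """Spatial extent covered by one atrous (dilated) kernel."""
    return dilation * (kernel - 1) + 1


def stacked_receptive_field(layers) -> int:
    """Theoretical receptive field of stride-1 convolutions applied in sequence.

    `layers` is an iterable of (kernel, dilation) pairs.
    """
    rf = 1
    for kernel, dilation in layers:
        rf += effective_kernel(kernel, dilation) - 1
    return rf


def conv_weights(kernel: int, channels_in: int, channels_out: int) -> int:
    """Weight count of a square 2-D convolution, ignoring biases."""
    return kernel * kernel * channels_in * channels_out


# Hypothetical configuration (not taken from the paper):
# two 3x3 atrous convolutions with dilations 2 and 4, 256 channels throughout.
double_atrous = [(3, 2), (3, 4)]
channels = 256

rf = stacked_receptive_field(double_atrous)
atrous_weights = sum(conv_weights(k, channels, channels) for k, _ in double_atrous)
# A single dense kernel covering the same theoretical receptive field.
dense_weights = conv_weights(rf, channels, channels)

print(f"theoretical receptive field: {rf}x{rf}")
print(f"double-atrous weights: {atrous_weights}")
print(f"dense {rf}x{rf} weights: {dense_weights}")
```

Under these assumed settings, the two atrous layers cover the same theoretical extent as a single much larger dense kernel while using far fewer weights, which is the kind of trade-off the abstract describes.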
Pages: 529-539
Page count: 10