Depth-Guided Progressive Network for Object Detection

Cited: 2
Authors
Ma, Jia-Wei [1 ,2 ]
Liang, Min [2 ]
Chen, Song-Lu [1 ,2 ]
Chen, Feng [3 ]
Tian, Shu [2 ]
Qin, Jingyan [4 ]
Yin, Xu-Cheng [1 ,2 ]
Affiliations
[1] Univ Sci & Technol Beijing, USTB EEasyTech Joint Lab Artificial Intelligence, Beijing 100083, Peoples R China
[2] Univ Sci & Technol Beijing, Dept Comp Sci & Technol, Beijing 100083, Peoples R China
[3] EEasy Technol Co Ltd, Zhuhai 519000, Peoples R China
[4] Univ Sci & Technol Beijing, Dept Ind Design, Beijing 100083, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Object detection; Detectors; Interference; Signal to noise ratio; Semantics; Location awareness; multi-scale object; depth-guided; progressive sampling;
DOI
10.1109/TITS.2022.3156365
CLC Number
TU [Building Science];
Discipline Code
0813;
Abstract
Multi-scale object detection in natural scenes remains challenging. To enhance multi-scale perception, some algorithms combine lower-level and higher-level information via multi-scale feature fusion strategies. However, the inherent spatial properties among instances and the relations between foreground and background are ignored. In addition, the human-defined "center-based" regression quality evaluation strategy, which predicts a score that decreases linearly with the distance to the center of the ground-truth box, is not robust to scale-variant objects. In this work, we propose a Depth-Guided Progressive Network (DGPNet) for multi-scale object detection. Specifically, besides predicting classification and localization, the network estimates depth and uses it to reweight the image features, yielding a better spatial representation. Depth estimation and 2D object detection are thus learned simultaneously in a unified network, where the depth features are merged as auxiliary information into the detection branch to enhance the discrimination among multi-scale objects. Moreover, to overcome the difficulty of empirically fitting a localization quality function, high-quality predicted boxes for scale-variant objects are obtained more adaptively by an IoU-aware progressive sampling strategy. We divide the sampling process into two stages, "statistical-aware" and "IoU-aware": the former selects thresholds for positive samples based on statistical characteristics of multi-scale instances, and the latter further selects high-quality samples by IoU from the output of the former. The final ranking scores therefore better reflect localization quality. Experiments verify that our method outperforms state-of-the-art methods on the KINS and Cityscapes datasets.
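
The two mechanisms the abstract describes, depth-guided feature reweighting and two-stage progressive sampling, can be made concrete with a short sketch. The Python/PyTorch code below is illustrative only and is not the authors' implementation: the layer sizes, the sigmoid gating, the mean-plus-std threshold, and the 0.6 IoU cutoff are all assumptions chosen for clarity.

    import torch
    import torch.nn as nn

    class DepthGuidedHead(nn.Module):
        """Sketch of depth-guided feature weighting: an auxiliary branch
        estimates a per-pixel depth map, which then gates the detection
        features (hypothetical layer sizes, not the paper's exact design)."""

        def __init__(self, channels: int = 256):
            super().__init__()
            self.depth_branch = nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, 1, 1),  # 1-channel depth estimate
            )
            self.gate = nn.Sequential(
                nn.Conv2d(1, channels, 1),  # lift depth back to feature channels
                nn.Sigmoid(),               # per-pixel weights in (0, 1)
            )

        def forward(self, feat: torch.Tensor):
            depth = self.depth_branch(feat)     # auxiliary depth prediction
            weighted = feat * self.gate(depth)  # depth-guided reweighting
            return weighted, depth              # depth is supervised as an auxiliary task

    def progressive_sampling(ious: torch.Tensor, iou_thresh: float = 0.6) -> torch.Tensor:
        """Two-stage positive-sample selection for one ground-truth box.

        Stage 1, "statistical-aware": derive a threshold from the IoU
        statistics of the candidates (mean + std here, an assumption in
        the spirit of ATSS-style assignment). Stage 2, "IoU-aware": keep
        only candidates whose IoU also clears a quality threshold.
        """
        stat_thresh = ious.mean() + ious.std()
        stage1 = ious >= stat_thresh             # statistics-based positives
        return stage1 & (ious >= iou_thresh)     # high-quality subset

In a detector built this way, the depth output would be supervised as an auxiliary task while the reweighted features feed the classification and regression heads, and progressive_sampling would be applied per ground-truth box to pick the positives whose ranking scores are trained to reflect localization quality.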
Pages: 19523 - 19533
Page count: 11
Related Papers
50 records in total
  • [1] MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection
    Zhang, Renrui
    Qiu, Han
    Wang, Tai
    Guo, Ziyu
    Cui, Ziteng
    Qiao, Yu
    Li, Hongsheng
    Gao, Peng
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 9121 - 9132
  • [2] DGFNet: Depth-Guided Cross-Modality Fusion Network for RGB-D Salient Object Detection
    Xiao, Fen
    Pu, Zhengdong
    Chen, Jiaqi
    Gao, Xieping
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 2648 - 2658
  • [3] Learning Depth-Guided Convolutions for Monocular 3D Object Detection
Ding, Mingyu
    Huo, Yuqi
    Yi, Hongwei
    Wang, Zhe
    Shi, Jianping
    Lu, Zhiwu
    Luo, Ping
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2020), 2020, : 4306 - 4315
  • [4] Revisiting Depth-guided Methods for Monocular 3D Object Detection by Hierarchical Balanced Depth
    Chen, Yi-Rong
    Tseng, Ching-Yu
    Liou, Yi-Syuan
    Wu, Tsung-Han
    Hsu, Winston H.
CONFERENCE ON ROBOT LEARNING, VOL 229, 2023
  • [5] CrossDTR: Cross-view and Depth-guided Transformers for 3D Object Detection
    Tseng, Ching-Yu
    Chen, Yi-Rong
    Lee, Hsin-Ying
    Wu, Tsung-Han
    Chen, Wen-Chin
    Hsu, Winston H.
    2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA, 2023, : 4850 - 4857
  • [6] Depth-Guided Vision Transformer With Normalizing Flows for Monocular 3D Object Detection
    Pan, Cong
    Peng, Junran
    Zhang, Zhaoxiang
    IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2024, 11 (03) : 673 - 689
  • [7] Depth-guided Robust Face Morphing Attack Detection
    Rachalwar, Harsh
    Fang, Meiling
    Damer, Naser
    Das, Abhijit
2023 IEEE INTERNATIONAL JOINT CONFERENCE ON BIOMETRICS, IJCB, 2023
  • [8] OBJECT-AWARE CALIBRATED DEPTH-GUIDED TRANSFORMER FOR RGB-D CO-SALIENT OBJECT DETECTION
    Wu, Yang
    Liang, Lingyan
    Zhao, Yaqian
    Zhang, Kaihua
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 1121 - 1126
  • [9] Depth-guided saliency detection via boundary information
    Zhou, Xiaofei
    Wen, Hongfa
    Shi, Ran
    Yin, Haibing
    Yan, Chenggang
IMAGE AND VISION COMPUTING, 2020, 103