Cross-Modal Attentional Context Learning for RGB-D Object Detection

Cited by: 55
Authors
Li, Guanbin [1]
Gan, Yukang [1]
Wu, Hejun [1]
Xiao, Nong [1]
Lin, Liang [1]
Affiliations
[1] Sun Yat Sen Univ, Sch Data & Comp Sci, Guangzhou 510006, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
RGB-D object detection; attentional context modeling; cross modal feature; convolutional neural network; RECOGNITION;
DOI
10.1109/TIP.2018.2878956
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recognizing objects from simultaneously sensed photometric (RGB) and depth channels is a fundamental yet practical problem in many machine vision applications, such as robot grasping and autonomous driving. In this paper, we address this problem by developing a cross-modal attentional context (CMAC) learning framework, which enables the full exploitation of the context information from both RGB and depth data. Compared to existing RGB-D object detection frameworks, our approach has several appealing properties. First, it consists of an attention-based global context model for exploiting adaptive contextual information and incorporating this information into a region-based CNN (e.g., fast RCNN) framework to achieve improved object detection performance. Second, our CMAC framework further contains a fine-grained object part attention module to harness multiple discriminative object parts inside each possible object region for superior local feature representation. While greatly improving the accuracy of RGB-D object detection, the effective cross-modal information fusion as well as attentional context modeling in our proposed model provide an interpretable visualization scheme. Experimental results demonstrate that the proposed method significantly improves upon the state of the art on all public benchmarks.
Pages: 1591-1601
Number of pages: 11
Related papers
50 in total
  • [1] Lightweight cross-modal transformer for RGB-D salient object detection
    Huang, Nianchang
    Yang, Yang
    Zhang, Qiang
    Han, Jungong
    Huang, Jin
    [J]. Computer Vision and Image Understanding, 2024, 249
  • [2] RGB-D salient object detection with asymmetric cross-modal fusion
    Yu M.
    Xing Z.-H.
    Liu Y.
    [J]. Kongzhi yu Juece/Control and Decision, 2023, 38 (09) : 2487 - 2495
  • [3] Cross-Modal Adaptation for RGB-D Detection
    Hoffman, Judy
    Gupta, Saurabh
    Leong, Jian
    Guadarrama, Sergio
    Darrell, Trevor
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2016 : 5032 - 5039
  • [4] Disentangled Cross-Modal Transformer for RGB-D Salient Object Detection and Beyond
    Chen, Hao
    Shen, Feihong
    Ding, Ding
    Deng, Yongjian
    Li, Chao
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 1699 - 1709
  • [5] Cross-modal hierarchical interaction network for RGB-D salient object detection
    Bi, Hongbo
    Wu, Ranwan
    Liu, Ziqi
    Zhu, Huihui
    Zhang, Cong
    Xiang, Tian-Zhu
    [J]. PATTERN RECOGNITION, 2023, 136
  • [6] Joint Cross-Modal and Unimodal Features for RGB-D Salient Object Detection
    Huang, Nianchang
    Liu, Yi
    Zhang, Qiang
    Han, Jungong
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 : 2428 - 2441
  • [7] Depth Enhanced Cross-Modal Cascaded Network for RGB-D Salient Object Detection
    Zhao, Zhengyun
    Huang, Ziqing
    Chai, Xiuli
    Wang, Jun
    [J]. NEURAL PROCESSING LETTERS, 2023, 55 (01) : 361 - 384
  • [8] Cross-Modal Fusion and Progressive Decoding Network for RGB-D Salient Object Detection
    Hu, Xihang
    Sun, Fuming
    Sun, Jing
    Wang, Fasheng
    Li, Haojie
    [J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (08) : 3067 - 3085
  • [9] Depth Enhanced Cross-Modal Cascaded Network for RGB-D Salient Object Detection
    Zhengyun Zhao
    Ziqing Huang
    Xiuli Chai
    Jun Wang
    [J]. Neural Processing Letters, 2023, 55 : 361 - 384
  • [10] A cross-modal edge-guided salient object detection for RGB-D image
    Liu, Zhengyi
    Wang, Kaixun
    Dong, Hao
    Wang, Yuan
    [J]. NEUROCOMPUTING, 2021, 454 : 168 - 177