Learning Cross-modality Interaction for Robust Depth Perception of Autonomous Driving

Cited by: 1
Authors
Liang, Yunji [1 ]
Chen, Nengzhen [1 ]
Yu, Zhiwen [1 ]
Tang, Lei [2 ]
Yu, Hongkai [3 ]
Guo, Bin [1 ]
Zeng, Daniel Dajun [4 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, 1 Dongxiang Rd, Xian 710129, Shaanxi, Peoples R China
[2] Changan Univ, Sch Informat Engn, 126 Naner Huan Rd, Xian 710064, Shaanxi, Peoples R China
[3] Cleveland State Univ, 2121 Euclid Ave, Cleveland, OH 44115 USA
[4] Chinese Acad Sci, Inst Automat, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
Keywords
Cascading interaction; autonomous systems; auxiliary task; depth prediction; depth completion; network; image
DOI
10.1145/3650039
CLC Number (Chinese Library Classification)
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
As one of the fundamental tasks of autonomous driving, depth perception aims to perceive physical objects in three dimensions and to judge their distances from the ego vehicle. Despite great efforts on depth perception, LiDAR-based and camera-based solutions remain limited by low accuracy and poor robustness to noisy input. Given the integration of monocular cameras and LiDAR sensors in autonomous vehicles, in this article we introduce a two-stream architecture that learns a modality-interaction representation under the guidance of an image reconstruction task, compensating for the deficiencies of each modality in a parallel manner. Specifically, in the two-stream architecture, multi-scale cross-modality interactions are preserved via a cascading interaction network guided by the reconstruction task. The shared modality-interaction representation is then used to infer the dense depth map, exploiting the complementarity and heterogeneity of the two modalities. We evaluated the proposed solution on the KITTI dataset and a CARLA synthetic dataset. Our experimental results show that learning the coupled interaction of modalities under the guidance of an auxiliary task leads to significant performance improvements. Furthermore, our approach is competitive with state-of-the-art models and robust to noisy input. The source code is available at https://github.com/tonyFengye/Code/tree/master.
Pages: 26
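
To make the architecture described in the abstract concrete, below is a minimal PyTorch sketch of the two-stream idea: an RGB stream and a sparse-depth stream are encoded in parallel, a cross-modality interaction module fuses them at each scale, and the shared interaction feature drives both an auxiliary image-reconstruction head and the dense-depth head. All module names, channel sizes, the gating mechanism, and the loss weight here are illustrative assumptions, not the authors' released implementation; see the linked repository for the official code.

# Minimal sketch of a two-stream network with multi-scale cross-modality
# interaction and a reconstruction auxiliary task. Everything below is an
# assumption-driven illustration, not the paper's actual implementation.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, stride=1):
    """3x3 conv + BN + ReLU, shared building block for both streams."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class CrossModalityInteraction(nn.Module):
    """Exchanges information between the RGB and depth streams at one scale.
    Modelled here as channel gating computed from both modalities; the
    paper's cascading interaction network is more elaborate."""
    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * ch, 2 * ch, 1),
            nn.Sigmoid(),
        )
    def forward(self, f_rgb, f_dep):
        g = self.gate(torch.cat([f_rgb, f_dep], dim=1))
        g_rgb, g_dep = g.chunk(2, dim=1)
        # Each stream is reweighted by gates that saw both modalities.
        return f_rgb * g_rgb + f_dep * g_dep  # shared interaction feature

class TwoStreamDepthNet(nn.Module):
    def __init__(self, chs=(32, 64, 128)):
        super().__init__()
        self.rgb_enc, self.dep_enc = nn.ModuleList(), nn.ModuleList()
        self.interact = nn.ModuleList()
        in_rgb, in_dep = 3, 1
        for ch in chs:
            self.rgb_enc.append(conv_block(in_rgb, ch, stride=2))
            self.dep_enc.append(conv_block(in_dep, ch, stride=2))
            self.interact.append(CrossModalityInteraction(ch))
            in_rgb = in_dep = ch
        # Auxiliary head: reconstruct the RGB image from the shared feature,
        # guiding the interaction representation during training.
        self.recon_head = nn.Sequential(
            nn.Upsample(scale_factor=8, mode='bilinear', align_corners=False),
            nn.Conv2d(chs[-1], 3, 3, padding=1),
        )
        # Main head: regress the dense depth map from the same feature.
        self.depth_head = nn.Sequential(
            nn.Upsample(scale_factor=8, mode='bilinear', align_corners=False),
            nn.Conv2d(chs[-1], 1, 3, padding=1),
        )
    def forward(self, rgb, sparse_depth):
        f_rgb, f_dep = rgb, sparse_depth
        shared = None
        for enc_r, enc_d, inter in zip(self.rgb_enc, self.dep_enc, self.interact):
            f_rgb, f_dep = enc_r(f_rgb), enc_d(f_dep)
            shared = inter(f_rgb, f_dep)   # interaction at this scale
            f_dep = f_dep + shared         # cascade into the next scale
        return self.depth_head(shared), self.recon_head(shared)

if __name__ == "__main__":
    net = TwoStreamDepthNet()
    rgb = torch.randn(2, 3, 64, 256)   # camera image
    sd = torch.randn(2, 1, 64, 256)    # sparse LiDAR depth, rasterized
    dense, recon = net(rgb, sd)
    # Joint objective: depth supervision plus reconstruction guidance.
    # randn_like stands in for ground-truth depth; 0.1 is an assumed weight.
    loss = nn.functional.l1_loss(dense, torch.randn_like(dense)) \
         + 0.1 * nn.functional.mse_loss(recon, rgb)
    loss.backward()
    print(dense.shape, recon.shape)

The key design point the sketch tries to capture is that the reconstruction head is trained on the shared interaction feature rather than on the RGB stream alone, so the auxiliary task pressures the fused representation to retain appearance information that a pure depth objective would discard.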