Structure-Aware Cross-Modal Transformer for Depth Completion

Cited by: 2
Authors
Zhao, Linqing [1 ]
Wei, Yi [2 ,3 ]
Li, Jiaxin [4 ]
Zhou, Jie [2 ,3 ]
Lu, Jiwen [2 ,3 ]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
[2] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
[3] Beijing Natl Res Ctr Informat Sci & Technol BNRist, Beijing 100084, Peoples R China
[4] Gaussian Robot, Shanghai 201203, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Depth completion; cross-modal interaction; structure learning; transformer; NETWORK; FUSION;
DOI
10.1109/TIP.2024.3355807
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we present a Structure-aware Cross-Modal Transformer (SCMT) to fully capture the 3D structures hidden in sparse depths for depth completion. Most existing methods learn to predict dense depths by taking depths as an additional channel of RGB images or learning 2D affinities to perform depth propagation. However, they fail to exploit 3D structures implied in the depth channel, thereby losing the informative 3D knowledge that provides important priors to distinguish the foreground and background features. Moreover, since these methods rely on the color textures of 2D images, it is challenging for them to handle poor-texture regions without the guidance of explicit 3D cues. To address this, we disentangle the hierarchical 3D scene-level structure from the RGB-D input and construct a pathway to make sharp depth boundaries and object shape outlines accessible to 2D features. Specifically, we extract 2D and 3D features from depth inputs and the back-projected point clouds respectively by building a two-stream network. To leverage 3D structures, we construct several cross-modal transformers to adaptively propagate multi-scale 3D structural features to the 2D stream, energizing 2D features with priors of object shapes and local geometries. Experimental results show that our SCMT achieves state-of-the-art performance on three popular outdoor (KITTI) and indoor (VOID and NYU) benchmarks.
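The abstract describes cross-modal transformers that let 2D-stream features attend to multi-scale 3D structural features extracted from back-projected point clouds. As a rough illustration of that fusion idea (not the authors' actual implementation: the function name, projection setup, and dimensions below are hypothetical, and learned weights are stood in for by random matrices), a single cross-modal attention step in which 2D tokens query 3D tokens might look like:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(feat_2d, feat_3d, d_k=32, seed=0):
    """Sketch of one cross-modal attention layer.

    feat_2d: (N2, C) flattened 2D feature-map tokens (queries)
    feat_3d: (N3, C) 3D structural tokens from the point-cloud
             stream (keys/values)
    Returns feat_2d enriched with attended 3D structure via a
    residual connection.
    """
    rng = np.random.default_rng(seed)
    C = feat_2d.shape[1]
    # random projections stand in for learned Q/K/V weights
    Wq = rng.standard_normal((C, d_k)) / np.sqrt(C)
    Wk = rng.standard_normal((C, d_k)) / np.sqrt(C)
    Wv = rng.standard_normal((C, C)) / np.sqrt(C)

    Q = feat_2d @ Wq                          # (N2, d_k)
    K = feat_3d @ Wk                          # (N3, d_k)
    V = feat_3d @ Wv                          # (N3, C)
    attn = softmax(Q @ K.T / np.sqrt(d_k))    # (N2, N3)
    # residual: keep 2D content, add attended 3D structure
    return feat_2d + attn @ V

# toy example: 16 2D tokens querying 8 3D tokens, 64 channels each
out = cross_modal_attention(np.ones((16, 64)), np.ones((8, 64)))
print(out.shape)  # (16, 64)
```

In the paper's setting this fusion would be applied at several scales, so coarse shape outlines and fine depth boundaries both reach the 2D stream.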
Pages: 1016-1031
Number of pages: 16
Related Papers
50 records in total
  • [31] Cascaded cross-modal transformer for audio-textual classification
    Ristea, Nicolae-Catalin
    Anghel, Andrei
    Ionescu, Radu Tudor
    [J]. ARTIFICIAL INTELLIGENCE REVIEW, 2024, 57 (09)
  • [32] Decoupled Cross-Modal Transformer for Referring Video Object Segmentation
    Wu, Ao
    Wang, Rong
    Tan, Quange
    Song, Zhenfeng
    [J]. SENSORS, 2024, 24 (16)
  • [33] IPE Transformer for Depth Completion with Input-Aware Positional Embeddings
    Li, Bocen
    Li, Guozhen
    Wang, Haiting
    Wang, Lijun
    Gong, Zhenfei
    Zhang, Xiaohua
    Lu, Huchuan
    [J]. PATTERN RECOGNITION AND COMPUTER VISION, PT IV, 2021, 13022 : 263 - 275
  • [34] SAViT: Structure-Aware Vision Transformer Pruning via Collaborative Optimization
    Zheng, Chuanyang
    Li, Zheyang
    Zhang, Kai
    Yang, Zhi
    Tan, Wenming
    Xiao, Jun
    Ren, Ye
    Pu, Shiliang
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [35] StacMR: Scene-Text Aware Cross-Modal Retrieval
    Mafla, Andres
    Rezende, Rafael S.
    Gomez, Lluis
    Larlus, Diane
    Karatzas, Dimosthenis
    [J]. 2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WACV 2021, 2021, : 2219 - 2229
  • [36] Advancing rule learning in knowledge graphs with structure-aware graph transformer
    Xu, Kang
    Chen, Miqi
    Feng, Yifan
    Dong, Zhenjiang
    [J]. INFORMATION PROCESSING AND MANAGEMENT, 2025, 62 (02)
  • [37] TEACH: Attention-Aware Deep Cross-Modal Hashing
    Yao, Hong-Lei
    Zhan, Yu-Wei
    Chen, Zhen-Duo
    Luo, Xin
    Xu, Xin-Shun
    [J]. PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR '21), 2021, : 376 - 384
  • [38] Data-Aware Proxy Hashing for Cross-modal Retrieval
    Tu, Rong-Cheng
    Mao, Xian-Ling
    Ji, Wenjin
    Wei, Wei
    Huang, Heyan
    [J]. PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023, : 686 - 696
  • [39] Negative Pre-aware for Noisy Cross-Modal Matching
    Zhang, Xu
    Li, Hao
    Ye, Mang
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7, 2024, : 7341 - 7349
  • [40] CDPNet: Cross-Modal Dual Phases Network for Point Cloud Completion
    Du, Zhenjiang
    Dou, Jiale
    Liu, Zhitao
    Wei, Jiwei
    Wang, Guan
    Xie, Ning
    Yang, Yang
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 2, 2024, : 1635 - 1643