Structure-Aware Cross-Modal Transformer for Depth Completion

Cited by: 2
Authors
Zhao, Linqing [1]
Wei, Yi [2, 3]
Li, Jiaxin [4]
Zhou, Jie [2, 3]
Lu, Jiwen [2, 3]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
[2] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
[3] Beijing Natl Res Ctr Informat Sci & Technol BNRist, Beijing 100084, Peoples R China
[4] Gaussian Robot, Shanghai 201203, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Depth completion; cross-modal interaction; structure learning; transformer; NETWORK; FUSION;
DOI
10.1109/TIP.2024.3355807
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
In this paper, we present a Structure-aware Cross-Modal Transformer (SCMT) to fully capture the 3D structures hidden in sparse depths for depth completion. Most existing methods learn to predict dense depths by taking depths as an additional channel of RGB images or learning 2D affinities to perform depth propagation. However, they fail to exploit 3D structures implied in the depth channel, thereby losing the informative 3D knowledge that provides important priors to distinguish the foreground and background features. Moreover, since these methods rely on the color textures of 2D images, it is challenging for them to handle poor-texture regions without the guidance of explicit 3D cues. To address this, we disentangle the hierarchical 3D scene-level structure from the RGB-D input and construct a pathway to make sharp depth boundaries and object shape outlines accessible to 2D features. Specifically, we extract 2D and 3D features from depth inputs and the back-projected point clouds respectively by building a two-stream network. To leverage 3D structures, we construct several cross-modal transformers to adaptively propagate multi-scale 3D structural features to the 2D stream, energizing 2D features with priors of object shapes and local geometries. Experimental results show that our SCMT achieves state-of-the-art performance on three popular outdoor (KITTI) and indoor (VOID and NYU) benchmarks.
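The abstract mentions extracting 3D features from back-projected point clouds. As a rough illustration of that back-projection step only, the sketch below lifts a sparse depth map to 3D points under an assumed pinhole camera model. The function name `back_project` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical and not taken from the paper, and the SCMT network itself is not reproduced here.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def back_project(
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Lift valid pixels of a sparse depth map to 3D camera coordinates.

    Assumes a pinhole camera (an assumption of this sketch, not a detail
    confirmed by the abstract). Pixels with depth <= 0 are treated as
    missing measurements and skipped.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):        # v: pixel row (image y)
        for u, z in enumerate(row):        # u: pixel column (image x)
            if z <= 0:
                continue                   # no depth measurement here
            x = (u - cx) * z / fx          # inverse pinhole projection
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Usage: a 2x2 sparse depth map with a single valid measurement.
sparse_depth = [
    [0.0, 2.0],
    [0.0, 0.0],
]
cloud = back_project(sparse_depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

In a two-stream design such as the one the abstract describes, a point set like `cloud` would feed the 3D branch while the depth map feeds the 2D branch. How the two are fused is specific to the paper and is not sketched here.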
Pages: 1016-1031
Page count: 16