TransRefer3D: Entity-and-Relation Aware Transformer for Fine-Grained 3D Visual Grounding

Cited by: 35
Authors
He, Dailan [1]
Zhao, Yusheng [1]
Luo, Junyu [1]
Hui, Tianrui [2]
Huang, Shaofei [2]
Zhang, Aixi [3]
Liu, Si [4]
Affiliations
[1] Beihang Univ, Sch Comp Sci & Engn, Beijing, Peoples R China
[2] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[3] Alibaba Grp, Beijing, Peoples R China
[4] Inst Artificial Intelligence, Beijing, Peoples R China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation
Keywords
3D visual grounding; transformer; entity attention; relation attention
DOI
10.1145/3474085.3475397
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The recently proposed task of fine-grained 3D visual grounding is essential and challenging: its goal is to identify the 3D object referred to by a natural language sentence among distracting objects of the same category. Existing works usually adopt dynamic graph networks to model the intra- and inter-modal interactions indirectly, which makes it difficult for the model to distinguish the referred object from distractors because the visual and linguistic contents are represented monolithically. In this work, we exploit the Transformer for its natural suitability to permutation-invariant 3D point cloud data and propose a TransRefer3D network that extracts entity-and-relation aware multimodal context among objects for more discriminative feature learning. Concretely, we devise an Entity-aware Attention (EA) module and a Relation-aware Attention (RA) module to conduct fine-grained cross-modal feature matching. Facilitated by a co-attention operation, our EA module matches visual entity features with linguistic entity features, while the RA module matches pair-wise visual relation features with linguistic relation features. We further integrate the EA and RA modules into an Entity-and-Relation aware Contextual Block (ERCB) and stack several ERCBs to form our TransRefer3D for hierarchical multimodal context modeling. Extensive experiments on both the Nr3D and Sr3D datasets demonstrate that our proposed model significantly outperforms existing approaches by up to 10.6% and achieves new state-of-the-art performance. To the best of our knowledge, this is the first work investigating the Transformer architecture for the fine-grained 3D visual grounding task.
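The cross-modal matching described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the feature sizes, the use of a plain feature difference as the pair-wise relation feature, and the reuse of the same word features at both the entity and relation levels are all assumptions made for illustration, and the learned projections, multi-head structure, and ERCB stacking are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    # Scaled dot-product attention: every query row attends over
    # all rows of keys_values (no learned projections in this sketch).
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ keys_values, weights

rng = np.random.default_rng(0)
num_objects, num_words, dim = 4, 6, 8            # hypothetical sizes
visual = rng.normal(size=(num_objects, dim))     # one feature per 3D object
words = rng.normal(size=(num_words, dim))        # one feature per word

# Entity level: each object attends over the word features.
entity_ctx, entity_w = cross_attention(visual, words)

# Relation level (assumption): the pair-wise relation feature of
# objects i and j is taken to be their feature difference.
pair_rel = visual[:, None, :] - visual[None, :, :]      # (N, N, dim)
rel_ctx, rel_w = cross_attention(pair_rel.reshape(-1, dim), words)
rel_ctx = rel_ctx.reshape(num_objects, num_objects, dim)
```

In this sketch `entity_ctx` holds one language-conditioned vector per object and `rel_ctx` holds one per ordered object pair; each row of the attention weights sums to 1.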
Pages: 2344-2352
Page count: 9