TransRefer3D: Entity-and-Relation Aware Transformer for Fine-Grained 3D Visual Grounding

Cited by: 35
Authors
He, Dailan [1]
Zhao, Yusheng [1]
Luo, Junyu [1]
Hui, Tianrui [2]
Huang, Shaofei [2]
Zhang, Aixi [3]
Liu, Si [4]
Affiliations
[1] Beihang Univ, Sch Comp Sci & Engn, Beijing, Peoples R China
[2] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[3] Alibaba Grp, Beijing, Peoples R China
[4] Inst Artificial Intelligence, Beijing, Peoples R China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation
Keywords
3D visual grounding; transformer; entity attention; relation attention;
DOI
10.1145/3474085.3475397
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Fine-grained 3D visual grounding is a recently proposed task that is both essential and challenging: its goal is to identify the 3D object referred to by a natural language sentence among distractor objects of the same category. Existing works usually adopt dynamic graph networks to model the intra-/inter-modal interactions indirectly, which makes it difficult for the model to distinguish the referred object from distractors because the visual and linguistic contents are represented monolithically. In this work, we exploit the Transformer for its natural suitability to permutation-invariant 3D point cloud data and propose the TransRefer3D network, which extracts entity-and-relation aware multimodal context among objects for more discriminative feature learning. Concretely, we devise an Entity-aware Attention (EA) module and a Relation-aware Attention (RA) module to conduct fine-grained cross-modal feature matching. Using a co-attention operation, the EA module matches visual entity features with linguistic entity features, while the RA module matches pair-wise visual relation features with linguistic relation features. We further integrate the EA and RA modules into an Entity-and-Relation aware Contextual Block (ERCB) and stack several ERCBs to form TransRefer3D for hierarchical multimodal context modeling. Extensive experiments on both the Nr3D and Sr3D datasets demonstrate that the proposed model significantly outperforms existing approaches, by up to 10.6%, and achieves new state-of-the-art performance. To the best of our knowledge, this is the first work to investigate the Transformer architecture for the fine-grained 3D visual grounding task.
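The cross-modal matching the abstract describes can be illustrated with a generic scaled dot-product attention in which each candidate object's feature attends over the word features of the sentence. This is only a minimal sketch under stated assumptions, not the authors' EA or RA module: the function names `softmax` and `cross_attention`, the two-dimensional toy features, and the absence of learned projections are all hypothetical simplifications.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Generic scaled dot-product attention (no learned weights).

    Each query vector attends over all key vectors and returns the
    attention-weighted sum of the corresponding value vectors.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Hypothetical toy features: 3 candidate objects, 2 words, dimension 2.
object_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
word_feats = [[1.0, 0.0], [0.0, 1.0]]

# Visual features act as queries; linguistic features as keys and values.
attended = cross_attention(object_feats, word_feats, word_feats)
```

Because every object is processed independently against the same set of words, reordering the objects only reorders the outputs, which is the property that makes attention a natural fit for unordered point cloud objects.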
Pages: 2344-2352
Page count: 9