TransRefer3D: Entity-and-Relation Aware Transformer for Fine-Grained 3D Visual Grounding

Cited by: 35

Authors:

He, Dailan [1]

Zhao, Yusheng [1]

Luo, Junyu [1]

Hui, Tianrui [2]

Huang, Shaofei [2]

Zhang, Aixi [3]

Liu, Si [4]

Affiliations:

[1] Beihang Univ, Sch Comp Sci & Engn, Beijing, Peoples R China

[2] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China

[3] Alibaba Grp, Beijing, Peoples R China

[4] Inst Artificial Intelligence, Beijing, Peoples R China

Source:

PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021 | 2021

Funding:

National Natural Science Foundation of China; Beijing Natural Science Foundation;

Keywords:

3D visual grounding; transformer; entity attention; relation attention;

DOI:

10.1145/3474085.3475397

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recently proposed fine-grained 3D visual grounding is an essential and challenging task, whose goal is to identify the 3D object referred to by a natural language sentence from other distractive objects of the same category. Existing works usually adopt dynamic graph networks to indirectly model the intra/inter-modal interactions, making it difficult for the model to distinguish the referred object from distractors due to the monolithic representations of visual and linguistic contents. In this work, we exploit Transformer for its natural suitability on permutation-invariant 3D point cloud data and propose a TransRefer3D network to extract entity-and-relation aware multimodal context among objects for more discriminative feature learning. Concretely, we devise an Entity-aware Attention (EA) module and a Relation-aware Attention (RA) module to conduct fine-grained cross-modal feature matching. Facilitated by the co-attention operation, our EA module matches visual entity features with linguistic entity features, while the RA module matches pair-wise visual relation features with linguistic relation features. We further integrate EA and RA modules into an Entity-and-Relation aware Contextual Block (ERCB) and stack several ERCBs to form our TransRefer3D for hierarchical multimodal context modeling. Extensive experiments on both Nr3D and Sr3D datasets demonstrate that our proposed model significantly outperforms existing approaches by up to 10.6% and claims the new state-of-the-art performance. To the best of our knowledge, this is the first work investigating Transformer architecture for the fine-grained 3D visual grounding task.
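
Illustrative sketch (not part of the original record): the abstract describes entity-level and relation-level cross-modal attention combined into a stackable block. The NumPy code below is a minimal sketch of that idea only, not the authors' implementation. All function names, the feature dimension, the use of feature differences as pair-wise relation features, the mean aggregation, and the residual connections are assumptions made for illustration; the paper should be consulted for the actual design.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    # Scaled dot-product attention: each query row attends over all key rows.
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ values, weights

def entity_attention(obj_feats, word_feats):
    # Assumed form: object features (N, d) attend over word features (T, d),
    # with a residual connection.
    attended, _ = cross_attention(obj_feats, word_feats, word_feats)
    return obj_feats + attended

def relation_attention(obj_feats, word_feats):
    # Assumed form: pair-wise relation features are feature differences (N, N, d).
    rel = obj_feats[:, None, :] - obj_feats[None, :, :]
    n, _, d = rel.shape
    attended, _ = cross_attention(rel.reshape(n * n, d), word_feats, word_feats)
    # Aggregate the language-matched relations of each object over its partners.
    return obj_feats + attended.reshape(n, n, d).mean(axis=1)

def contextual_block(obj_feats, word_feats):
    # Hypothetical stand-in for one ERCB: entity matching, then relation matching.
    return relation_attention(entity_attention(obj_feats, word_feats), word_feats)

rng = np.random.default_rng(0)
objects = rng.normal(size=(5, 16))   # 5 candidate objects, 16-dim features (made up)
words = rng.normal(size=(7, 16))     # 7 word tokens, 16-dim features (made up)

refined = objects
for _ in range(2):                   # stack two blocks
    refined = contextual_block(refined, words)
print(refined.shape)
```

Because every step is attention or a symmetric aggregation over objects, permuting the input objects permutes the outputs in the same way, which is the property the abstract cites as making Transformers a natural fit for point cloud data.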


Pages: 2344-2352

Page count: 9
