VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud

Cited by: 8
Authors
Wang, Ziqin [1]
Cheng, Bowen [1]
Zhao, Lichen [1]
Xu, Dong [2]
Tang, Yang [3]
Sheng, Lu [1]
Affiliations
[1] Beihang University, School of Software, Beijing, People's Republic of China
[2] University of Hong Kong, Hong Kong, People's Republic of China
[3] East China University of Science and Technology, Shanghai, People's Republic of China
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR52729.2023.02065
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The task of 3D semantic scene graph (3DSSG) prediction in point clouds is challenging because (1) a 3D point cloud captures only geometric structures with limited semantics compared to 2D images, and (2) the long-tailed relation distribution inherently hinders the learning of unbiased prediction. Since 2D images provide rich semantics and scene graphs are by nature coupled with language, in this study we propose the Visual-Linguistic Semantics Assisted Training (VL-SAT) scheme, which can significantly empower 3DSSG prediction models to discriminate long-tailed and ambiguous semantic relations. The key idea is to train a powerful multi-modal oracle model to assist the 3D model. This oracle learns reliable structural representations based on semantics from vision, language, and 3D geometry, and its benefits can be heterogeneously passed to the 3D model during the training stage. By effectively utilizing visual-linguistic semantics in training, VL-SAT can significantly boost common 3DSSG prediction models, such as SGFN and SGG(point), using only 3D inputs at inference, especially when dealing with tail relation triplets. Comprehensive evaluations and ablation studies on the 3DSSG dataset validate the effectiveness of the proposed scheme. Code is available at https://github.com/wz7in/CVPR2023-VLSAT.
Pages: 21560 - 21569
Page count: 10
Related papers
34 in total
  • [1] Weakly-Supervised 3D Scene Graph Generation via Visual-Linguistic Assisted Pseudo-Labeling
    Wang, Xu
    Li, Yifan
    Zhang, Qiudan
    Wu, Wenhui
    Li, Mark Junjie
    Ma, Lin
    Jiang, Jianmin
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 11164 - 11175
  • [2] Unbiased 3D Semantic Scene Graph Prediction in Point Cloud Using Deep Learning
    Han, Chaolin
    Li, Hongwei
    Xu, Jian
    Dong, Bing
    Wang, Yalin
    Zhou, Xiaowen
    Zhao, Shan
    APPLIED SCIENCES-BASEL, 2023, 13 (09):
  • [3] SAT: 2D Semantics Assisted Training for 3D Visual Grounding
    Yang, Zhengyuan
    Zhang, Songyang
    Wang, Liwei
    Luo, Jiebo
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021 : 1836 - 1846
  • [4] Knowledge-inspired 3D Scene Graph Prediction in Point Cloud
    Zhang, Shoulong
    Li, Shuai
    Hao, Aimin
    Qin, Hong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [5] SGFormer: Semantic Graph Transformer for Point Cloud-Based 3D Scene Graph Generation
    Lv, Changsheng
    Qi, Mengshi
    Li, Xia
    Yang, Zhengyuan
    Ma, Huadong
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 5, 2024 : 4035 - 4043
  • [6] 3D Spatial Multimodal Knowledge Accumulation for Scene Graph Prediction in Point Cloud
    Feng, Mingtao
    Hou, Haoran
    Zhang, Liang
    Wu, Zijie
    Guo, Yulan
    Mian, Ajmal
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023 : 9182 - 9191
  • [7] 3D scene graph prediction from point clouds
    Wu, Fanfan
    Yan, Feihu
    Shi, Weimin
    Zhou, Zhong
    Virtual Reality and Intelligent Hardware, 2022, 4 (01) : 76 - 88
  • [8] An Efficient Scene Semantic Labeling Approach for 3D Point Cloud
    Wang, Tianyi
    Li, Jian
    An, Xiangjing
    2015 IEEE 18TH INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS, 2015 : 2115 - 2120
  • [9] Heterogeneous Graph Learning for Scene Graph Prediction in 3D Point Clouds
    Ma, Yanni
    Liu, Hao
    Pei, Yun
    Guo, Yulan
    COMPUTER VISION - ECCV 2024, PT XXVI, 2025, 15084 : 274 - 291