VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud

Cited by: 8
Authors
Wang, Ziqin [1]
Cheng, Bowen [1]
Zhao, Lichen [1]
Xu, Dong [2]
Tang, Yang [3]
Sheng, Lu [1]
Affiliations
[1] Beihang Univ, Sch Software, Beijing, Peoples R China
[2] Univ Hong Kong, Hong Kong, Peoples R China
[3] East China Univ Sci & Technol, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DOI
10.1109/CVPR52729.2023.02065
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The task of 3D semantic scene graph (3DSSG) prediction in point clouds is challenging because (1) a 3D point cloud captures only geometric structures, with limited semantics compared to 2D images, and (2) the long-tailed relation distribution inherently hinders the learning of unbiased prediction. Since 2D images provide rich semantics and scene graphs are by nature coupled with language, in this study we propose the Visual-Linguistic Semantics Assisted Training (VL-SAT) scheme, which can significantly empower 3DSSG prediction models to discriminate long-tailed and ambiguous semantic relations. The key idea is to train a powerful multi-modal oracle model to assist the 3D model. This oracle learns reliable structural representations based on semantics from vision, language, and 3D geometry, and its benefits can be heterogeneously passed to the 3D model during the training stage. By effectively utilizing visual-linguistic semantics in training, our VL-SAT can significantly boost common 3DSSG prediction models, such as SGFN and SGG(point), using only 3D inputs in the inference stage, especially when dealing with tail relation triplets. Comprehensive evaluations and ablation studies on the 3DSSG dataset have validated the effectiveness of the proposed scheme. Code is available at https://github.com/wz7in/CVPR2023-VLSAT.
Pages: 21560-21569
Page count: 10
Related papers
34 in total
  • [21] DGAT-net: Dynamic Graph Attention for 3D Point Cloud Semantic Segmentation
    Miao, Yujie
    Yi, Xiaodong
    Guan, Naiyang
    Lu, Hailun
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XI, ICIC 2024, 2024, 14872 : 253 - 265
  • [22] Bottleneck Identification to Semantic Segmentation of Industrial 3D Point Cloud Scene via Deep Learning
    Cazorla, Romain
    Poinel, Line
    Papadakis, Panagiotis
    Buche, Cedric
    PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 4877 - 4878
  • [23] An Interactive Visual Analytic Tool for Semantic Classification of 3D Urban LiDAR Point Cloud
    Kumari, Beena
    Sreevalsan-Nair, Jaya
    23RD ACM SIGSPATIAL INTERNATIONAL CONFERENCE ON ADVANCES IN GEOGRAPHIC INFORMATION SYSTEMS (ACM SIGSPATIAL GIS 2015), 2015,
  • [24] Nomadic Point Cloud Calibration A Visual Calibration Method for a High Resolution 3D Scene Reconstruction
    Heckes, Juergen
    Arles, Adrien
    Klein, Alexander
    Ciba, Dorota
    Giefing, Gerd-Juergen
    2014 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC), 2014, : 483 - 488
  • [25] 3D point cloud semantic segmentation toward large-scale unstructured agricultural scene classification
    Chen, Yi
    Xiong, Yingjun
    Zhang, Baohua
    Zhou, Jun
    Zhang, Qian
    COMPUTERS AND ELECTRONICS IN AGRICULTURE, 2021, 190
  • [26] 3D point cloud semantic segmentation based on visual guidance and feature enhancement
    Chen, Sitong
    Shu, Yucheng
    Qiao, Lihong
    Wu, Zhengyang
    Ling, Jing
    Wu, Jiang
    Li, Weisheng
    Multimedia Systems, 2025, 31 (3)
  • [27] UGN: U-shape network based on graph convolution for 3D point cloud semantic segmentation
    Guan, Shaojie
    Li, Xingwei
    Jin, Jiating
    Li, Xinlong
    Ge, Yizhi
    2020 INTERNATIONAL CONFERENCE ON IMAGE, VIDEO PROCESSING AND ARTIFICIAL INTELLIGENCE, 2020, 11584
  • [28] Free-form Description Guided 3D Visual Graph Network for Object Grounding in Point Cloud
    Feng, Mingtao
    Li, Zhen
    Li, Qi
    Zhang, Liang
    Zhang, XiangDong
    Zhu, Guangming
    Zhang, Hui
    Wang, Yaonan
    Mian, Ajmal
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 3702 - 3711
  • [29] Lang3DSG: Language-based contrastive pre-training for 3D Scene Graph prediction
    Koch, Sebastian
    Hermosilla, Pedro
    Vaskevicius, Narunas
    Colosi, Mirco
    Ropinski, Timo
    2024 INTERNATIONAL CONFERENCE IN 3D VISION, 3DV 2024, 2024, : 1037 - 1047
  • [30] Robot-assisted mobile scanning for automated 3D reconstruction and point cloud semantic segmentation of building interiors
    Hu, Difeng
    Gan, Vincent J. L.
    Yin, Chao
    AUTOMATION IN CONSTRUCTION, 2023, 152