Joint representation learning for text and 3D point cloud

Times cited: 3
Authors
Huang, Rui [1]
Pan, Xuran [1]
Zheng, Henry [1]
Jiang, Haojun [1]
Xie, Zhifeng [2]
Wu, Cheng [1]
Song, Shiji [1]
Huang, Gao [1]
Affiliations
[1] Tsinghua Univ, BNRist, Dept Automat, Beijing 100084, Peoples R China
[2] Tsinghua Univ, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Point cloud; Multi-modal learning; Representation learning
DOI
10.1016/j.patcog.2023.110086
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recent advances in vision-language pre-training (e.g., CLIP) have enabled 2D vision models to benefit from language supervision. However, joint representation learning of 3D point clouds and text remains under-explored because 3D-text data pairs are difficult to acquire. Prior works project point clouds into 2D depth maps and apply CLIP directly, but this sacrifices 3D structural information and limits their applicability. In this paper, we put forward Text4Point, a novel framework for constructing language-guided 3D models for dense prediction tasks. Text4Point uses 2D images as a bridge between the point cloud and language modalities and follows a pre-training and fine-tuning paradigm. During pre-training, we leverage dense contrastive learning on readily available RGB-D data to align image and point cloud representations. Because CLIP already aligns image and text features well, the point cloud features thereby become implicitly aligned with the text embeddings. Further, we propose a Text Querying Module that integrates language information into 3D representation learning by querying text embeddings with point cloud features. During fine-tuning, the model learns 3D representations under informative language guidance without requiring 2D images. Extensive experiments show that Text4Point consistently improves performance on various dense prediction tasks.
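The abstract names two mechanisms: dense contrastive alignment of point and pixel features, and a Text Querying Module in which point features query text embeddings. The NumPy sketch below illustrates both under stated assumptions; it is not the authors' implementation. The symmetric InfoNCE form, the temperature of 0.07, the index-based pairing of points with pixels, and the single-head cross-attention with a residual connection (`text_query`, `w_q`, `w_k`, `w_v`) are all hypothetical choices made for illustration, since the abstract does not specify the exact loss or module design. Random arrays stand in for backbone features and CLIP text embeddings.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each row of x to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise log-softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def dense_info_nce(point_feats: np.ndarray, pixel_feats: np.ndarray,
                   temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss over N matched point/pixel feature pairs.

    Assumption: row i of both arrays describes the same 3D location (e.g. a
    pixel back-projected into the point cloud using its RGB-D depth). All
    other rows in the batch act as negatives.
    """
    p = l2_normalize(point_feats)
    q = l2_normalize(pixel_feats)
    logits = p @ q.T / temperature                      # (N, N) scaled cosine similarities
    idx = np.arange(len(p))
    loss_p2i = -log_softmax(logits)[idx, idx].mean()    # each point picks its pixel
    loss_i2p = -log_softmax(logits.T)[idx, idx].mean()  # each pixel picks its point
    return float((loss_p2i + loss_i2p) / 2)


def text_query(point_feats: np.ndarray, text_embeds: np.ndarray,
               w_q: np.ndarray, w_k: np.ndarray, w_v: np.ndarray):
    """Hypothetical single-head cross-attention in which points query text.

    point_feats: (N, D) point features; text_embeds: (K, D) text embeddings
    (e.g. CLIP embeddings of K class-name prompts). Returns the point features
    with the attended text information added back (residual connection) and
    the (N, K) attention weights.
    """
    q = point_feats @ w_q                                # (N, D) queries from points
    k = text_embeds @ w_k                                # (K, D) keys from text
    v = text_embeds @ w_v                                # (K, D) values from text
    attn = np.exp(log_softmax(q @ k.T / np.sqrt(q.shape[1])))  # each row sums to 1
    return point_feats + attn @ v, attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_points, n_texts, dim = 8, 5, 16
    points = rng.normal(size=(n_points, dim))
    aligned = points + 0.01 * rng.normal(size=(n_points, dim))  # matched pairs
    unrelated = rng.normal(size=(n_points, dim))                # no correspondence
    print("aligned loss:  ", dense_info_nce(points, aligned))
    print("unrelated loss:", dense_info_nce(points, unrelated))

    texts = rng.normal(size=(n_texts, dim))  # stand-in for text embeddings
    w_q, w_k, w_v = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))
    fused, attn = text_query(points, texts, w_q, w_k, w_v)
    print(fused.shape, attn.shape)
```

The aligned pairs should yield a much lower loss than the unrelated ones, which is the signal that pulls matching point and pixel features together. In a real pipeline the inputs would come from a point cloud backbone, an image backbone, and a frozen CLIP text encoder rather than from random arrays.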
Pages: 12
Related papers
  • [1] Geometric Invariant Representation Learning for 3D Point Cloud
    Li, Zongmin
    Zhang, Yupeng
    Bai, Yun
    [J]. 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI 2021), 2021: 1480-1485
  • [2] Clustering based Point Cloud Representation Learning for 3D Analysis
    Feng, Tuo
    Wang, Wenguan
    Wang, Xiaohan
    Yang, Yi
    Zheng, Qinghua
    [J]. 2023 IEEE/CVF International Conference on Computer Vision (ICCV 2023), 2023: 8249-8260
  • [3] Feature extraction and representation learning of 3D point cloud data
    Si, Hongying
    Wei, Xianyong
    [J]. Image and Vision Computing, 2024, 142
  • [4] Masked Structural Point Cloud Modeling to Learning 3D Representation
    Yamada, Ryosuke
    Tadokoro, Ryu
    Qiu, Yue
    Kataoka, Hirokatsu
    Satoh, Yutaka
    [J]. IEEE Access, 2024, 12: 142291-142305
  • [5] GridNet: efficiently learning deep hierarchical representation for 3D point cloud understanding
    Wang, Huiqun
    Huang, Di
    Wang, Yunhong
    [J]. Frontiers of Computer Science, 2022, 16 (01)
  • [6] EGCT: enhanced graph convolutional transformer for 3D point cloud representation learning
    Chen, Gang
    Wang, Wenju
    Zhou, Haoran
    Wang, Xiaolin
    [J]. The Visual Computer, 2024
  • [8] Representation Learning via Parallel Subset Reconstruction for 3D Point Cloud Generation
    Matsuzaki, Kohei
    Tasaka, Kazuyuki
    [J]. 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019: 289-296
  • [10] Representation Learning Optimization for 3D Point Cloud Quality Assessment without Reference
    Tliba, Marouane
    Chetouani, Aladine
    Valenzise, Giuseppe
    Dufaux, Frederic
    [J]. 2022 IEEE International Conference on Image Processing (ICIP), 2022: 3702-3706