Scene captioning with deep fusion of images and point clouds

Citations: 0
Authors
Yu, Qiang [1 ,3 ]
Zhang, Chunxia [4 ]
Weng, Lubin [1 ]
Xiang, Shiming [2 ,3 ]
Pan, Chunhong [2 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Res Ctr Aerosp Informat, Beijing 100190, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
[3] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
[4] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing 100081, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Scene captioning; Point cloud; Deep fusion;
DOI
10.1016/j.patrec.2022.04.017
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, the fusion of images and point clouds has received considerable attention in various fields, such as autonomous driving, where its advantage over single-modal vision has been verified. However, it has not been extensively exploited in the scene captioning task. In this paper, a novel scene captioning framework with deep fusion of images and point clouds, based on region correlation and attention, is proposed to improve the performance of captioning models. In our model, a symmetrical processing pipeline is designed for point clouds and images. First, 3D and 2D region features are generated, respectively, through region proposal generation, proposal fusion, and region pooling modules. Then, a feature fusion module integrates the features according to the region correlation rule and the attention mechanism, which increases the interpretability of the fusion process and yields a sequence of fused visual features. Finally, the fused features are transformed into captions by an attention-based caption generation module. Comprehensive experiments indicate that the performance of our model reaches the state of the art. (c) 2022 Elsevier B.V. All rights reserved.
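The fusion step summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the region correlation rule or the attention form, so the sketch assumes that correlation is given as a precomputed index list per 2D region, that 2D and 3D features share one dimension, and that attention is scaled dot-product. The names `fuse_regions`, `feats_2d`, `feats_3d`, and `correlated` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fuse_regions(feats_2d, feats_3d, correlated):
    """Fuse each 2D region feature with an attention-weighted 3D context.

    Assumptions (not taken from the paper):
      - correlated[i] lists the indices of 3D regions treated as
        correlated with 2D region i (the correlation rule is external).
      - 2D and 3D features have the same dimension.
      - Attention is scaled dot-product between 2D and 3D features.
    Returns one fused vector per 2D region: [2D feature | 3D context].
    """
    fused = []
    for i, f2d in enumerate(feats_2d):
        dim = len(f2d)
        idx = correlated[i]
        if not idx:
            # No correlated 3D region: fall back to a zero context vector.
            context = [0.0] * dim
        else:
            scale = math.sqrt(dim)
            weights = softmax([dot(f2d, feats_3d[j]) / scale for j in idx])
            context = [
                sum(w * feats_3d[j][k] for w, j in zip(weights, idx))
                for k in range(dim)
            ]
        fused.append(list(f2d) + context)
    return fused


if __name__ == "__main__":
    feats_2d = [[1.0, 0.0], [0.0, 1.0]]
    feats_3d = [[2.0, 0.0], [0.0, 2.0]]
    correlated = [[0], []]  # region 1 has no correlated 3D region
    for vec in fuse_regions(feats_2d, feats_3d, correlated):
        print(vec)
```

Restricting attention to correlated regions is what would make the weights inspectable per region pair; in a captioning model the resulting sequence of fused vectors would then be passed to the caption decoder.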
Pages: 9-15
Number of pages: 7
Related papers
50 items in total
  • [21] Semantic Segmentation and Reconstruction of Indoor Scene Point Clouds
    Hao, Wen
    Wei, Hainan
    Wang, Yang
    [J]. ADVANCES IN ELECTRICAL AND COMPUTER ENGINEERING, 2024, 24 (03) : 3 - 12
  • [22] Scene Flow from Point Clouds with or without Learning
    Pontes, Jhony Kaesemodel
    Hays, James
    Lucey, Simon
    [J]. 2020 INTERNATIONAL CONFERENCE ON 3D VISION (3DV 2020), 2020, : 261 - 270
  • [23] 3D-Aware Scene Change Captioning From Multiview Images
    Qiu, Yue
    Satoh, Yutaka
    Suzuki, Ryota
    Iwata, Kenji
    Kataoka, Hirokatsu
    [J]. IEEE ROBOTICS AND AUTOMATION LETTERS, 2020, 5 (03) : 4743 - 4750
  • [24] Ore Rock Fragmentation Calculation Based on Multi-Modal Fusion of Point Clouds and Images
    Peng, Jianjun
    Cui, Yunhao
    Zhong, Zhidan
    An, Yi
    [J]. APPLIED SCIENCES-BASEL, 2023, 13 (23):
  • [25] A Novel Interactive Fusion Method with Images and Point Clouds for 3D Object Detection
    Xu, Kai
    Yang, Zhile
    Xu, Yangjie
    Feng, Liangbing
    [J]. APPLIED SCIENCES-BASEL, 2019, 9 (06):
  • [26] Fusion Method of Infrared Images and 3D Point Clouds Based on Cross Markers
    Yelong, Zheng
    Changyong, Li
    Ningning, Xia
    Lingyi, Li
    Guomin, Zhang
    Meirong, Zhao
    [J]. Tianjin Daxue Xuebao (Ziran Kexue yu Gongcheng Jishu Ban)/Journal of Tianjin University Science and Technology, 2024, 57 (10) : 1090 - 1099
  • [27] Combination of Images and Point Clouds in a Generative Adversarial Network for Upsampling Crack Point Clouds
    Nguyen, Nhung Hong Thi
    Perry, Stuart
    Bone, Don
    Thanh, Ha Le
    Xu, Min
    Nguyen, Thuy Thi
    [J]. IEEE ACCESS, 2022, 10 : 67198 - 67209
  • [28] Deep Segmentation of Point Clouds of Wheat
    Ghahremani, Morteza
    Williams, Kevin
    Corke, Fiona M. K.
    Tiddeman, Bernard
    Liu, Yonghuai
    Doonan, John H.
    [J]. FRONTIERS IN PLANT SCIENCE, 2021, 12
  • [29] Visual Features Fusion for Scene Images Classification
    Gao Hua
    Zhao Chun-xia
    Zhang Hao-feng
    [J]. INTERNATIONAL MULTICONFERENCE OF ENGINEERS AND COMPUTER SCIENTIST, IMECS 2012, VOL II, 2012, : 912 - 915
  • [30] Cross on Cross Attention: Deep Fusion Transformer for Image Captioning
    Zhang, Jing
    Xie, Yingshuai
    Ding, Weichao
    Wang, Zhe
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (08) : 4257 - 4268