Scene Graph with 3D Information for Change Captioning

Cited by: 10
Authors
Liao, Zeming [1]
Huang, Qingbao [1, 2, 4, 5, 6]
Liang, Yu [1]
Fu, Mingyi [1]
Cai, Yi [2, 4]
Li, Qing [3]
Affiliations
[1] Guangxi Univ, Sch Elect Engn, Nanning, Guangxi, Peoples R China
[2] South China Univ Technol, Sch Software Engn, Guangzhou, Peoples R China
[3] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China
[4] MOE China, Key Lab Big Data & Intelligent Robot SCUT, Beijing, Peoples R China
[5] Guangxi Key Lab Multimedia Commun & Network Techn, Nanning, Peoples R China
[6] Guangxi Univ, Inst Artificial Intelligence, Nanning, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
change captioning; scene graph; image difference description
DOI
10.1145/3474085.3475712
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Change captioning aims to describe the differences between a pair of images in natural language. It is an interesting but under-explored task with two main challenges: correctly describing the relative position relationships between objects, and overcoming the disturbances caused by viewpoint changes. To address these issues, we propose a three-dimensional (3D) information aware Scene Graph based Change Captioning (SGCC) model. We extract the semantic attributes of objects and the 3D information of images (i.e., depths of objects, relative two-dimensional image plane distances, and relative angles between objects) to construct scene graphs for image pairs, then aggregate the node representations with a graph convolutional network. Owing to the relative position relationships between objects and the scene graphs, our model can help observers locate the changed objects quickly and is, to some extent, robust to viewpoint changes. Extensive experiments show that our SGCC model achieves performance competitive with state-of-the-art models on the CLEVR-Change and Spot-the-Diff datasets, verifying the effectiveness of the proposed model. Code is available at https://github.com/VISLANG-Lab/SGCC.
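The abstract names three pairwise geometric cues (object depths, relative 2D image-plane distances, and relative angles) and a graph convolutional network for aggregating node representations. A minimal sketch of how such cues and one graph-convolution step might be computed is given below. It is not the authors' implementation: the function names `build_relations` and `gcn_layer`, the distance threshold used to connect nodes, and the symmetric-normalization form of the layer are all assumptions made for illustration.

```python
import numpy as np


def build_relations(centers, depths):
    """Pairwise geometric cues between objects (hypothetical helper).

    centers: (N, 2) object centers in image-plane pixels.
    depths:  (N,) one depth value per object.
    Returns an (N, N, 3) array holding, for each ordered pair (i, j):
    2D distance, angle of the vector from i to j (radians), depth difference.
    """
    centers = np.asarray(centers, dtype=float)
    depths = np.asarray(depths, dtype=float)
    delta = centers[None, :, :] - centers[:, None, :]  # delta[i, j] = c_j - c_i
    dist = np.linalg.norm(delta, axis=-1)
    angle = np.arctan2(delta[..., 1], delta[..., 0])
    depth_diff = depths[None, :] - depths[:, None]
    return np.stack([dist, angle, depth_diff], axis=-1)


def gcn_layer(features, adjacency, weight):
    """One graph-convolution step with self-loops and symmetric normalization.

    This is a generic GCN layer, assumed here for illustration only.
    """
    a_hat = adjacency + np.eye(adjacency.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))   # degree is >= 1 after self-loops
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ features @ weight, 0.0)  # ReLU


# Toy example: three objects, connected when closer than an assumed threshold.
centers = [[0.0, 0.0], [3.0, 4.0], [0.0, 2.0]]
depths = [1.0, 2.5, 1.0]
relations = build_relations(centers, depths)
adjacency = (relations[..., 0] < 4.5).astype(float) - np.eye(3)
node_features = np.eye(3)      # placeholder for semantic attribute features
weight = np.ones((3, 2))       # placeholder for learned parameters
aggregated = gcn_layer(node_features, adjacency, weight)
```

In this sketch the geometric cues are only used to decide which nodes are connected; a fuller model could also feed them in as edge features.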
Pages: 5074-5082
Page count: 9