A Novel End-to-End Transformer for Scene Graph Generation

Cited by: 0
Authors
Ren, Chengkai [1]
Liu, Xiuhua [2]
Cao, Mengyuan [2]
Zhang, Jian [1]
Wang, Hongwei [1]
Affiliations
[1] Zhejiang Univ, ZJU UIUC Inst, Haining, Peoples R China
[2] Intelligent Sci & Technol Acad CASIC, Beijing, Peoples R China
Keywords
Transformer; Scene Graph; Scene Understanding; End-to-end; Visual Relationship Detection
DOI
10.1109/IJCNN54540.2023.10191798
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
An image usually contains not only visual information but also higher-level semantic information. Nevertheless, earlier computer vision algorithms, such as object detection and image classification, rely on the visual features of the image alone. The recent surge of interest in scene graphs within computer vision has raised the challenge of generating structured scene graphs with rich semantic information. This paper proposes a one-stage, query-based, end-to-end Transformer model that generates scene graphs using the Hungarian matching algorithm. We develop an anti-bias reasoner module to reduce the impact of the unbalanced data distribution. A time-division training strategy is proposed to improve training efficiency and speed up convergence while also improving model performance. Experiments on the large-scale Visual Genome dataset were conducted to confirm the validity of our method. Compared with the existing state-of-the-art method, our method guarantees inference speed while maintaining acceptable performance, making it more suitable for tasks with strict real-time requirements. Our work demonstrates that the one-stage approach has great potential for further exploration in scene graph generation.
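The abstract names the Hungarian matching algorithm but does not say how it is applied. In query-based set-prediction models of the DETR family, it is typically used during training to pair each ground-truth item with exactly one prediction at minimum total cost. The sketch below illustrates that general idea only; it is not the authors' implementation. The function name `hungarian_match` and the toy cost values are hypothetical, and the cost matrix is assumed to be precomputed (for example from class and box errors).

```python
def hungarian_match(cost):
    """Minimum-cost one-to-one assignment (Hungarian algorithm, O(n^2 * m)).

    cost[i][j] is the (hypothetical) cost of pairing ground-truth item i
    with predicted query j. Requires len(cost) <= len(cost[0]).
    Returns a sorted list of (ground_truth_index, query_index) pairs.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)      # row potentials (1-indexed)
    v = [0.0] * (m + 1)      # column potentials (1-indexed)
    p = [0] * (m + 1)        # p[j] = row currently matched to column j (0 = none)
    way = [0] * (m + 1)      # back-pointers for the augmenting path

    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta = minv[j]
                        j1 = j
            # Shift potentials so the cheapest unused column becomes tight.
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:   # reached a free column: augmenting path found
                break
        # Flip the matching along the augmenting path.
        while True:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break

    return sorted((p[j] - 1, j - 1) for j in range(1, m + 1) if p[j] != 0)


# Toy example: 2 ground-truth triplets, 3 predicted queries (values made up).
toy_cost = [
    [0.9, 0.1, 0.5],
    [0.2, 0.8, 0.7],
]
matches = hungarian_match(toy_cost)
total = sum(toy_cost[g][q] for g, q in matches)
print(matches, round(total, 3))
```

In this toy case the third query is left unmatched; set-prediction models commonly treat such queries as "no object" or "no relation" during loss computation. In practice, `scipy.optimize.linear_sum_assignment` solves the same assignment problem.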
Pages: 7