A Novel End-to-End Transformer for Scene Graph Generation

Cited by: 0
Authors
Ren, Chengkai [1]
Liu, Xiuhua [2]
Cao, Mengyuan [2]
Zhang, Jian [1]
Wang, Hongwei [1]
Affiliations
[1] Zhejiang Univ, ZJU UIUC Inst, Haining, Peoples R China
[2] Intelligent Sci & Technol Acad CASIC, Beijing, Peoples R China
Keywords
Transformer; Scene Graph; Scene Understanding; End-to-end; Visual Relationship Detection;
DOI
10.1109/IJCNN54540.2023.10191798
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
An image usually contains not only visual information but also higher-level semantic information. Nevertheless, earlier computer vision algorithms, such as object detection and image classification, rely on the visual features of the image alone. The recent surge of interest in scene graphs within computer vision has raised the challenge of generating structured scene graphs with rich semantic information. This paper proposes a one-stage, query-based, end-to-end Transformer model that generates scene graphs using the Hungarian matching algorithm. We develop an anti-bias reasoner module to reduce the impact of the unbalanced data distribution. We also propose a time-division training strategy that improves training efficiency and speeds up convergence while improving model performance. Experiments on the large-scale Visual Genome dataset were conducted to confirm the validity of our method. Compared with the existing state-of-the-art method, our method guarantees inference speed while maintaining acceptable performance, making it more suitable for tasks with strict real-time requirements. Our work demonstrates that the one-stage method has great potential for exploration in scene graph generation.
Pages: 7