Learning to transfer focus of graph neural network for scene graph parsing

Cited by: 17
Authors
Jiang, Junjie [1]
He, Zaixing [1, 2]
Zhang, Shuyou [2]
Zhao, Xinyue [2]
Tan, Jianrong [2]
Affiliations
[1] Zhejiang Univ, State Key Lab Fluid Power & Mechatron Syst, Hangzhou 310027, Peoples R China
[2] Zhejiang Univ, State Key Lab CAD & CG, Hangzhou 310027, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Semantic relationship; Graphical focus; Scene graph; Class imbalance; Image understanding;
DOI
10.1016/j.patcog.2020.107707
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Scene graph parsing has become a new challenge in image understanding and pattern recognition in recent years. It captures objects and their relationships and provides a structured representation of the visual scene. Among the three types of high-level relationships in scene graphs, semantic relationships, which carry the global understanding of the scene, are the core and the most valuable, while geometric and possessive relationships carry only local, limited information. However, semantic relationships span many types with few instances each, so existing detectors recognize most of them poorly. To address this issue, this paper proposes a new architecture, the graphical focal network, which uses a decision-level global detector to capture the dependencies between the object and relationship local detectors. We construct a graphical focal loss, which compensates for the lack of semantic relationship instances by adjusting the proportion of relationship loss according to the rarity and learning difficulty of each relationship, and improves the stability of key object recognition by adjusting the proportion of object loss according to node connectivity and the value of neighborhood relationships. The proposed relative depth encoding module and regional layout encoding module introduce, respectively, relative depth information and more effective geometric layout information between objects, further improving performance. Experiments on the Visual Genome benchmark show that our method outperforms state-of-the-art competitors on two types of performance metrics. For semantic relationship types, the recognition rate of our method is 2.0 times that of the baseline. (C) 2020 Elsevier Ltd. All rights reserved.
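The abstract says the relationship loss is rebalanced by rarity and learning difficulty but gives no formula. The sketch below is therefore not the paper's graphical focal loss. It is a generic focal-style cross-entropy with an inverse-frequency rarity weight, shown only to illustrate the idea. The function names, the `power` and `gamma` parameters, and the normalization are all assumptions made for this example.

```python
import numpy as np


def rarity_weights(class_counts, power=0.5):
    """Hypothetical rarity weight: inverse class frequency raised to
    `power`, rescaled so the weights average to 1."""
    counts = np.asarray(class_counts, dtype=float)
    inv = (counts.sum() / counts) ** power
    return inv / inv.mean()


def rarity_focal_loss(probs, labels, class_counts, gamma=2.0):
    """Illustrative loss (an assumption, not the paper's definition).

    probs:        (N, C) predicted class probabilities per relationship
    labels:       (N,) integer ground-truth relationship classes
    class_counts: (C,) number of training instances per class
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    # Probability assigned to the true class of each sample.
    p_t = np.clip(probs[np.arange(len(labels)), labels], 1e-12, 1.0)
    # Learning difficulty: poorly predicted samples get a larger factor.
    difficulty = (1.0 - p_t) ** gamma
    # Rarity: classes with few instances get a larger factor.
    rarity = rarity_weights(class_counts)[labels]
    return float(np.mean(-rarity * difficulty * np.log(p_t)))


if __name__ == "__main__":
    counts = [1000, 10]  # one frequent class, one rare class
    same_conf = np.array([[0.6, 0.4], [0.4, 0.6]])
    print(rarity_focal_loss(same_conf[:1], [0], counts))  # frequent class
    print(rarity_focal_loss(same_conf[1:], [1], counts))  # rare class
```

With equal confidence in the true class, the sample from the rare class contributes more loss than the sample from the frequent class. The object-side weighting by node connectivity described in the abstract is not modeled here.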
Pages: 14