Dual-graph hierarchical interaction network for referring image segmentation

Cited by: 1
Authors
Shi, Zhaofeng [1]
Wu, Qingbo [1]
Li, Hongliang [1]
Meng, Fanman [1]
Ngan, King Ngi [1]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Informat & Commun Engn, Chengdu 611731, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Referring image segmentation; Graph reasoning; Hierarchical interaction; BLIND QUALITY ASSESSMENT; MOVEMENT; HEAD;
DOI
10.1016/j.displa.2023.102575
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Referring Image Segmentation (RIS) aims to segment the object or stuff region in an image that a given natural language expression describes. As a representative multi-modal reasoning task, RIS is challenging mainly because it requires accurately understanding and aligning two heterogeneous types of data (i.e., image and text). Existing methods typically perform implicit cross-modal fusion of visual and linguistic features that are extracted separately by different encoders; because these features have distinct latent representation structures, such fusion struggles to capture accurate image-text alignment. In this paper, we propose a Dual-Graph Hierarchical Interaction Network (DGHIN) to facilitate explicit and comprehensive alignment between the image and text data. First, two graphs are built separately from the initial visual and linguistic features extracted with different pre-trained encoders. Through graph reasoning, we obtain a unified representation structure for both modalities that captures the intra-modal entities and their contexts, where each projected node incorporates long-range dependencies into its latent representation. Then, the Hierarchical Interaction Module (HIM) is applied to the visual and linguistic graphs to extract comprehensive inter-modal correlations at the entity level and the graph level, which not only captures the corresponding keywords and visual patches but also draws the whole sentence closer to the image region with a consistent context in the latent space. Extensive experiments on RefCOCO, RefCOCO+, G-Ref, and ReferIt demonstrate that the proposed DGHIN outperforms many state-of-the-art methods. Code is available at https://github.com/ZhaofengSHI/referring-DGHIN.
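The pipeline the abstract outlines (intra-modal graph reasoning, then entity-level and graph-level interaction) can be illustrated with a toy sketch. This is not the authors' implementation; see the linked repository for that. Every function name here (`attend`, `graph_reason`, `entity_level`, `graph_level`) and the toy node features are hypothetical. Plain dot-product attention stands in for the paper's actual graph and interaction operators, which the abstract does not specify.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(queries, keys):
    # For each query node, return a softmax-weighted average of the key nodes.
    out = []
    for q in queries:
        weights = softmax([dot(q, k) for k in keys])
        out.append([sum(w * k[d] for w, k in zip(weights, keys))
                    for d in range(len(q))])
    return out

def graph_reason(nodes):
    # Assumed stand-in for graph reasoning: one round of message passing
    # over a fully connected graph, added back as a residual update.
    messages = attend(nodes, nodes)
    return [[n + m for n, m in zip(node, msg)]
            for node, msg in zip(nodes, messages)]

def entity_level(vis_nodes, lang_nodes):
    # Entity level (assumed form): each visual node gathers the
    # linguistic nodes it matches best via cross-modal attention.
    return attend(vis_nodes, lang_nodes)

def graph_level(vis_nodes, lang_nodes):
    # Graph level (assumed form): cosine similarity between the
    # mean-pooled embeddings of the two graphs.
    def pool(nodes):
        return [sum(col) / len(nodes) for col in zip(*nodes)]
    v, l = pool(vis_nodes), pool(lang_nodes)
    return dot(v, l) / (math.sqrt(dot(v, v)) * math.sqrt(dot(l, l)))

# Toy node features: three visual nodes and two linguistic nodes.
vis = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
lang = [[2.0, 0.0], [0.0, 2.0]]

vis_g = graph_reason(vis)              # intra-modal context, visual graph
lang_g = graph_reason(lang)            # intra-modal context, linguistic graph
aligned = entity_level(vis_g, lang_g)  # one language-aware vector per visual node
score = graph_level(vis_g, lang_g)     # single image-sentence similarity
```

In a trained model the node features would come from the pre-trained encoders and the attention weights would be learned. The sketch only shows how the two interaction levels differ in granularity: one output per node versus one score per graph pair.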
Pages: 12