Dual-graph hierarchical interaction network for referring image segmentation

Cited by: 1

Authors:

Shi, Zhaofeng [1]

Wu, Qingbo [1]

Li, Hongliang [1]

Meng, Fanman [1]

Ngan, King Ngi [1]

Affiliations:

[1] Univ Elect Sci & Technol China, Sch Informat & Commun Engn, Chengdu 611731, Peoples R China

Source:

DISPLAYS | 2023 / Volume 80

Funding:

National Natural Science Foundation of China;

Keywords:

Referring image segmentation; Graph reasoning; Hierarchical interaction; BLIND QUALITY ASSESSMENT; MOVEMENT; HEAD;

DOI:

10.1016/j.displa.2023.102575

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Referring Image Segmentation (RIS) aims to segment the object or stuff in an image that is described by a given natural language expression. As a representative multi-modal reasoning task, the main challenge of RIS lies in accurately understanding and aligning two types of heterogeneous data (i.e., image and text). Existing methods typically perform this task via implicit cross-modal fusion of visual and linguistic features. Because these features are extracted separately by different encoders and have distinct latent representation structures, accurate image-text alignment information is hard to capture. In this paper, we propose a Dual-Graph Hierarchical Interaction Network (DGHIN) to facilitate explicit and comprehensive alignment between the image and text data. First, two graphs are built separately for the initial visual and linguistic features extracted with different pre-trained encoders. Through graph reasoning, we obtain a unified representation structure for the different modalities that captures the intra-modal entities and their contexts, where each projected node incorporates long-range dependencies into its latent representation. Then, the Hierarchical Interaction Module (HIM) is applied to the visual and linguistic graphs to extract comprehensive inter-modal correlations at the entity level and the graph level, which not only captures the corresponding keywords and visual patches but also draws the whole sentence closer to the image region with consistent context in the latent space. Extensive experiments on RefCOCO, RefCOCO+, G-Ref, and ReferIt demonstrate that the proposed DGHIN outperforms many state-of-the-art methods. Code is available at https://github.com/ZhaofengSHI/referring-DGHIN.


Pages: 12

50 items in total

[21] Global Selection and Local Attention Network for Referring Image Segmentation
Ding, Haixin
Zhang, Shengchuan
Cao, Liujuan
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VII, 2024, 14431 : 284 - 295
[22] Bidirectional Relationship Inferring Network for Referring Image Localization and Segmentation
Feng, Guang
Hu, Zhiwei
Zhang, Lihe
Sun, Jiayu
Lu, Huchuan
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (05) : 2246 - 2258
[23] Building irregular pyramids by dual-graph contraction
Kropatsch, WG
IEE PROCEEDINGS-VISION IMAGE AND SIGNAL PROCESSING, 1995, 142 (06) : 366 - 374
[24] Knowledge-enhanced model with dual-graph interaction for confusing legal charge prediction
Bi, Sheng
Ali, Zafar
Wu, Tianxing
Qi, Guilin
EXPERT SYSTEMS WITH APPLICATIONS, 2024, 249
[25] Temporal attention aware dual-graph convolution network for air traffic flow prediction
Cai, Kaiquan
Shen, Zhiqi
Luo, Xiaoyan
Li, Yue
JOURNAL OF AIR TRANSPORT MANAGEMENT, 2023, 106
[26] Local dual-graph discriminant classifier for binary classification
Zheng, Xiaohan
Zhang, Li
Yan, Leilei
NEUROCOMPUTING, 2024, 581
[27] Dual-Graph Attention Convolution Network for 3-D Point Cloud Classification
Huang, Chang-Qin
Jiang, Fan
Huang, Qiong-Hao
Wang, Xi-Zhe
Han, Zhong-Mei
Huang, Wei-Yu
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (04) : 4813 - 4825
[28] Hierarchical Context Network for Airborne Image Segmentation
Zhou, Feng
Hang, Renlong
Shuai, Hui
Liu, Qingshan
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
[29] Progressive Dynamic Graph Network for Image Segmentation
Wang, Zijiao
Xu, Chunyan
Zhou, Chuanwei
Cui, Zhen
Wuhan Daxue Xuebao (Xinxi Kexue Ban)/Geomatics and Information Science of Wuhan University, 2024, 49 (07) : 1212 - 1223
[30] Multiscale deep feature selection fusion network for referring image segmentation
Dai, Xianwen
Lin, Jiacheng
Nai, Ke
Li, Qingpeng
Li, Zhiyong
Multimedia Tools and Applications, 2024, 83 : 36287 - 36305
