Towards Real-Time Panoptic Narrative Grounding by an End-to-End Grounding Network

Citations: 0

Authors:

Wang, Haowei ^{[1]}

Ji, Jiayi ^{[1]}

Zhou, Yiyi ^{[1,2]}

Wu, Yongjian ^{[4]}

Sun, Xiaoshuai ^{[1,2,3]}

Affiliations:

[1] Xiamen Univ, Sch Informat, Dept Artificial Intelligence, Media Analyt & Comp Lab, Xiamen 361005, Peoples R China

[2] Xiamen Univ, Inst Artificial Intelligence, Xiamen, Peoples R China

[3] Xiamen Univ, Fujian Engn Res Ctr Trusted Artificial Intelligen, Fujian, Peoples R China

[4] Tencent Youtu Lab, Shanghai, Peoples R China

Source:

THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 2 | 2023

Funding:

National Natural Science Foundation of China;

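Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract: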

Panoptic Narrative Grounding (PNG) is an emerging cross-modal grounding task that locates the image regions corresponding to a text description. Existing approaches to PNG are mainly based on a two-stage paradigm, which is computationally expensive. In this paper, we propose a one-stage network for real-time PNG, termed End-to-End Panoptic Narrative Grounding network (EPNG), which directly generates masks for referents. Specifically, we propose two innovative designs, i.e., Locality-Perceptive Attention (LPA) and a bidirectional Semantic Alignment Loss (SAL), to properly handle the many-to-many relationship between textual expressions and visual objects. LPA embeds local spatial priors into attention modeling, i.e., a pixel may belong to multiple masks at different scales, thereby improving segmentation. To help capture complex semantic relationships, SAL introduces a bidirectional contrastive objective that regularizes semantic consistency between modalities. Extensive experiments on the PNG benchmark dataset demonstrate the effectiveness and efficiency of our method. Compared to the single-stage baseline, our method achieves a significant improvement of up to 9.4% in accuracy. More importantly, EPNG is 10 times faster than the two-stage model. Meanwhile, the generalization ability of EPNG is also validated by zero-shot experiments on other grounding tasks. The source code and trained models for all our experiments are publicly available at https://github.com/Mr-Neko/EPNG.git.

Citation

Pages: 2528-2536

Page count: 9
