Textual Context-Aware Dense Captioning With Diverse Words

Cited by: 21

Authors:

Shao, Zhuang [1]

Han, Jungong [2]

Debattista, Kurt [1]

Pang, Yanwei [3, 4]

Affiliations:

[1] Univ Warwick, Warwick Mfg Grp, Coventry CV4 7AL, England

[2] Univ Sheffield, Dept Comp Sci, Sheffield S1 4DP, England

[3] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China

[4] Shanghai Artificial Intelligence Lab, Shanghai 200032, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2023, Vol. 25

Keywords:

Dense Captioning; Enhanced Transformer Dense Captioner; Textual Context Module; Dynamic Vocabulary Frequency Histogram; NETWORKS;

DOI:

10.1109/TMM.2023.3241517

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

Dense captioning generates more detailed spoken descriptions for complex visual scenes. Despite several promising leads, existing methods still have two broad limitations: 1) the vast majority of prior work considers only visual contextual clues during captioning and ignores potentially important textual context; 2) current imbalanced learning mechanisms limit the diversity of vocabulary learned from the dictionary, giving rise to low language-learning efficiency. To address these limitations, we propose an end-to-end enhanced dense captioning architecture, the Enhanced Transformer Dense Captioner (ETDC), which obtains textual context from surrounding regions and dynamically diversifies the vocabulary bank during captioning. Concretely, we first propose the Textual Context Module (TCM), which is integrated into each self-attention layer of the Transformer decoder to capture the surrounding textual context. Moreover, we take full advantage of the class information of object context and propose a Dynamic Vocabulary Frequency Histogram (DVFH) re-sampling strategy during training to balance words with different frequencies. The proposed method is tested on the standard dense captioning datasets and surpasses the state-of-the-art methods in terms of mean Average Precision (mAP).
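
The abstract does not specify how DVFH re-sampling is computed, so the following is only a generic illustration of the underlying idea of balancing words with different frequencies. It is a minimal sketch of static inverse-frequency re-sampling, not the paper's method: the function name `inverse_frequency_weights`, the `power` parameter, and the toy captions are all assumptions made for illustration, and the histogram here is computed once rather than updated dynamically during training.

```python
from collections import Counter


def inverse_frequency_weights(captions, power=0.5):
    """Return one sampling weight per caption (hypothetical helper).

    A caption's weight is the mean of count(word) ** -power over its words,
    so captions containing rare words are sampled more often.
    This is a generic static scheme, NOT the DVFH strategy from the paper.
    """
    # Word-frequency histogram over the whole caption set.
    counts = Counter(word for caption in captions for word in caption.split())

    weights = []
    for caption in captions:
        words = caption.split()
        if not words:
            weights.append(0.0)  # empty captions are never sampled
            continue
        weights.append(sum(counts[w] ** -power for w in words) / len(words))

    total = sum(weights)
    if total == 0:
        raise ValueError("no non-empty captions to weight")
    # Normalize so the weights form a probability distribution.
    return [w / total for w in weights]


# Toy data (assumed): "zebra" is rarer than "dog".
captions = ["a dog", "a dog", "a zebra"]
weights = inverse_frequency_weights(captions)
```

The resulting weights could be passed to `random.choices(captions, weights=weights, k=n)` to draw a re-balanced training batch. A dynamic variant would recompute the histogram as training proceeds, which this sketch does not attempt.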

Pages: 8753-8766

Page count: 14
