Textual Context-Aware Dense Captioning With Diverse Words

Cited by: 21
Authors
Shao, Zhuang [1]
Han, Jungong [2]
Debattista, Kurt [1]
Pang, Yanwei [3, 4]
Affiliations
[1] Univ Warwick, Warwick Mfg Grp, Coventry CV4 7AL, England
[2] Univ Sheffield, Dept Comp Sci, Sheffield S1 4DP, England
[3] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
[4] Shanghai Artificial Intelligence Lab, Shanghai 200032, Peoples R China
Keywords
Dense Captioning; Enhanced Transformer Dense Captioner; Textual Context Module; Dynamic Vocabulary Frequency Histogram; NETWORKS;
DOI
10.1109/TMM.2023.3241517
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Dense captioning generates more detailed spoken descriptions for complex visual scenes. Despite several promising leads, existing methods still have two broad limitations: 1) the vast majority of prior work considers only visual contextual clues during captioning and ignores potentially important textual context; 2) current imbalanced learning mechanisms limit the diversity of vocabulary learned from the dictionary, giving rise to low language-learning efficiency. To address these gaps, we propose an end-to-end enhanced dense captioning architecture, the Enhanced Transformer Dense Captioner (ETDC), which obtains textual context from surrounding regions and dynamically diversifies the vocabulary bank during captioning. Concretely, we first propose the Textual Context Module (TCM), which is integrated into each self-attention layer of the Transformer decoder to capture the surrounding textual context. Moreover, we take full advantage of the class information of object context and propose a Dynamic Vocabulary Frequency Histogram (DVFH) re-sampling strategy during training to balance words with different frequencies. The proposed method is tested on standard dense captioning datasets and surpasses state-of-the-art methods in terms of mean Average Precision (mAP).
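The frequency-balancing idea mentioned in the abstract can be illustrated roughly as follows. This is a minimal sketch, not the authors' DVFH implementation, whose details are not given in this record. It builds a word-frequency histogram over a toy caption set and derives inverse-frequency sampling weights, so that captions containing rarer words are drawn more often. The function names, the smoothing exponent `alpha`, and the toy captions are all assumptions made for illustration.

```python
from collections import Counter
import random


def word_histogram(captions):
    """Count how often each (lower-cased) word occurs across all captions."""
    return Counter(w for c in captions for w in c.lower().split())


def caption_weights(captions, alpha=0.5):
    """Hypothetical weighting: mean of (1 / word_frequency) ** alpha per caption.

    Assumes every caption contains at least one word.
    """
    hist = word_histogram(captions)
    weights = []
    for c in captions:
        words = c.lower().split()
        score = sum((1.0 / hist[w]) ** alpha for w in words) / len(words)
        weights.append(score)
    return weights


def resample(captions, k, alpha=0.5, seed=0):
    """Draw k captions with replacement, favouring those with rarer words."""
    rng = random.Random(seed)
    return rng.choices(captions, weights=caption_weights(captions, alpha), k=k)


if __name__ == "__main__":
    toy = ["a dog", "a dog", "a zebra"]  # toy data, not from the paper
    print(caption_weights(toy))
    print(resample(toy, k=5))
```

A "dynamic" variant would presumably recompute the histogram as training proceeds; in this sketch that simply means calling `caption_weights` again on the updated caption set.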
Pages: 8753 - 8766
Number of pages: 14
Related papers
50 in total
  • [31] The Context-Aware Browser
    Coppola, Paolo
    Della Mea, Vincenzo
    Di Gaspero, Luca
    Menegon, Davide
    Mischis, Danny
    Mizzaro, Stefano
    Scagnetto, Ivan
    Vassena, Luca
    IEEE INTELLIGENT SYSTEMS, 2010, 25 (01) : 38 - 47
  • [32] Architecture, Textual Context Description, and Quiz Generation Scheme for the Movie Based Context-Aware Learning System
    Hazriani
    Nakanishi, Tsuneo
    Fukuda, Akira
    PROCEEDINGS OF THE 2016 IEEE REGION 10 CONFERENCE (TENCON), 2016, : 2410 - 2413
  • [33] Towards context-aware collaborative filtering by learning context-aware latent representations
    Liu, Xin
    Zhang, Jiyong
    Yan, Chenggang
    KNOWLEDGE-BASED SYSTEMS, 2020, 199
  • [34] Context-aware regulation of context-aware mobile services in pervasive computing environments
    Syukur, Evi
    Loke, Seng Wai
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2006, PT 4, 2006, 3983 : 138 - 147
  • [35] Context-Aware Architecture utilizing Computing with Words and ISO/IEC/IEEE 42010
    Das, Asesh
    SOUTHEASTCON 2016, 2016,
  • [36] Context Variability for Context-Aware Systems
    Capilla, Rafael
    Ortiz, Oscar
    Hinchey, Mike
    COMPUTER, 2014, 47 (02) : 85 - 87
  • [37] Context-aware Media Player (CaMP): Developing context-aware applications with Separation of Concerns
    Paspallis, Nearchos
    Achilleos, Achilleas
    Kakousis, Konstantinos
    Papadopoulos, George A.
    2010 IEEE GLOBECOM WORKSHOPS, 2010, : 1684 - 1689
  • [38] Dense Captioning with Joint Inference and Visual Context
    Yang, Linjie
    Tang, Kevin
    Yang, Jianchao
    Li, Li-Jia
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 1978 - 1987
  • [39] Context-Aware Sentiment Classification
    Kasthuriarachchy, Buddhika H.
    de Zoysa, Kasun
    Premarathne, H. L.
    2015 Fifteenth International Conference on Advances in ICT for Emerging Regions (ICTer), 2015, : 276 - 276
  • [40] On calculi for context-aware coordination
    Braione, P
    Picco, GP
    COORDINATION MODELS AND LANGUAGES, PROCEEDINGS, 2004, 2949 : 38 - 54