Textual Context-Aware Dense Captioning With Diverse Words

Citations: 21
Authors
Shao, Zhuang [1]
Han, Jungong [2]
Debattista, Kurt [1]
Pang, Yanwei [3, 4]
Affiliations
[1] Univ Warwick, Warwick Mfg Grp, Coventry CV4 7AL, England
[2] Univ Sheffield, Dept Comp Sci, Sheffield S1 4DP, England
[3] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
[4] Shanghai Artificial Intelligence Lab, Shanghai 200032, Peoples R China
Keywords
Dense Captioning; Enhanced Transformer Dense Captioner; Textual Context Module; Dynamic Vocabulary Frequency Histogram; NETWORKS;
DOI
10.1109/TMM.2023.3241517
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Dense captioning generates more detailed spoken descriptions for complex visual scenes. Despite several promising leads, existing methods still have two broad limitations: 1) the vast majority of prior work considers only visual contextual clues during captioning and ignores potentially important textual context; 2) current imbalanced learning mechanisms limit the diversity of vocabulary learned from the dictionary, giving rise to low language-learning efficiency. To address these limitations, we propose an end-to-end dense captioning architecture, the Enhanced Transformer Dense Captioner (ETDC), which obtains textual context from surrounding regions and dynamically diversifies the vocabulary bank during captioning. Concretely, we first propose the Textual Context Module (TCM), which is integrated into each self-attention layer of the Transformer decoder, to capture the surrounding textual context. Moreover, we take full advantage of the class information of object context and propose a Dynamic Vocabulary Frequency Histogram (DVFH) re-sampling strategy during training to balance words with different frequencies. The proposed method is tested on the standard dense captioning datasets and surpasses state-of-the-art methods in terms of mean Average Precision (mAP).
Pages: 8753-8766
Number of pages: 14