Textual Context-Aware Dense Captioning With Diverse Words

Cited by: 21
Authors
Shao, Zhuang [1]
Han, Jungong [2]
Debattista, Kurt [1]
Pang, Yanwei [3, 4]
Affiliations
[1] Univ Warwick, Warwick Mfg Grp, Coventry CV4 7AL, England
[2] Univ Sheffield, Dept Comp Sci, Sheffield S1 4DP, England
[3] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
[4] Shanghai Artificial Intelligence Lab, Shanghai 200032, Peoples R China
Keywords
Dense Captioning; Enhanced Transformer Dense Captioner; Textual Context Module; Dynamic Vocabulary Frequency Histogram; NETWORKS;
DOI
10.1109/TMM.2023.3241517
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Dense captioning generates more detailed spoken descriptions for complex visual scenes. Despite several promising leads, existing methods still have two broad limitations: 1) The vast majority of prior arts only consider visual contextual clues during captioning but ignore potentially important textual context; 2) current imbalanced learning mechanisms limit the diversity of vocabulary learned from the dictionary, thus giving rise to low language-learning efficiency. To alleviate these gaps, in this paper, we propose an end-to-end enhanced dense captioning architecture, namely Enhanced Transformer Dense Captioner (ETDC), which obtains textual context from surrounding regions and dynamically diversifies the vocabulary bank during captioning. Concretely, we first propose the Textual Context Module (TCM), which is integrated into each self-attention layer of the Transformer decoder, to capture the surrounding textual context. Moreover, we take full advantage of the class information of object context and propose a Dynamic Vocabulary Frequency Histogram (DVFH) re-sampling strategy during training to balance words with different frequencies. The proposed method is tested on the standard dense captioning datasets and surpasses the state-of-the-art methods in terms of mean Average Precision (mAP).
Pages: 8753-8766
Page count: 14