Refined Semantic Enhancement towards Frequency Diffusion for Video Captioning

Cited by: 0
Authors
Zhong, Xian [1 ]
Li, Zipeng [1 ]
Chen, Shuqin [2 ]
Jiang, Kui [3 ]
Chen, Chen [4 ]
Ye, Mang [3 ]
Affiliations
[1] Wuhan Univ Technol, Sch Comp Sci & Artificial Intelligence, Wuhan, Peoples R China
[2] Hubei Univ Educ, Coll Comp, Wuhan, Peoples R China
[3] Wuhan Univ, Sch Comp Sci, Wuhan, Peoples R China
[4] Univ Cent Florida, Ctr Res Comp Vis, Orlando, FL USA
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video captioning aims to generate natural language sentences that describe the given video accurately. Existing methods achieve favorable generation by exploring richer visual representations in the encoding phase or improving decoding ability. However, the long-tailed problem hinders these attempts on low-frequency tokens, which rarely occur but carry critical semantics and play a vital role in detailed generation. In this paper, we introduce a novel Refined Semantic enhancement method towards Frequency Diffusion (RSFD), a captioning model that constantly perceives the linguistic representation of infrequent tokens. Concretely, a Frequency-Aware Diffusion (FAD) module is proposed to comprehend the semantics of low-frequency tokens and break through generation limitations. In this way, the caption is refined by promoting the absorption of tokens with insufficient occurrence. Based on FAD, we design a Divergent Semantic Supervisor (DSS) module to compensate for the information loss of high-frequency tokens brought by the diffusion process, where the semantics of low-frequency tokens are further emphasized to alleviate the long-tailed problem. Extensive experiments on two benchmark datasets, i.e., MSR-VTT and MSVD, indicate that RSFD outperforms state-of-the-art methods, demonstrating that enhancing the semantics of low-frequency tokens yields competitive generation quality. Code is available at https://github.com/lzp870/RSFD.
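The abstract describes FAD only at a high level. As a rough illustration of the idea, the following minimal PyTorch sketch injects Gaussian noise into ground-truth caption embeddings with a scale that grows as corpus frequency shrinks; every name here (frequency_aware_diffusion, corpus_freq, max_beta) is hypothetical and taken neither from the paper nor from its released code.

import torch

def frequency_aware_diffusion(token_embeds, token_ids, corpus_freq, max_beta=0.5):
    # token_embeds: (batch, seq_len, dim) embeddings of the reference caption
    # token_ids:    (batch, seq_len) vocabulary indices of those tokens
    # corpus_freq:  (vocab_size,) relative frequency of each token in training data
    freq = corpus_freq[token_ids]                # per-token frequency, (batch, seq_len)
    beta = max_beta * (1.0 - freq / freq.max())  # rarer token -> larger noise scale
    beta = beta.unsqueeze(-1)                    # broadcast over the embedding dim
    noise = torch.randn_like(token_embeds)
    # One forward-diffusion mixing step: keep sqrt(1 - beta) of the signal.
    return torch.sqrt(1.0 - beta) * token_embeds + torch.sqrt(beta) * noise

Under this reading, the decoder is trained against the perturbed embeddings, so recovering rare tokens requires genuinely modeling their semantics rather than memorizing frequent patterns; DSS would then re-supply the high-frequency information this noising discards.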
Pages: 3724 - 3732
Page count: 9
Related papers
50 records in total
  • [1] Global semantic enhancement network for video captioning
    Luo, Xuemei
    Luo, Xiaotong
    Wang, Di
    Liu, Jinhui
    Wan, Bo
    Zhao, Lin
    PATTERN RECOGNITION, 2024, 145
  • [2] Video Captioning with Semantic Guiding
    Yuan, Jin
    Tian, Chunna
    Zhang, Xiangnan
    Ding, Yuxuan
    Wei, Wei
    2018 IEEE FOURTH INTERNATIONAL CONFERENCE ON MULTIMEDIA BIG DATA (BIGMM), 2018
  • [3] Semantic Grouping Network for Video Captioning
    Ryu, Hobin
    Kang, Sunghun
    Kang, Haeyong
    Yoo, Chang D.
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35: 2514 - 2522
  • [4] Video Captioning with Transferred Semantic Attributes
    Pan, Yingwei
    Yao, Ting
    Li, Houqiang
    Mei, Tao
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 984 - 992
  • [5] Semantic guidance network for video captioning
    Guo, Lan
    Zhao, Hong
    Chen, Zhiwen
    Han, Zeyu
    SCIENTIFIC REPORTS, 2023, 13 (01)
  • [6] Video Captioning with Visual and Semantic Features
    Lee, Sujin
    Kim, Incheol
    JOURNAL OF INFORMATION PROCESSING SYSTEMS, 2018, 14 (06): 1318 - 1330
  • [7] Adaptive semantic guidance network for video captioning
    Liu, Yuanyuan
    Zhu, Hong
    Wu, Zhong
    Du, Sen
    Wu, Shuning
    Shi, Jing
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2025, 251
  • [8] Semantic similarity information discrimination for video captioning
    Du, Sen
    Zhu, Hong
    Xiong, Ge
    Lin, Guangfeng
    Wang, Dong
    Shi, Jing
    Wang, Jing
    Xing, Nan
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 213
  • [9] Chained semantic generation network for video captioning
    Mao, L.
    Gao, H.
    Yang, D.
    Zhang, R.
    GUANGXUE JINGMI GONGCHENG/OPTICS AND PRECISION ENGINEERING, 2022, 30 (24): 3198 - 3209