Multimodal-enhanced hierarchical attention network for video captioning

Cited by: 0

Authors:

Maosheng Zhong

Youde Chen

Hao Zhang

Hao Xiong

Zhixiang Wang

Affiliation:

[1] Jiangxi Normal University

Source:

Multimedia Systems | 2023 / Vol. 29

Keywords:

Video captioning; Bidirectional decoding transformer; Multimodal enhancement; Hierarchical attention network

DOI:

Not available

Abstract:

In video captioning, many pioneering approaches generate higher-quality captions by exploring and adding new video feature modalities. However, as the number of modalities grows, negative interactions among them progressively diminish the gains in caption quality. To address this problem, we propose a three-layer hierarchical attention network, built on a bidirectional decoding transformer, that enhances multimodal features. In the first layer, we apply a separate encoder suited to the characteristics of each modality to enhance its vector representation. In the second layer, we select keyframes from the sampled frames of each modality by computing attention values between the generated words and each frame. In the third layer, we assign weights to the different modalities to reduce redundancy among them before generating the current word. In addition, we use a bidirectional decoder to take the context of the ground-truth caption into account when generating captions. Experiments on two mainstream benchmark datasets, MSVD and MSR-VTT, demonstrate the effectiveness of the proposed model: it achieves state-of-the-art performance on key metrics, and its generated sentences are closer to natural human phrasing. Overall, the proposed network effectively enhances multimodal features and generates high-quality video captions. Code is available at https://github.com/nickchen121/MHAN.
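The second and third layers described in the abstract amount to attention applied twice: once over the frames inside each modality, then once over the modalities themselves. The following is a minimal sketch of that two-level idea in plain Python. It is not the authors' implementation (their code is in the linked MHAN repository). It assumes scaled dot-product scoring with no learned projections (frame features serve as both keys and values), a feature dimension shared by all modalities, and "keyframes" read off as the top-k attention weights. The names `attend`, `hierarchical_attention`, and `top_k` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector]) -> Tuple[Vector, List[float]]:
    """Scaled dot-product attention in which the keys double as values.

    Assumption: no learned query/key/value projections. Returns the
    attention-weighted sum of `keys` and the weights themselves.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    context = [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(len(query))]
    return context, weights


def hierarchical_attention(
    query: Vector, modalities: Dict[str, List[Vector]], top_k: int = 2
) -> Tuple[Vector, Dict[str, float], Dict[str, List[int]]]:
    """Hypothetical two-level attention: frames within each modality, then modalities."""
    contexts: Dict[str, Vector] = {}
    keyframes: Dict[str, List[int]] = {}
    for name, frames in modalities.items():
        # Frame level: weight each frame by its relevance to the current decoding state.
        ctx, w = attend(query, frames)
        contexts[name] = ctx
        # Treat the top-k most heavily weighted frames as this step's keyframes.
        keyframes[name] = sorted(range(len(frames)), key=lambda t: w[t], reverse=True)[:top_k]
    # Modality level: weight each modality's context vector, then fuse them.
    names = list(contexts)
    fused, modality_weights = attend(query, [contexts[n] for n in names])
    return fused, dict(zip(names, modality_weights)), keyframes
```

In a captioning model, `query` would stand in for the decoder state while producing the current word, and each modality's frame list for its encoder outputs; a trained model would add learned projections and typically several attention heads, which this sketch deliberately omits to keep the two levels visible.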

Pages: 2469–2482

Number of pages: 13
