Multistage Fusion with Forget Gate for Multimodal Summarization in Open-Domain Videos

Cited by: 0
Authors
Liu, Nayu [1 ,2 ]
Sun, Xian [1 ,2 ]
Yu, Hongfeng [1]
Zhang, Wenkai [1]
Xu, Guangluan [1]
Affiliations
[1] Chinese Acad Sci, Aerosp Informat Res Inst, Key Lab Network Informat Syst Technol, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Elect Elect & Commun Engn, Beijing, Peoples R China
Keywords
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
Multimodal summarization for open-domain videos is an emerging task that aims to generate a summary from multi-source information (video, audio, transcript). Despite the success of recent multi-encoder-decoder frameworks on this task, existing methods lack fine-grained multimodal interactions among the multi-source inputs. Moreover, unlike other multimodal tasks, this task involves longer multimodal sequences with more redundancy and noise. To address these two issues, we propose a multistage fusion network with a fusion forget gate module, which builds on the multi-encoder-decoder framework by modeling fine-grained interactions between the multi-source modalities through a multistep fusion schema and by controlling the flow of redundant information between long multimodal sequences via a forgetting module. Experimental results on the How2 dataset show that our proposed model achieves new state-of-the-art performance. Comprehensive analysis empirically verifies the effectiveness of our fusion schema and forgetting module across multiple encoder-decoder architectures. Notably, even with high-noise ASR transcripts (WER > 30%), our model still achieves performance close to that of the model using ground-truth transcripts, which reduces manual annotation cost.
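For intuition, the following is a minimal PyTorch-style sketch of a forget-gate-style fusion step between transcript encoder states and aligned video features. The class name FusionForgetGate, the dimensions, and the residual formulation are illustrative assumptions based on the abstract's description, not the authors' released implementation.

import torch
import torch.nn as nn

class FusionForgetGate(nn.Module):
    # Illustrative gated fusion: a sigmoid "forget" gate decides, per dimension,
    # how much of the cross-modal (video) signal to keep before it is merged
    # into the text representation. A sketch of the idea, not the authors' code.
    def __init__(self, text_dim, video_dim):
        super().__init__()
        self.proj = nn.Linear(video_dim, text_dim)     # align video features to the text space
        self.gate = nn.Linear(2 * text_dim, text_dim)  # forget gate over the concatenation

    def forward(self, text_h, video_h):
        # text_h:  (batch, seq_len, text_dim)  transcript encoder states
        # video_h: (batch, seq_len, video_dim) video features aligned to each text step
        v = self.proj(video_h)
        g = torch.sigmoid(self.gate(torch.cat([text_h, v], dim=-1)))  # values in [0, 1]
        return text_h + g * v  # gate filters redundant video information

# Toy usage: fuse a 100-step transcript with per-step video features.
fuse = FusionForgetGate(text_dim=512, video_dim=2048)
out = fuse(torch.randn(2, 100, 512), torch.randn(2, 100, 2048))  # -> (2, 100, 512)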
Pages: 1834 - 1845
Number of pages: 12
Related papers (17 in total)
  • [1] Abstractive Summarization for Video: A Revisit in Multistage Fusion Network With Forget Gate
    Liu, Nayu
    Sun, Xian
    Yu, Hongfeng
    Yao, Fanglong
    Xu, Guangluan
    Fu, Kun
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 3296 - 3310
  • [2] Description generation of open-domain videos incorporating multimodal features and bidirectional encoder
    Xiaotong Du
    Jiabin Yuan
    Liu Hu
    Yuke Dai
    The Visual Computer, 2019, 35 : 1703 - 1712
  • [3] Description generation of open-domain videos incorporating multimodal features and bidirectional encoder
    Du, Xiaotong
    Yuan, Jiabin
    Hu, Liu
    Dai, Yuke
    VISUAL COMPUTER, 2019, 35 (12): : 1703 - 1712
  • [4] Supervised ranking in open-domain text summarization
    Nomoto, T
    Matsumoto, Y
    40TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE, 2002, : 465 - 472
  • [5] Open-Domain Trending Hashtag Recommendation for Videos
    Mehta, Swapneel
    Sarkhel, Somdeb
    Chen, Xiang
    Mitra, Saayan
    Swaminathan, Viswanathan
    Rossi, Ryan
    Aminian, Ali
    Guo, Han
    Garg, Kshitiz
    23RD IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM 2021), 2021, : 174 - 181
  • [6] Adapting Generative Pre-trained Language Model for Open-domain Multimodal Sentence Summarization
    Lin, Dengtian
    Jing, Liqiang
    Song, Xuemeng
    Liu, Meng
    Sun, Teng
    Nie, Liqiang
    PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023, : 195 - 204
  • [7] LEARNING FROM MULTIVIEW CORRELATIONS IN OPEN-DOMAIN VIDEOS
    Holzenberger, Nils
    Palaskar, Shruti
    Madhyastha, Pranava
    Metze, Florian
    Arora, Raman
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 8628 - 8632
  • [8] The diversity-based approach to open-domain text summarization
    Nomoto, T
    Matsumoto, Y
    INFORMATION PROCESSING & MANAGEMENT, 2003, 39 (03) : 363 - 389
  • [9] Automatic summarization of open-domain multiparty dialogues in diverse genres
    Zechner, K
    COMPUTATIONAL LINGUISTICS, 2002, 28 (04) : 447 - 485
  • [10] MODE: a multimodal open-domain dialogue dataset with explanation
    Yin, Hang
    Lu, Pinren
    Li, Ziang
    Sun, Bin
    Li, Kan
    APPLIED INTELLIGENCE, 2024, 54 (07) : 5891 - 5906