VMSMO: Learning to Generate Multimodal Summary for Video-based News Articles

Authors
Li, Mingzhe [1, 2]
Chen, Xiuying [1, 2]
Gao, Shen [2]
Chan, Zhangming [1, 2]
Zhao, Dongyan [1, 2]
Yan, Rui [1, 2]
Affiliations
[1] Peking Univ, Ctr Data Sci, AAIS, Beijing, Peoples R China
[2] Peking Univ, Wangxuan Inst Comp Technol, Beijing, Peoples R China
Funding
US National Science Foundation;
Keywords
None listed
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A popular multimedia news format nowadays provides users with a lively video and a corresponding news article. This format is used by influential news media such as CNN and BBC, and by social media platforms such as Twitter and Weibo. In this setting, automatically choosing a proper cover frame for the video and generating an appropriate textual summary of the article can help editors save time and help readers make decisions more effectively. Hence, in this paper, we propose the task of Video-based Multimodal Summarization with Multimodal Output (VMSMO) to tackle this problem. The main challenge in this task is to jointly model the temporal dependency of the video with the semantic meaning of the article. To this end, we propose a Dual-Interaction-based Multimodal Summarizer (DIMS), consisting of a dual interaction module and a multimodal generator. In the dual interaction module, we propose a conditional self-attention mechanism that captures local semantic information within the video, and a global-attention mechanism that handles the semantic relationship between the news text and the video at a high level. Extensive experiments conducted on a large-scale real-world VMSMO dataset show that DIMS achieves state-of-the-art performance in terms of both automatic metrics and human evaluations.
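The abstract names two attention mechanisms inside the dual interaction module. Below is a minimal pure-Python sketch of how such a pair could be wired together to pick a cover frame. It assumes that frame and word features are already available as equal-length vectors. It is not the DIMS implementation. The following are all hypothetical illustrations built on standard scaled dot-product attention:

- the conditioning scheme (adding the mean article vector to each frame's query),
- the fusion by addition,
- the dot-product frame scoring,
- all function names (`conditional_self_attention`, `global_attention`, `select_cover_frame`).

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: List[float], vectors: List[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def mean(vectors: List[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def conditional_self_attention(frames: List[Vector], condition: Vector) -> List[Vector]:
    """Scaled dot-product self-attention over video frames.

    Hypothetical conditioning: the article vector is added to each frame's
    query, so how frames attend to one another depends on the article.
    """
    scale = 1.0 / math.sqrt(len(condition))
    outputs = []
    for frame in frames:
        query = [f + c for f, c in zip(frame, condition)]
        weights = softmax([dot(query, other) * scale for other in frames])
        outputs.append(weighted_sum(weights, frames))
    return outputs


def global_attention(queries: List[Vector], keys: List[Vector]) -> List[Vector]:
    """Each query (a frame) attends over all article word vectors (keys double as values)."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    return [weighted_sum(softmax([dot(q, k) * scale for k in keys]), keys) for q in queries]


def select_cover_frame(frames: List[Vector], words: List[Vector]) -> int:
    """Return the index of the frame whose fused representation best matches the article."""
    article = mean(words)
    local = conditional_self_attention(frames, article)
    text_context = global_attention(local, words)
    fused = [[a + b for a, b in zip(loc, ctx)] for loc, ctx in zip(local, text_context)]
    scores = [dot(f, article) for f in fused]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy 2-d features: the article points along the first axis,
# and only frame 1 is aligned with it.
words = [[1.0, 0.0], [1.0, 0.0]]
frames = [[0.0, 1.0], [5.0, 0.0], [0.0, -1.0]]
print(select_cover_frame(frames, words))  # 1
```

The sketch keeps keys and values identical and uses a plain mean as the article vector to stay short. A trained model would typically learn separate query, key, and value projections instead.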
Pages: 9360-9369
Page count: 10