VIDEO MEMORABILITY PREDICTION VIA LATE FUSION OF DEEP MULTI-MODAL FEATURES

Cited by: 4
Authors
Leyva, Roberto [1 ]
Sanchez, Victor [1 ]
Affiliations
[1] Univ Warwick, Dept Comp Sci, Coventry, W Midlands, England
Keywords
Video memorability prediction; video analysis; multi-modal feature processing; fusion;
DOI
10.1109/ICIP42928.2021.9506411
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Video memorability is a cornerstone of social media platform analysis, as a highly memorable video is more likely to be noticed and shared. This paper proposes a new framework that fuses multi-modal information to predict the likelihood of remembering a video. The proposed framework relies on late fusion of text, visual, and motion features. Specifically, two neural networks extract features from the captions describing the video's content; two ResNet models extract visual features from specific frames; and two 3D ResNet models, combined with Fisher Vectors, extract features from the video's motion information. The extracted features are used to compute several memorability scores via Bayesian Ridge regression, which are then fused based on a greedy search for the optimal fusion parameters. Experiments on the MediaEval 2019 dataset demonstrate the superiority of the proposed framework, which outperforms the state of the art.
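A minimal sketch, in Python with scikit-learn and SciPy, of the late-fusion step described in the abstract, assuming per-modality feature matrices (e.g. text, visual, and motion features) have already been extracted. The function names, the train/validation split handling, and the coarse exhaustive search over convex fusion weights (standing in here for the paper's greedy search) are illustrative assumptions, not the authors' implementation.

import numpy as np
from itertools import product
from sklearn.linear_model import BayesianRidge
from scipy.stats import spearmanr

def per_modality_scores(feature_matrices, y_train, split):
    """Fit one Bayesian Ridge regressor per modality and return its
    predicted memorability scores on the held-out portion of each matrix."""
    scores = []
    for X in feature_matrices:
        model = BayesianRidge()
        model.fit(X[:split], y_train)          # train on the first `split` videos
        scores.append(model.predict(X[split:]))  # score the remaining videos
    return np.stack(scores, axis=1)             # shape: (n_val, n_modalities)

def search_fusion_weights(scores, y_val, step=0.1):
    """Coarse search over convex fusion weights maximizing Spearman correlation
    with ground-truth memorability (a stand-in for the paper's greedy search)."""
    best_w, best_rho = None, -np.inf
    grid = np.arange(0.0, 1.0 + step, step)
    for w in product(grid, repeat=scores.shape[1]):
        if not np.isclose(sum(w), 1.0):
            continue                             # keep weights on the simplex
        fused = scores @ np.array(w)             # weighted late fusion of scores
        rho = spearmanr(fused, y_val).correlation
        if rho > best_rho:
            best_w, best_rho = np.array(w), rho
    return best_w, best_rho

Given validation ground truth, the returned weights combine the per-modality Bayesian Ridge predictions into a single fused memorability score at test time.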
Pages: 2488 - 2492
Number of pages: 5
Related Papers
50 records in total
  • [1] Adaptive Multi-Modal Ensemble Network for Video Memorability Prediction
    Li, Jing
    Guo, Xin
    Yue, Fumei
    Xue, Fanfu
    Sun, Jiande
    APPLIED SCIENCES-BASEL, 2022, 12 (17)
  • [2] Fusion of Multi-Modal Features to Enhance Dense Video Caption
    Huang, Xuefei
    Chan, Ka-Hou
    Wu, Weifan
    Sheng, Hao
    Ke, Wei
    SENSORS, 2023, 23 (12)
  • [3] Multi-modal fusion for video understanding
    Hoogs, A
    Mundy, J
    Cross, G
    30TH APPLIED IMAGERY PATTERN RECOGNITION WORKSHOP, PROCEEDINGS: ANALYSIS AND UNDERSTANDING OF TIME VARYING IMAGERY, 2001, : 103 - 108
  • [4] Dynamic Deep Multi-modal Fusion for Image Privacy Prediction
    Tonge, Ashwini
    Caragea, Cornelia
    WEB CONFERENCE 2019: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2019), 2019, : 1829 - 1840
  • [5] Deep Gated Multi-modal Fusion for Image Privacy Prediction
    Zhao, Chenye
    Caragea, Cornelia
    ACM TRANSACTIONS ON THE WEB, 2023, 17 (04)
  • [6] Multi-modal self-attention network for video memorability prediction
    Lyu W.
    Han J.-Z.
    Chu J.-H.
    Jing P.-G.
    Jilin Daxue Xuebao (Gongxueban)/Journal of Jilin University (Engineering and Technology Edition), 2023, 53 (04): 1211 - 1219
  • [7] Video Visual Relation Detection via Multi-modal Feature Fusion
    Sun, Xu
    Ren, Tongwei
    Zi, Yuan
    Wu, Gangshan
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 2657 - 2661
  • [8] Deep fusion of multi-modal features for brain tumor image segmentation
    Zhang, Guying
    Zhou, Jia
    He, Guanghua
    Zhu, Hancan
    HELIYON, 2023, 9 (08)
  • [9] Learning Visual Emotion Distributions via Multi-Modal Features Fusion
    Zhao, Sicheng
    Ding, Guiguang
    Gao, Yue
    Han, Jungong
    PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 369 - 377
  • [10] A Novel Deep Multi-Modal Feature Fusion Method for Celebrity Video Identification
    Chen, Jianrong
    Yang, Li
    Xu, Yuanyuan
    Huo, Jing
    Shi, Yinghuan
    Gao, Yang
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 2535 - 2538