DEEP LEARNING FOR MULTIMODAL-BASED VIDEO INTERESTINGNESS PREDICTION

Authors
Shen, Yuesong [1]
Demarty, Claire-Helene [2]
Duong, Ngoc Q. K. [2]
Affiliations
[1] Tech Univ Munich, Munich, Germany
[2] Tech, Rennes, France
Keywords
Video interestingness prediction; social interestingness; content interestingness; multimodal fusion; deep neural network (DNN); MediaEval 2016
DOI
Not available
Chinese Library Classification
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Predicting the interestingness of media content remains an important but challenging research subject. The difficulty stems first from the fact that, besides being a high-level semantic concept, interestingness is highly subjective and has no globally agreed definition. This paper applies up-to-date deep learning techniques to the task. We perform experiments on both social-driven (i.e., Flickr videos) and content-driven (i.e., videos from the MediaEval 2016 interestingness task) datasets. To account for the temporal aspect and multimodality of videos, we tested various deep neural network (DNN) architectures, including a new combination of several recurrent neural networks (RNNs) that handles several temporal samples at the same time. We then investigated different strategies for dealing with unbalanced datasets. Multimodality, in the form of mid-level fusion of audio and visual information, benefited the task. We also established that social interestingness differs from content interestingness.
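The abstract names two ideas that a small sketch can make concrete: mid-level fusion of audio and visual information, and handling of unbalanced datasets. The code below is a minimal illustration, not the authors' architecture. It assumes that mid-level fusion means encoding each modality separately and concatenating the hidden representations before a joint scoring layer, and it uses inverse-frequency class weighting as one common imbalance strategy (the abstract does not say which strategies were tried). All function names, parameter names, and toy values are hypothetical.

```python
import math


def dense_relu(x, weights, bias):
    """One fully connected layer followed by ReLU (pure-Python stand-in)."""
    return [
        max(0.0, sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def mid_level_fusion(audio_feat, visual_feat, params):
    """Encode each modality on its own branch, concatenate the hidden
    vectors, and map the fused vector to a score in (0, 1)."""
    h_audio = dense_relu(audio_feat, params["audio_w"], params["audio_b"])
    h_visual = dense_relu(visual_feat, params["visual_w"], params["visual_b"])
    fused = h_audio + h_visual  # list concatenation acts as feature concatenation
    logit = sum(w * v for w, v in zip(params["out_w"], fused)) + params["out_b"]
    return 1.0 / (1.0 + math.exp(-logit))


def inverse_frequency_weights(labels):
    """Per-class loss weights n / (k * count), so rare classes weigh more.
    This is an assumed strategy, chosen only for illustration."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n, k = len(labels), len(counts)
    return {y: n / (k * c) for y, c in counts.items()}


# Hypothetical toy parameters: 2-D audio input, 1-D visual input.
toy_params = {
    "audio_w": [[0.5, -0.5]], "audio_b": [0.0],
    "visual_w": [[1.0], [-1.0]], "visual_b": [0.0, 0.0],
    "out_w": [1.0, 0.0, 1.0], "out_b": 0.0,
}
score = mid_level_fusion([1.0, 2.0], [1.0], toy_params)
class_weights = inverse_frequency_weights([0, 0, 0, 1])
```

In a real system the two branches would be learned networks (for example RNNs over temporal samples, as the abstract mentions) and the class weights would scale the training loss. The sketch only shows where the modalities meet and how rare "interesting" labels could be up-weighted.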
Pages: 1003-1008
Number of pages: 6