Multimodal deep representation learning for video classification

Cited by: 43
Authors
Tian, Haiman [1 ]
Tao, Yudong [2 ]
Pouyanfar, Samira [1 ]
Chen, Shu-Ching [1 ]
Shyu, Mei-Ling [2 ]
Affiliations
[1] Florida Int Univ, Sch Comp & Informat Sci, Miami, FL 33199 USA
[2] Univ Miami, Dept Elect & Comp Engn, Coral Gables, FL 33124 USA
Keywords
Multimodal deep learning; Transfer learning; Multi-stage fusion; Disaster management system; Event detection
DOI
10.1007/s11280-018-0548-3
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Real-world applications usually encounter data with various modalities, each containing valuable information. To enhance these applications, it is essential to effectively analyze all information extracted from the different data modalities, whereas most existing learning models focus on a single modality and ignore the other data types. This paper presents a new multimodal deep learning framework for event detection from videos that leverages recent advances in deep neural networks. First, several deep learning models are utilized to extract useful information from multiple modalities: pre-trained Convolutional Neural Networks (CNNs) for visual and audio feature extraction, and a word embedding model for textual analysis. Then, a novel fusion technique is proposed that integrates the different data representations at two levels, namely frame-level and video-level. Unlike existing multimodal learning algorithms, the proposed framework can reason about a missing data type using the other available modalities. The framework is applied to a new video dataset containing natural disaster classes. The experimental results demonstrate its effectiveness compared to single-modality deep learning models as well as conventional fusion techniques: the final accuracy is improved by more than 16% and 7% over the best single-modality and fusion results, respectively.
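The two-level fusion described in the abstract can be sketched roughly as follows. This is an illustrative toy sketch only, not the authors' implementation: the function names, feature dimensions, and the zero-vector fallback for a missing text modality are all assumptions for demonstration.

```python
import numpy as np

TEXT_DIM = 4  # assumed dimensionality of the text embedding (illustrative)

def frame_level_fusion(visual_frames, audio_frames):
    # Frame-level fusion: concatenate per-frame visual and audio features,
    # then average across frames to obtain a single video descriptor.
    fused = np.concatenate([visual_frames, audio_frames], axis=1)
    return fused.mean(axis=0)

def video_level_fusion(frame_descriptor, text_embedding=None):
    # Video-level fusion: append the textual embedding to the frame-level
    # descriptor. If the text modality is missing, substitute a zero vector
    # (one simple stand-in for the paper's missing-modality reasoning).
    if text_embedding is None:
        text_embedding = np.zeros(TEXT_DIM)
    return np.concatenate([frame_descriptor, text_embedding])

# Toy example: 10 frames with 8-dim visual and 6-dim audio features.
visual = np.random.rand(10, 8)
audio = np.random.rand(10, 6)
desc = frame_level_fusion(visual, audio)                  # shape (14,)
video_vec = video_level_fusion(desc, np.random.rand(4))   # shape (18,)
video_vec_no_text = video_level_fusion(desc)              # text missing
```

The resulting video-level vector would feed a downstream classifier; the two-stage structure mirrors the frame-level and video-level integration named in the abstract.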
Pages: 1325-1341 (17 pages)
Related papers
50 records
  • [1] Multimodal deep representation learning for video classification
    Tian, Haiman
    Tao, Yudong
    Pouyanfar, Samira
    Chen, Shu-Ching
    Shyu, Mei-Ling
    [J]. World Wide Web, 2019, 22 : 1325 - 1341
  • [2] Deep Multimodal Learning: An Effective Method for Video Classification
    Zhao, Tianqi
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON WEB SERVICES (IEEE ICWS 2019), 2019, : 398 - 402
  • [3] Modeling Multimodal Clues in a Hybrid Deep Learning Framework for Video Classification
    Jiang, Yu-Gang
    Wu, Zuxuan
    Tang, Jinhui
    Li, Zechao
    Xue, Xiangyang
    Chang, Shih-Fu
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2018, 20 (11) : 3137 - 3147
  • [4] VideoToVecs: a new video representation based on deep learning techniques for video classification and clustering
    Ibrahim, Zein Al Abidin
    Saab, Marwa
    Sbeity, Ihab
    [J]. SN Applied Sciences, 2019, 1 (06)
  • [6] Deep Multimodal Representation Learning: A Survey
    Guo, Wenzhong
    Wang, Jianwen
    Wang, Shiping
    [J]. IEEE ACCESS, 2019, 7 : 63373 - 63394
  • [7] Multimodal deep representation learning for protein interaction identification and protein family classification
    Zhang, Da
    Kabuka, Mansur
    [J]. BMC Bioinformatics, 2019, 20 (Suppl 16)
  • [9] Multimodal Attentive Representation Learning for Micro-video Multi-label Classification
    Jing, Peiguang
    Liu, Xianyi
    Zhang, Lijuan
    Li, Yun
    Liu, Yu
    Su, Yuting
    [J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (06)
  • [10] Deep video representation learning: a survey
    Ravanbakhsh, Elham
    Liang, Yongqing
    Ramanujam, J.
    Li, Xin
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (20) : 59195 - 59225