Multimodal Information Fusion for Semantic Video Analysis

Cited: 1
Authors
Gulen, Elvan [1 ]
Yilmaz, Turgay [1 ,2 ]
Yazici, Adnan [1 ]
Affiliations
[1] Middle East Tech Univ, Dept Comp Engn, Ankara, Turkey
[2] Univ Tokyo, Inst Ind Sci, Tokyo, Japan
Keywords
Concept Interactions; Multimedia Content Analysis; Multimedia Information; Multimodal Fusion; Semantic Concept Detection;
DOI
10.4018/jmdem.2012100103
CLC Classification
TP31 [Computer software];
Discipline Codes
081202; 0835;
Abstract
Multimedia data is inherently multimodal, so a successful analysis of multimedia content should exploit all available modalities. In addition, because concepts can carry valuable cues about other concepts, concept interaction is a crucial source of multimedia information and helps to increase fusion performance. The aim of this study is to show that integrating the available modalities together with concept interactions yields better performance in detecting semantic concepts. The authors therefore present a multimodal fusion approach that integrates semantic information obtained from various modalities along with additional semantic cues. Experiments conducted on the TRECVID 2007 and CCV datasets validate the superiority of this combination over the best single modality and over alternative modality combinations: the proposed fusion approach provides a 16.7% relative performance gain on the TRECVID dataset and a 47.7% relative improvement on the CCV database over the results of the best unimodal approaches.
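The abstract describes score-level fusion of per-modality concept detectors. As a rough illustration of that general family of techniques (not the paper's actual method), the sketch below fuses concept-detection scores from several modalities with a weighted sum; the modality names, concepts, and weights are all hypothetical.

```python
# Hypothetical sketch of score-level (late) multimodal fusion.
# Modality names, concepts, and weights are illustrative only,
# not taken from the paper.

def fuse_scores(modality_scores, weights):
    """Weighted-sum fusion of per-concept detection scores.

    modality_scores: dict mapping modality name -> {concept: score in [0, 1]}
    weights: dict mapping modality name -> fusion weight (should sum to 1)
    """
    concepts = set()
    for scores in modality_scores.values():
        concepts.update(scores)
    # A concept missing from a modality contributes a score of 0.0.
    return {
        c: sum(weights[m] * scores.get(c, 0.0)
               for m, scores in modality_scores.items())
        for c in concepts
    }

# Toy example: three modalities scoring two semantic concepts.
scores = {
    "visual": {"car": 0.8, "road": 0.6},
    "audio":  {"car": 0.5, "road": 0.1},
    "text":   {"car": 0.9},
}
weights = {"visual": 0.5, "audio": 0.2, "text": 0.3}
fused = fuse_scores(scores, weights)
```

In this toy run, the fused score for "car" is 0.5*0.8 + 0.2*0.5 + 0.3*0.9 = 0.77. The paper's contribution goes beyond such a fixed-weight scheme by also exploiting concept interactions as additional semantic cues.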
Pages: 52-74 (23 pages)
Related Papers (50 records)
  • [41] A Crossmodal Approach to Multimodal Fusion in Video Hyperlinking
    Vukotic, Vedran
    Raymond, Christian
    Gravier, Guillaume
    IEEE MULTIMEDIA, 2018, 25 (02) : 11 - 23
  • [42] Deep learning-based late fusion of multimodal information for emotion classification of music video
    Pandeya, Yagya Raj
    Lee, Joonwhoan
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (02) : 2887 - 2905
  • [43] Linear Multimodal Fusion in Video Concept Analysis Based on Node Equilibrium Model
    Geng, Jie
    Miao, Zhenjiang
    Liang, Qinghua
    Wang, Shu
    PROCEEDINGS 3RD IAPR ASIAN CONFERENCE ON PATTERN RECOGNITION ACPR 2015, 2015, : 316 - 320
  • [45] Affective video content analysis based on multimodal data fusion in heterogeneous networks
    Guo, Jie
    Song, Bin
    Zhang, Peng
    Ma, Mengdi
    Luo, Wenwen
    Junmei
    INFORMATION FUSION, 2019, 51 : 224 - 232
  • [46] Socializing Multimodal Sensors for Information Fusion
    Wang, Yuhui
    MM'15: PROCEEDINGS OF THE 2015 ACM MULTIMEDIA CONFERENCE, 2015, : 653 - 656
  • [47] STSI: Efficiently Mine Spatio-Temporal Semantic Information between Different Multimodal for Video Captioning
    Xiong, Huiyu
    Wang, Lanxiao
    2022 IEEE INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2022,
  • [48] VMSG: a video caption network based on multimodal semantic grouping and semantic attention
    Xin Yang
    Xiangchen Wang
    Xiaohui Ye
    Tao Li
    Multimedia Systems, 2023, 29 : 2575 - 2589
  • [50] Deep multimodal fusion for semantic image segmentation: A survey
    Zhang, Yifei
    Sidibe, Desire
    Morel, Olivier
    Meriaudeau, Fabrice
    IMAGE AND VISION COMPUTING, 2021, 105