Joint modality fusion and temporal context exploitation for semantic video analysis

Cited by: 0
Authors
Georgios Th Papadopoulos
Vasileios Mezaris
Ioannis Kompatsiaris
Michael G. Strintzis
Affiliations
[1] CERTH/Informatics and Telematics Institute
[2] Electrical and Computer Engineering Department, Aristotle University of Thessaloniki
Keywords
Video analysis; multi-modal analysis; temporal context; motion energy; Hidden Markov Models; Bayesian Network
DOI: not available
Abstract
In this paper, a multi-modal, context-aware approach to semantic video analysis is presented. The examined video sequence is first segmented into shots, and appropriate color, motion and audio features are extracted for every resulting shot. Hidden Markov Models (HMMs) are then employed to perform, separately for each modality, an initial association of every shot with the semantic classes of interest. Subsequently, a graphical-modeling-based approach is proposed for jointly performing modality fusion and temporal context exploitation. The novelties of this work are the combined use of contextual information and multi-modal fusion, and the development of a new representation for providing motion distribution information to HMMs. Specifically, an integrated Bayesian Network is introduced that simultaneously fuses the individual modality analysis results and exploits temporal context, contrary to the usual practice of performing each task separately. Contextual information takes the form of temporal relations among the supported classes. Additionally, a new computationally efficient method is presented for providing motion energy distribution information to HMMs; it supports the incorporation of motion characteristics from previous frames into the currently examined one. The final outcome of this video analysis framework is the association of a semantic class with every shot. Experimental results, as well as a comparative evaluation, are presented from the application of the proposed approach to four datasets belonging to the tennis, news and volleyball broadcast video domains.
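The joint fusion-and-context step described in the abstract can be sketched in miniature: assuming per-shot class likelihoods produced by three single-modality HMMs and a class-transition prior learned from annotated data (all class names and numeric values below are invented for illustration), a naive-Bayes product fuses the modalities and a Viterbi pass over the shot sequence applies the temporal context. This is a simplified stand-in for the paper's integrated Bayesian Network, not the authors' actual model.

```python
import numpy as np

# Hypothetical semantic classes for a tennis broadcast.
classes = ["rally", "serve", "break"]

# Per-shot, per-modality class likelihoods (rows: shots). Toy values
# standing in for the outputs of the single-modality HMMs.
modality_scores = {
    "color":  np.array([[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.3, 0.2, 0.5]]),
    "motion": np.array([[0.5, 0.4, 0.1], [0.3, 0.6, 0.1], [0.2, 0.3, 0.5]]),
    "audio":  np.array([[0.7, 0.2, 0.1], [0.4, 0.5, 0.1], [0.1, 0.4, 0.5]]),
}

# Temporal context: P(class_t | class_{t-1}), a stand-in for the
# temporal relations among classes encoded in the Bayesian Network.
transition = np.array([
    [0.5, 0.4, 0.1],
    [0.3, 0.5, 0.2],
    [0.2, 0.3, 0.5],
])

def fuse_and_decode(scores, transition):
    """Naive-Bayes fusion of modalities, then Viterbi decoding over shots."""
    fused = np.ones_like(next(iter(scores.values())))
    for s in scores.values():
        fused = fused * s                      # independent-modality fusion
    n_shots, n_classes = fused.shape
    log_t = np.log(transition + 1e-12)
    delta = np.log(fused[0] + 1e-12)           # best log-score per class
    back = np.zeros((n_shots, n_classes), dtype=int)
    for t in range(1, n_shots):
        cand = delta[:, None] + log_t          # (prev_class, cur_class)
        back[t] = cand.argmax(axis=0)          # best predecessor per class
        delta = cand.max(axis=0) + np.log(fused[t] + 1e-12)
    # Backtrack the highest-scoring class sequence.
    path = [int(delta.argmax())]
    for t in range(n_shots - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return [classes[i] for i in reversed(path)]

print(fuse_and_decode(modality_scores, transition))  # → ['rally', 'serve', 'break']
```

With these toy numbers the third shot is labeled "break" even though its fused evidence is ambiguous, because the transition prior makes "break" a plausible successor of "serve"; this is the kind of correction the joint fusion/context modeling is meant to provide.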
Related papers (50 in total)
  • [21] Modality Translation and Fusion for event-based semantic segmentation
    Xia, Ruihao
    Zhao, Chaoqiang
    Sun, Qiyu
    Cao, Shuang
    Tang, Yang
    CONTROL ENGINEERING PRACTICE, 2023, 136
  • [22] No exploitation of temporal sequence context during visual search
    Bouwkamp, Floortje G.
    de Lange, Floris P.
    Spaak, Eelke
    ROYAL SOCIETY OPEN SCIENCE, 2021, 8 (03):
  • [23] Performance analysis of multiple classifier fusion for semantic video content indexing and retrieval
    Benmokhtar, Rachid
    Huet, Benoit
    ADVANCES IN MULTIMEDIA MODELING, PT 1, 2007, 4351 : 517 - 526
  • [24] TEMPORAL MEMORY ATTENTION FOR VIDEO SEMANTIC SEGMENTATION
    Wang, Hao
    Wang, Weining
    Liu, Jing
    2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 2254 - 2258
  • [25] Temporal information integration for video semantic segmentation
    Guarino, G.
    Chateau, T.
    Teuliere, C.
    Antoine, V.
    2020 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2020, : 8545 - 8551
  • [26] Joint Learning for Relationship and Interaction Analysis in Video with Multimodal Feature Fusion
    Zhang, Beibei
    Yu, Fan
    Gao, Yanxin
    Ren, Tongwei
    Wu, Gangshan
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 4848 - 4852
  • [27] Rethinking Temporal Fusion for Video-Based Person Re-Identification on Semantic and Time Aspect
    Jiang, Xinyang
    Gong, Yifei
    Guo, Xiaowei
    Yang, Qize
    Huang, Feiyue
    Zheng, Weishi
    Zheng, Feng
    Sun, Xing
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 11133 - 11140
  • [28] Preserving Semantic and Temporal Consistency for Unpaired Video-to-Video Translation
    Park, Kwanyong
    Woo, Sanghyun
    Kim, Dahun
    Cho, Donghyeon
    Kweon, In So
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 1248 - 1257
  • [29] Cross-Domain Modality Fusion for Dense Video Captioning
    Aafaq, N.
    Mian, A.
    Liu, W.
    Akhtar, N.
    Shah, M.
    IEEE Transactions on Artificial Intelligence, 2022, 3 (05): : 763 - 777
  • [30] CORRELATION-BASED FEATURE ANALYSIS AND MULTI-MODALITY FUSION FRAMEWORK FOR MULTIMEDIA SEMANTIC RETRIEVAL
    Ha, Hsin-Yu
    Yang, Yimin
    Fleites, Fausto C.
    Chen, Shu-Ching
    2013 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME 2013), 2013,