Joint modality fusion and temporal context exploitation for semantic video analysis

Cited by: 0
Authors
Georgios Th Papadopoulos
Vasileios Mezaris
Ioannis Kompatsiaris
Michael G. Strintzis
Affiliations
[1] CERTH/Informatics and Telematics Institute
[2] Electrical and Computer Engineering Department of Aristotle University of Thessaloniki
Keywords
Video analysis; multi-modal analysis; temporal context; motion energy; Hidden Markov Models; Bayesian Network
DOI
Not available
Abstract
In this paper, a multi-modal context-aware approach to semantic video analysis is presented. Overall, the examined video sequence is initially segmented into shots and for every resulting shot appropriate color, motion and audio features are extracted. Then, Hidden Markov Models (HMMs) are employed for performing an initial association of each shot with the semantic classes that are of interest separately for each modality. Subsequently, a graphical modeling-based approach is proposed for jointly performing modality fusion and temporal context exploitation. Novelties of this work include the combined use of contextual information and multi-modal fusion, and the development of a new representation for providing motion distribution information to HMMs. Specifically, an integrated Bayesian Network is introduced for simultaneously performing information fusion of the individual modality analysis results and exploitation of temporal context, contrary to the usual practice of performing each task separately. Contextual information is in the form of temporal relations among the supported classes. Additionally, a new computationally efficient method for providing motion energy distribution-related information to HMMs, which supports the incorporation of motion characteristics from previous frames to the currently examined one, is presented. The final outcome of this overall video analysis framework is the association of a semantic class with every shot. Experimental results as well as comparative evaluation from the application of the proposed approach to four datasets belonging to the domains of tennis, news and volleyball broadcast video are presented.
Related papers
50 in total
  • [1] Joint modality fusion and temporal context exploitation for semantic video analysis
    Papadopoulos, Georgios Th
    Mezaris, Vasileios
    Kompatsiaris, Ioannis
    Strintzis, Michael G.
    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2011
  • [2] Joint temporal context exploitation and active learning for video segmentation
    Tian, Yan
    Cheng, Guohua
    Gelernter, Judith
    Yu, Shihao
    Song, Chao
    Yang, Bailin
    PATTERN RECOGNITION, 2020, 100 (100)
  • [3] Temporal-Semantic Context Fusion for Robust Weakly Supervised Video Anomaly Detection
    Zeng, Yuan
    Wu, Yuanyuan
    Liang, Jing
    Zeng, Wu
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VI, 2024, 14430 : 245 - 256
  • [4] Semantic video indexing using context-dependent fusion
    Kim, Dae-Jin
    Frigui, Hichem
    Fadeev, Aleksey
    MULTIMEDIA CONTENT ACCESS: ALGORITHMS AND SYSTEMS II, 2008, 6820
  • [5] Multimodal Information Fusion for Semantic Video Analysis
    Gulen, Elvan
    Yilmaz, Turgay
    Yazici, Adnan
    INTERNATIONAL JOURNAL OF MULTIMEDIA DATA ENGINEERING & MANAGEMENT, 2012, 3 (04): 52 - 74
  • [6] CSMF-SPC: Multimodal Sentiment Analysis Model with Effective Context Semantic Modality Fusion and Sentiment Polarity Correction
    Li, Yuqiang
    Weng, Wenxuan
    Liu, Chun
    Li, Lin
    PATTERN ANALYSIS AND APPLICATIONS, 2024, 27 (03)
  • [7] Temporal-enhanced Cross-modality Fusion Network for Video Sentence Grounding
    Lv, Zezhong
    Su, Bing
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 1487 - 1492
  • [8] TLCFuse: Temporal Multi-Modality Fusion Towards Occlusion-Aware Semantic Segmentation
    Salazar-Gomez, Gustavo
    Liu, Wenqian
    Diaz-Zapata, Manuel
    Sierra-Gonzalez, David
    Laugier, Christian
    2024 35TH IEEE INTELLIGENT VEHICLES SYMPOSIUM, IEEE IV 2024, 2024, : 2110 - 2116
  • [9] COMBINING MULTIMODAL AND TEMPORAL CONTEXTUAL INFORMATION FOR SEMANTIC VIDEO ANALYSIS
    Papadopoulos, Georgios Th.
    Mezaris, Vasileios
    Kompatsiaris, Ioannis
    Strintzis, Michael G.
    2009 16TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOLS 1-6, 2009, : 4325 - +
  • [10] Modality Mixture Projections for Semantic Video Event Detection
    Shen, Jialie
    Tao, Dacheng
    Li, Xuelong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2008, 18 (11) : 1587 - 1596