Joint modality fusion and temporal context exploitation for semantic video analysis

Cited by: 0

Authors:

Georgios Th Papadopoulos

Vasileios Mezaris

Ioannis Kompatsiaris

Michael G. Strintzis

Affiliations:

[1] CERTH/Informatics and Telematics Institute

[2] Electrical and Computer Engineering Department of Aristotle University of Thessaloniki

Source:

EURASIP Journal on Advances in Signal Processing, vol. 2011

Keywords:

Video analysis; multi-modal analysis; temporal context; motion energy; Hidden Markov Models; Bayesian Network

DOI:

Not available

Chinese Library Classification number:

Subject classification code:

Abstract:

In this paper, a multi-modal context-aware approach to semantic video analysis is presented. The examined video sequence is first segmented into shots, and appropriate color, motion and audio features are extracted for every resulting shot. Then, Hidden Markov Models (HMMs) are employed, separately for each modality, to perform an initial association of each shot with the semantic classes of interest. Subsequently, a graphical modeling-based approach is proposed for jointly performing modality fusion and temporal context exploitation. Novelties of this work include the combined use of contextual information and multi-modal fusion, and the development of a new representation for providing motion distribution information to HMMs. Specifically, an integrated Bayesian Network is introduced for simultaneously fusing the individual modality analysis results and exploiting temporal context, contrary to the usual practice of performing each task separately. Contextual information takes the form of temporal relations among the supported classes. Additionally, a new computationally efficient method is presented for providing motion energy distribution-related information to HMMs; it supports the incorporation of motion characteristics from previous frames into the currently examined one. The final outcome of the overall video analysis framework is the association of a semantic class with every shot. Experimental results and a comparative evaluation from applying the proposed approach to four datasets from the tennis, news and volleyball broadcast video domains are presented.
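
To make the idea of jointly fusing modalities and exploiting temporal context more concrete, the following is a minimal sketch. It is not the authors' Bayesian Network. As a simplifying assumption, it treats the per-modality scores of each shot as conditionally independent given the class, combines them by multiplication, and uses a first-order class-transition matrix between consecutive shots as the temporal context, decoded with the Viterbi algorithm. The function name `fuse_shots` and all of its parameters are hypothetical.

```python
import numpy as np

def fuse_shots(modality_scores, transition, prior):
    """Assign one class per shot from per-modality scores and temporal context.

    Assumptions (illustrative only, not taken from the paper):
    - modality_scores: list of arrays of shape (n_shots, n_classes), one per
      modality, holding likelihood-like scores (e.g. from per-modality HMMs).
    - transition[i, j]: probability that a shot of class i is followed by a
      shot of class j.
    - prior[i]: probability that the first shot belongs to class i.
    """
    eps = 1e-12  # avoids log(0)
    # Modality fusion: product of scores, computed as a sum of logs.
    log_obs = sum(np.log(np.asarray(s, dtype=float) + eps) for s in modality_scores)
    log_trans = np.log(np.asarray(transition, dtype=float) + eps)
    n_shots, n_classes = log_obs.shape

    # Viterbi decoding over the sequence of shots.
    delta = np.log(np.asarray(prior, dtype=float) + eps) + log_obs[0]
    back = np.zeros((n_shots, n_classes), dtype=int)
    for t in range(1, n_shots):
        cand = delta[:, None] + log_trans      # cand[i, j]: best path ending in i, then moving to j
        back[t] = cand.argmax(axis=0)          # best predecessor for each class j
        delta = cand.max(axis=0) + log_obs[t]

    # Backtrack from the best final class.
    path = [int(delta.argmax())]
    for t in range(n_shots - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With a transition matrix that favors staying in the same class, a shot whose fused scores are ambiguous tends to inherit the class of its neighbors; with a uniform transition matrix, the result reduces to a per-shot decision on the fused scores alone.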

Citations

50 items in total

[1] Joint modality fusion and temporal context exploitation for semantic video analysis
Papadopoulos, Georgios Th
Mezaris, Vasileios
Kompatsiaris, Ioannis
Strintzis, Michael G.
EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2011,
[2] Joint temporal context exploitation and active learning for video segmentation
Tian, Yan
Cheng, Guohua
Gelernter, Judith
Yu, Shihao
Song, Chao
Yang, Bailin
PATTERN RECOGNITION, 2020, 100 (100)
[3] Temporal-Semantic Context Fusion for Robust Weakly Supervised Video Anomaly Detection
Zeng, Yuan
Wu, Yuanyuan
Liang, Jing
Zeng, Wu
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VI, 2024, 14430 : 245 - 256
[4] Semantic video indexing using context-dependent fusion
Kim, Dae-Jin
Frigui, Hichem
Fadeev, Aleksey
MULTIMEDIA CONTENT ACCESS: ALGORITHMS AND SYSTEMS II, 2008, 6820
[5] Multimodal Information Fusion for Semantic Video Analysis
Gulen, Elvan
Yilmaz, Turgay
Yazici, Adnan
INTERNATIONAL JOURNAL OF MULTIMEDIA DATA ENGINEERING & MANAGEMENT, 2012, 3 (04) : 52 - 74
[6] CSMF-SPC: Multimodal Sentiment Analysis Model with Effective Context Semantic Modality Fusion and Sentiment Polarity Correction
Li, Yuqiang
Weng, Wenxuan
Liu, Chun
Li, Lin
PATTERN ANALYSIS AND APPLICATIONS, 2024, 27 (03)
[7] Temporal-enhanced Cross-modality Fusion Network for Video Sentence Grounding
Lv, Zezhong
Su, Bing
2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 1487 - 1492
[8] TLCFuse: Temporal Multi-Modality Fusion Towards Occlusion-Aware Semantic Segmentation
Salazar-Gomez, Gustavo
Liu, Wenqian
Diaz-Zapata, Manuel
Sierra-Gonzalez, David
Laugier, Christian
2024 35TH IEEE INTELLIGENT VEHICLES SYMPOSIUM, IEEE IV 2024, 2024, : 2110 - 2116
[9] COMBINING MULTIMODAL AND TEMPORAL CONTEXTUAL INFORMATION FOR SEMANTIC VIDEO ANALYSIS
Papadopoulos, Georgios Th.
Mezaris, Vasileios
Kompatsiaris, Ioannis
Strintzis, Michael G.
2009 16TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOLS 1-6, 2009, : 4325 - +
[10] Modality Mixture Projections for Semantic Video Event Detection
Shen, Jialie
Tao, Dacheng
Li, Xuelong
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2008, 18 (11) : 1587 - 1596
