Joint modality fusion and temporal context exploitation for semantic video analysis

Cited by: 0

Authors:

Georgios Th Papadopoulos

Vasileios Mezaris

Ioannis Kompatsiaris

Michael G. Strintzis

Affiliations:

[1] CERTH/Informatics and Telematics Institute

[2] Electrical and Computer Engineering Department of Aristotle University of Thessaloniki

Source:

EURASIP Journal on Advances in Signal Processing, Volume 2011

Keywords:

Video analysis; multi-modal analysis; temporal context; motion energy; Hidden Markov Models; Bayesian Network

DOI:

Not available

Abstract:

In this paper, a multi-modal context-aware approach to semantic video analysis is presented. Overall, the examined video sequence is initially segmented into shots and for every resulting shot appropriate color, motion and audio features are extracted. Then, Hidden Markov Models (HMMs) are employed for performing an initial association of each shot with the semantic classes that are of interest separately for each modality. Subsequently, a graphical modeling-based approach is proposed for jointly performing modality fusion and temporal context exploitation. Novelties of this work include the combined use of contextual information and multi-modal fusion, and the development of a new representation for providing motion distribution information to HMMs. Specifically, an integrated Bayesian Network is introduced for simultaneously performing information fusion of the individual modality analysis results and exploitation of temporal context, contrary to the usual practice of performing each task separately. Contextual information is in the form of temporal relations among the supported classes. Additionally, a new computationally efficient method for providing motion energy distribution-related information to HMMs, which supports the incorporation of motion characteristics from previous frames to the currently examined one, is presented. The final outcome of this overall video analysis framework is the association of a semantic class with every shot. Experimental results as well as comparative evaluation from the application of the proposed approach to four datasets belonging to the domains of tennis, news and volleyball broadcast video are presented.
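The two-stage idea in the abstract (per-modality class scores first, then joint fusion with temporal context) can be illustrated with a much simpler stand-in. The sketch below is not the paper's integrated Bayesian Network. It assumes the modalities are conditionally independent given the class, so their scores are multiplied, and it models temporal context as a first-order class-transition table decoded with the Viterbi algorithm. The function names, the two-class setup, and all numbers are hypothetical and chosen only for illustration.

```python
import math

EPS = 1e-12  # floor to avoid log(0); an arbitrary illustrative choice


def fuse_modalities(shot_scores):
    """Fuse per-modality class likelihoods for one shot in log space.

    Assumes conditional independence of modalities given the class
    (a naive-Bayes simplification, not the paper's network).
    """
    n_classes = len(next(iter(shot_scores.values())))
    fused = [0.0] * n_classes
    for scores in shot_scores.values():
        for c, p in enumerate(scores):
            fused[c] += math.log(max(p, EPS))
    return fused


def decode_shots(shots, transition, prior):
    """Viterbi decoding over the shot sequence.

    transition[k][c] is the assumed probability that a shot of class k
    is followed by a shot of class c (the temporal-context stand-in).
    """
    n = len(prior)
    log_t = [[math.log(max(p, EPS)) for p in row] for row in transition]
    emissions = [fuse_modalities(s) for s in shots]
    best = [math.log(max(prior[c], EPS)) + emissions[0][c] for c in range(n)]
    back = []
    for em in emissions[1:]:
        step, new_best = [], []
        for c in range(n):
            prev = max(range(n), key=lambda k: best[k] + log_t[k][c])
            step.append(prev)
            new_best.append(best[prev] + log_t[prev][c] + em[c])
        back.append(step)
        best = new_best
    # Backtrack from the best final class to recover one class per shot.
    path = [max(range(n), key=lambda c: best[c])]
    for step in reversed(back):
        path.append(step[path[-1]])
    return path[::-1]


# Hypothetical per-modality scores for three shots and two classes.
shots = [
    {"color": [0.80, 0.20], "motion": [0.7, 0.3], "audio": [0.6, 0.4]},
    {"color": [0.45, 0.55], "motion": [0.5, 0.5], "audio": [0.5, 0.5]},
    {"color": [0.90, 0.10], "motion": [0.8, 0.2], "audio": [0.7, 0.3]},
]
transition = [[0.9, 0.1], [0.1, 0.9]]  # classes tend to persist across shots
prior = [0.5, 0.5]

labels = decode_shots(shots, transition, prior)
print(labels)  # [0, 0, 0]
```

In this toy input, the second shot on its own leans slightly toward class 1, but the transition table pulls it to class 0 because both neighbours are confidently class 0. That is the kind of correction that temporal context is meant to provide; the paper's approach performs fusion and context exploitation jointly rather than in this simplified chained form.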
