Joint modality fusion and temporal context exploitation for semantic video analysis

Cited by: 0
Authors
Georgios Th Papadopoulos
Vasileios Mezaris
Ioannis Kompatsiaris
Michael G. Strintzis
Affiliations
[1] CERTH/Informatics and Telematics Institute
[2] Electrical and Computer Engineering Department of Aristotle University of Thessaloniki
Keywords
Video analysis; multi-modal analysis; temporal context; motion energy; Hidden Markov Models; Bayesian Network
DOI
Not available
Abstract
This paper presents a multi-modal, context-aware approach to semantic video analysis. The examined video sequence is first segmented into shots, and color, motion and audio features are extracted for every resulting shot. Hidden Markov Models (HMMs) are then employed, separately for each modality, to perform an initial association of each shot with the semantic classes of interest. Subsequently, a graphical modeling-based approach jointly performs modality fusion and temporal context exploitation. The novelties of this work are the combined use of contextual information and multi-modal fusion, and a new representation for providing motion distribution information to HMMs. Specifically, an integrated Bayesian Network simultaneously fuses the individual modality analysis results and exploits temporal context, contrary to the usual practice of performing each task separately. Contextual information takes the form of temporal relations among the supported classes. Additionally, a new computationally efficient method is presented for providing motion energy distribution-related information to HMMs; it supports incorporating motion characteristics from previous frames into the currently examined one. The final outcome of the framework is the association of a semantic class with every shot. Experimental results and a comparative evaluation are presented from the application of the proposed approach to four datasets from the domains of tennis, news and volleyball broadcast video.
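The idea of jointly fusing per-modality results and exploiting temporal context can be illustrated with a small sketch. This is not the integrated Bayesian Network described in the abstract; it swaps in a simpler stand-in, a forward filter that multiplies per-modality class scores (assuming the modalities are conditionally independent given the class) and propagates the previous shot's belief through a class transition matrix. The function name `fuse_shots`, the score values, the transition matrix and the prior are all hypothetical and chosen only for illustration.

```python
import numpy as np


def fuse_shots(modality_scores, transition, prior):
    """Fuse per-modality class scores over consecutive shots (illustrative only).

    modality_scores: shape (n_shots, n_modalities, n_classes); strictly positive,
        likelihood-like scores, e.g. normalized per-modality HMM outputs (assumption).
    transition: shape (n_classes, n_classes); transition[i, j] is the assumed
        probability of class j at shot t given class i at shot t-1.
    prior: shape (n_classes,); assumed class prior for the first shot.
    Returns (labels, posteriors).
    """
    scores = np.asarray(modality_scores, dtype=float)
    transition = np.asarray(transition, dtype=float)
    n_shots, _, n_classes = scores.shape
    posteriors = np.zeros((n_shots, n_classes))
    belief = np.asarray(prior, dtype=float)
    for t in range(n_shots):
        if t > 0:
            # Temporal context: predict from the previous shot's posterior.
            belief = posteriors[t - 1] @ transition
        # Modality fusion: product of scores across modalities for each class.
        evidence = np.prod(scores[t], axis=0)
        unnormalized = belief * evidence
        posteriors[t] = unnormalized / unnormalized.sum()
    return posteriors.argmax(axis=1), posteriors


# Hypothetical scores for 3 shots, 3 modalities (color, motion, audio), 2 classes.
scores = [
    [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]],      # clear evidence for class 0
    [[0.5, 0.5], [0.45, 0.55], [0.5, 0.5]],    # ambiguous, slightly favors class 1
    [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7]],      # clear evidence for class 1
]
transition = [[0.9, 0.1], [0.1, 0.9]]          # assumed: classes tend to persist
prior = [0.5, 0.5]

labels, posteriors = fuse_shots(scores, transition, prior)
print(labels)
```

In this toy setup the second shot's fused evidence alone slightly favors class 1, but the belief carried over from the first shot keeps it at class 0; the third shot's strong evidence then switches the label to class 1. A joint model of the kind the abstract describes would encode such temporal relations and the fusion step in one network rather than in this sequential loop.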