Joint modality fusion and temporal context exploitation for semantic video analysis

Cited by: 0
Authors
Georgios Th Papadopoulos
Vasileios Mezaris
Ioannis Kompatsiaris
Michael G. Strintzis
Affiliations
[1] CERTH/Informatics and Telematics Institute,
[2] Electrical and Computer Engineering Department, Aristotle University of Thessaloniki
Keywords
Video analysis; multi-modal analysis; temporal context; motion energy; Hidden Markov Models; Bayesian Network;
DOI
Not available
Abstract
In this paper, a multi-modal context-aware approach to semantic video analysis is presented. The examined video sequence is initially segmented into shots, and for every resulting shot appropriate color, motion and audio features are extracted. Hidden Markov Models (HMMs) are then employed, separately for each modality, to perform an initial association of each shot with the semantic classes of interest. Subsequently, a graphical modeling-based approach is proposed for jointly performing modality fusion and temporal context exploitation. Novelties of this work include the combined use of contextual information and multi-modal fusion, and the development of a new representation for providing motion distribution information to HMMs. Specifically, an integrated Bayesian Network is introduced for simultaneously fusing the individual modality analysis results and exploiting temporal context, contrary to the usual practice of performing each task separately. Contextual information takes the form of temporal relations among the supported classes. Additionally, a new computationally efficient method is presented for providing motion energy distribution-related information to HMMs, which supports the incorporation of motion characteristics from previous frames into the currently examined one. The final outcome of this overall video analysis framework is the association of a semantic class with every shot. Experimental results, as well as a comparative evaluation, are presented for the application of the proposed approach to four datasets belonging to the domains of tennis, news and volleyball broadcast video.
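The pipeline the abstract describes can be illustrated in miniature. The sketch below shows the fusion step only: each modality's HMM is assumed to have already produced per-class likelihoods for the current shot, and these are combined with a temporal prior over class transitions into a single posterior, in the spirit of the paper's joint Bayesian-Network formulation. The class names, likelihood values and transition table are illustrative assumptions, not the paper's learned parameters.

```python
# Hedged sketch of joint modality fusion + temporal context exploitation.
# All numbers below are made up for illustration; in the paper these would
# come from trained per-modality HMMs and learned temporal relations.

CLASSES = ["rally", "serve", "break"]

# Per-modality likelihoods P(obs_m | class) for one shot, e.g. as output
# by a separate HMM per modality (illustrative values).
modality_likelihoods = {
    "color":  {"rally": 0.5, "serve": 0.3, "break": 0.2},
    "motion": {"rally": 0.6, "serve": 0.3, "break": 0.1},
    "audio":  {"rally": 0.4, "serve": 0.4, "break": 0.2},
}

# Temporal context: P(class_t | class_{t-1}), an illustrative transition table.
transition = {
    "rally": {"rally": 0.5, "serve": 0.2, "break": 0.3},
    "serve": {"rally": 0.7, "serve": 0.1, "break": 0.2},
    "break": {"rally": 0.3, "serve": 0.5, "break": 0.2},
}

def fuse(prev_class, likelihoods, trans):
    """Multiply the modality likelihoods with the temporal prior and
    normalise, yielding a posterior over classes for the current shot."""
    scores = {}
    for c in CLASSES:
        score = trans[prev_class][c]
        for mod in likelihoods:
            score *= likelihoods[mod][c]
        scores[c] = score
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

# If the previous shot was labelled "serve", fuse evidence for the current one.
posterior = fuse("serve", modality_likelihoods, transition)
best = max(posterior, key=posterior.get)  # class assigned to the shot
```

This toy version treats the modalities as conditionally independent given the class (a naive-Bayes simplification); the paper's integrated Bayesian Network models the dependencies among modality outputs and temporal relations jointly rather than by plain multiplication.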