A multi-modal system for the retrieval of semantic video events

Cited by: 13
Authors
Amir, A
Basu, S
Iyengar, G
Lin, CY
Naphade, M
Smith, JR
Srinivasan, S
Tseng, B
Affiliations
[1] IBM Corp, Almaden Res Ctr, San Jose, CA 95120 USA
[2] IBM TJ Watson Res Ctr, Hawthorne, NY 10532 USA
[3] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
Keywords
multimedia indexing; event detection; semantic video annotation; content-based video retrieval
DOI
10.1016/j.cviu.2004.02.006
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A framework for event detection is proposed in which events, objects, and other semantic concepts are detected from video using trained classifiers. These classifiers are used to automatically annotate video with semantic labels, which in turn are used to search for new, untrained types of events and semantic concepts. The novelty of the approach lies in (1) the semi-automatic construction of event models from feature descriptors and (2) the integration of content-based and concept-based querying in the search process. Speech retrieval is applied independently, and combined results are produced. Results of applying these techniques to the Search benchmark of the NIST TREC Video track 2001 are reported, and the experience gained and future work are discussed. (C) 2004 Published by Elsevier Inc.
Pages: 216-236
Page count: 21
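
The abstract describes combining concept-based querying (labels produced by trained classifiers), content-based querying, and an independently run speech retrieval step. The following is a minimal illustrative sketch of one way such scores could be combined per video shot. It is not the authors' method: the `Shot` structure, the function names, the weighted-sum fusion rule, and the default weights are all assumptions made for illustration, and every score is assumed to be already normalized to the range [0, 1].

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Shot:
    """Hypothetical per-shot record; all fields are illustrative assumptions."""
    shot_id: str
    concept_scores: Dict[str, float]  # semantic label -> classifier confidence in [0, 1]
    content_score: float              # similarity to a query example, in [0, 1]
    speech_score: float               # score from a separate speech-transcript search


def concept_query_score(shot: Shot, wanted: Sequence[str]) -> float:
    """Mean classifier confidence over the requested labels (missing label counts as 0)."""
    if not wanted:
        return 0.0
    return sum(shot.concept_scores.get(label, 0.0) for label in wanted) / len(wanted)


def rank_shots(
    shots: Sequence[Shot],
    wanted: Sequence[str],
    weights: Tuple[float, float, float] = (0.4, 0.3, 0.3),
) -> List[str]:
    """Rank shots by an assumed weighted sum of concept, content, and speech scores."""
    w_concept, w_content, w_speech = weights

    def fused(shot: Shot) -> float:
        return (
            w_concept * concept_query_score(shot, wanted)
            + w_content * shot.content_score
            + w_speech * shot.speech_score
        )

    # Highest fused score first; sorted() is stable, so ties keep input order.
    return [s.shot_id for s in sorted(shots, key=fused, reverse=True)]
```

A caller would build `Shot` records from whatever detectors and search components are available and pass the wanted labels, for example `rank_shots(shots, ["rocket", "sky"])`. Changing `weights` shifts the ranking toward one modality; setting a weight to zero removes that modality from the result.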