A multi-modal system for the retrieval of semantic video events

Cited by: 13

Authors:

Amir, A

Basu, S

Iyengar, G

Lin, CY

Naphade, M

Smith, JR

Srinivasan, S

Tseng, B

Affiliations:

[1] IBM Corp, Almaden Res Ctr, San Jose, CA 95120 USA

[2] IBM TJ Watson Res Ctr, Hawthorne, NY 10532 USA

[3] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA

Source:

COMPUTER VISION AND IMAGE UNDERSTANDING | 2004 / Vol. 96 / Issue 02

Keywords:

multimedia indexing; event detection; semantic video annotation; content-based video retrieval

DOI:

10.1016/j.cviu.2004.02.006

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

A framework for event detection is proposed in which events, objects, and other semantic concepts are detected from video using trained classifiers. These classifiers are used to automatically annotate video with semantic labels, which in turn are used to search for new, untrained types of events and semantic concepts. The novelty of the approach lies in (1) the semi-automatic construction of event models from feature descriptors and (2) the integration of content-based and concept-based querying in the search process. Speech retrieval is applied independently, and combined results are produced. Results of applying these methods to the Search benchmark of the NIST TREC Video track 2001 are reported, and the experience gained and future work are discussed. © 2004 Published by Elsevier Inc.

Pages: 216-236

Page count: 21

