Concept-Driven Multi-Modality Fusion for Video Search

Cited by: 30
Authors
Wei, Xiao-Yong [1, 2, 4]
Jiang, Yu-Gang [1, 3]
Ngo, Chong-Wah [1]
Affiliations
[1] City Univ Hong Kong, Dept Comp Sci, Kowloon, Hong Kong, Peoples R China
[2] Sichuan Univ, Sch Comp Sci, Chengdu 610054, Peoples R China
[3] Columbia Univ, Dept Elect Engn, New York, NY 10027 USA
[4] City Univ Hong Kong, Dept Chinese Linguist & Translat, Kowloon, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Concept-driven fusion; multi-modality; semantic concept; video search; DETECTORS; ONTOLOGY;
DOI
10.1109/TCSVT.2011.2105597
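Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification code
0808; 0809
Abstract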
Just as human perception gathers information from different sources in natural, multi-modality forms, learning from multiple modalities has become an effective scheme for various information retrieval problems. In this paper, we propose a novel multi-modality fusion approach for video search, where the search modalities are derived from a diverse set of knowledge sources, such as text transcripts from speech recognition, low-level visual features from video frames, and high-level semantic visual concepts from supervised learning. Since the effectiveness of each search modality depends greatly on the specific user query, promptly determining the importance of a modality to a query is a critical issue in multi-modality search. Our proposed approach, named concept-driven multi-modality fusion (CDMF), explores a large set of predefined semantic concepts for computing multi-modality fusion weights in a novel way. Specifically, in CDMF, we decompose the query-modality relationship into two components that are much easier to compute: query-concept relatedness and concept-modality relevancy. The former can be efficiently estimated online using semantic and visual mapping techniques, while the latter can be computed offline based on the concept detection accuracy of each modality. Such a decomposition enables adaptive learning of fusion weights for each user query on the fly, in contrast to existing approaches, which mostly adopt predefined query classes and/or modality weights. Experimental results on the TREC Video Retrieval Evaluation (TRECVID) 2005-2008 datasets validate the effectiveness of our approach, which outperforms existing multi-modality fusion methods and achieves near-optimal performance (from oracle fusion) for many test queries.
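The decomposition described in the abstract can be illustrated with a small sketch. The abstract does not give the actual CDMF formula, so everything below is an assumption made for illustration: the weight of a modality is taken to be the sum, over concepts, of query-concept relatedness times concept-modality relevancy, normalized to sum to 1, and search results are combined by weighted linear fusion. The function names `fusion_weights` and `fuse_scores` and all input values are hypothetical.

```python
def fusion_weights(query_concept, concept_modality):
    """Estimate per-modality fusion weights for one query.

    query_concept:    {concept: relatedness to the query}  (estimated online)
    concept_modality: {concept: {modality: relevancy}}     (computed offline)

    Assumed combination rule (not taken from the paper):
    weight(m) is proportional to sum_c relatedness(c) * relevancy(c, m).
    """
    raw = {}
    for concept, relatedness in query_concept.items():
        for modality, relevancy in concept_modality.get(concept, {}).items():
            raw[modality] = raw.get(modality, 0.0) + relatedness * relevancy
    total = sum(raw.values())
    if total == 0:
        # No evidence for any modality: fall back to uniform weights.
        return {m: 1.0 / len(raw) for m in raw}
    return {m: v / total for m, v in raw.items()}


def fuse_scores(weights, modality_scores):
    """Weighted linear fusion of per-modality scores.

    modality_scores: {modality: {shot_id: score}}
    A shot missing from a modality contributes 0 for that modality.
    """
    fused = {}
    for modality, scores in modality_scores.items():
        w = weights.get(modality, 0.0)
        for shot_id, score in scores.items():
            fused[shot_id] = fused.get(shot_id, 0.0) + w * score
    return fused


if __name__ == "__main__":
    # Hypothetical inputs for a query such as "cars on a road".
    query_concept = {"car": 0.8, "road": 0.2}
    concept_modality = {
        "car": {"text": 0.1, "visual": 0.5, "concept": 0.4},
        "road": {"text": 0.2, "visual": 0.3, "concept": 0.5},
    }
    weights = fusion_weights(query_concept, concept_modality)
    scores = {
        "text": {"shot1": 0.9, "shot2": 0.1},
        "visual": {"shot1": 0.2, "shot2": 0.7},
        "concept": {"shot2": 0.6},
    }
    ranked = sorted(fuse_scores(weights, scores).items(),
                    key=lambda kv: kv[1], reverse=True)
    print(weights)
    print(ranked)
```

In this sketch the concept-modality table is fixed ahead of time and only the query-concept relatedness changes per query, which mirrors the offline/online split the abstract describes.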
Pages: 62-73
Number of pages: 12