Multiple kernel visual-auditory representation learning for retrieval

Cited by: 9
Authors
Zhang, Hong [1, 2]
Zhang, Wenping [1]
Liu, Wenhe [3]
Xu, Xin [1]
Fan, Hehe [4]
Affiliations
[1] Wuhan Univ Sci & Technol, Coll Comp Sci & Technol, Wuhan 430081, Peoples R China
[2] Hubei Prov Key Lab Intelligent Informat Proc & Re, Wuhan, Peoples R China
[3] Univ Technol Sydney UTS, Ctr Quantum Computat & Intelligent Syst, Sydney, NSW, Australia
[4] Baidu, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multiple kernel learning; Visual-auditory data representation; Cross-media retrieval; INFORMATION-RETRIEVAL; CANONICAL CORRELATION; FEATURE-SELECTION
DOI
10.1007/s11042-016-3294-5
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Cross-media data representation, which focuses on understanding the semantics of multimedia data in different modalities, is an emerging topic in web media data analysis. Its most challenging issues are how to find the underlying content-level correlations in the data and how to use those correlations in the representation model. Most traditional work on web media data analysis is based on single-modality data sources, such as Flickr images or YouTube videos, leaving cross-media data representation and semantic understanding wide open. In this paper, we propose a multiple kernel visual-auditory representation learning approach, which learns cross-media correlations from the visual and auditory feature spaces using multiple kernel strategies. We also give a cross-media distance measure for image-audio retrieval in the mutual subspace of co-occurrence. Experimental results on the collected image-audio database are encouraging and indicate that the approach is effective from multiple perspectives.
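The abstract does not spell out its multiple kernel strategies, so the following is only a generic sketch of the basic idea behind multiple kernel learning: several base kernels are blended into one kernel by a convex combination. The function names, the choice of Gaussian (RBF) base kernels, the bandwidths, the fixed weights, and the toy feature vectors are all assumptions made for illustration and are not taken from the paper, where the weights would normally be learned.

```python
import math


def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def combined_kernel(x, y, gammas, weights):
    """Convex combination of RBF base kernels (hypothetical helper).

    The weights must be non-negative and sum to 1, which keeps the
    combined function a valid positive semi-definite kernel.
    """
    if len(gammas) != len(weights):
        raise ValueError("need exactly one weight per base kernel")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * rbf_kernel(x, y, g) for g, w in zip(gammas, weights))


def gram_matrix(samples, gammas, weights):
    """Pairwise combined-kernel values for a list of feature vectors."""
    return [[combined_kernel(a, b, gammas, weights) for b in samples]
            for a in samples]


# Toy two-dimensional feature vectors (made up, not from the paper).
toy_features = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
K = gram_matrix(toy_features, gammas=[0.1, 1.0, 10.0],
                weights=[0.5, 0.25, 0.25])
for row in K:
    print(["%.3f" % value for value in row])
```

Using several bandwidths lets the combined kernel respond to similarity at more than one scale; in a cross-media setting one such kernel could be built per modality before any correlation analysis, but that step is beyond this sketch.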
Pages: 9169-9184
Number of pages: 16
Source: Multimedia Tools and Applications, 2016, 75: 9169-9184