Learning cross-modal appearance models with application to tracking

Cited by: 0
Authors
Fisher, JW [1]
Darrell, T [1]
Affiliations
[1] MIT, Artificial Intelligence Lab, Cambridge, MA 02139 USA
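Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract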
Objects of interest are rarely silent or invisible. Analysis of multimodal signal generation from a single object represents a rich and challenging area for smart sensor arrays. We consider the problem of simultaneously learning an audio and visual appearance model of a moving subject. We present a method that successfully learns such a model without hand initialization, using only the associated audio signal to "decide" which object to model and track. We are particularly interested in modeling joint audio and video variation, such as that produced by a speaking face. We present an algorithm and experimental results for a human speaker moving in a scene.
Pages: 13-16
Page count: 4