Online Cross-Modal Adaptation for Audio-Visual Person Identification With Wearable Cameras

Cited by: 13
Authors
Brutti, Alessio [1 ]
Cavallaro, Andrea [1 ]
Affiliations
[1] Queen Mary Univ London, Ctr Intelligent Sensing, London E1 4NS, England
Keywords
Model adaptation; multimedia systems; person identification; wearable cameras; FACE RECOGNITION; SPEECH; FUSION; REIDENTIFICATION; VIDEO;
DOI
10.1109/THMS.2016.2620110
Chinese Library Classification: TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes: 081104; 0812; 0835; 1405
Abstract
We propose an audio-visual target identification approach for egocentric data with cross-modal model adaptation. The proposed approach blindly and iteratively adapts the time-dependent models of each modality to varying target appearance and environmental conditions using the posterior of the other modality. The adaptation is unsupervised and performed online; thus, models can be improved as new unlabeled data become available. In particular, accurate models do not deteriorate when a modality is underperforming, thanks to an appropriate selection of the adaptation parameters. Importantly, unlike traditional audio-visual integration methods, the proposed approach is also useful during temporal intervals in which only one modality is available, or when different modalities are used for different tasks. We evaluate the proposed method in an end-to-end multimodal person identification application on two challenging real-world datasets and show that the proposed approach successfully adapts models in the presence of mild mismatch. We also show that the proposed approach is beneficial to other multimodal score fusion algorithms.
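The cross-modal adaptation summarized in the abstract can be sketched in a few lines. The following Python snippet is a minimal, hypothetical illustration, not the authors' implementation: each modality keeps per-identity templates, computes a pseudo-posterior over identities, and is adapted online using the other modality's posterior as a soft, unsupervised label; a confidence gate keeps an accurate model from being corrupted when the other modality underperforms. All names and parameters here (ModalityModel, alpha, conf_threshold) are assumptions made for illustration only.

```python
# Illustrative sketch of online cross-modal adaptation (toy models, not the paper's).
import numpy as np

def softmax(scores):
    e = np.exp(scores - scores.max())
    return e / e.sum()

class ModalityModel:
    """Toy per-identity template model; 'alpha' controls the adaptation speed."""
    def __init__(self, templates, alpha=0.1):
        self.templates = np.asarray(templates, dtype=float)  # shape: (n_identities, dim)
        self.alpha = alpha

    def posterior(self, feature):
        # Distance-based scores turned into a pseudo-posterior over identities.
        scores = -np.linalg.norm(self.templates - feature, axis=1)
        return softmax(scores)

    def adapt(self, feature, cross_posterior, conf_threshold=0.6):
        # Adapt only when the other modality is confident enough, so an
        # underperforming modality cannot degrade an accurate model.
        if cross_posterior.max() < conf_threshold:
            return
        w = self.alpha * cross_posterior[:, None]           # per-identity weights
        self.templates = (1 - w) * self.templates + w * feature

# Hypothetical usage with random audio/visual feature streams.
rng = np.random.default_rng(0)
audio = ModalityModel(rng.normal(size=(3, 16)))
video = ModalityModel(rng.normal(size=(3, 16)))
for _ in range(100):
    fa, fv = rng.normal(size=16), rng.normal(size=16)
    pa, pv = audio.posterior(fa), video.posterior(fv)
    audio.adapt(fa, cross_posterior=pv)   # audio model adapted by the video posterior
    video.adapt(fv, cross_posterior=pa)   # video model adapted by the audio posterior
```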
Pages: 40 - 51
Page count: 12
Related Papers (50 in total)
  • [1] Unsupervised cross-modal deep-model adaptation for audio-visual re-identification with wearable cameras
    Brutti, Alessio
    Cavallaro, Andrea
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2017), 2017, : 438 - 445
  • [2] Deep Cross-Modal Audio-Visual Generation
    Chen, Lele
    Srivastava, Sudhanshu
    Duan, Zhiyao
    Xu, Chenliang
    PROCEEDINGS OF THE THEMATIC WORKSHOPS OF ACM MULTIMEDIA 2017 (THEMATIC WORKSHOPS'17), 2017, : 349 - 357
  • [3] Cross-modal prediction in audio-visual communication
    Rao, RR
    Chen, TH
    1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 2056 - 2059
  • [4] Cross-Modal Analysis of Audio-Visual Film Montage
    Zeppelzauer, Matthias
    Mitrovic, Dalibor
    Breiteneder, Christian
    2011 20TH INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATIONS AND NETWORKS (ICCCN), 2011,
  • [5] Audio-Visual Instance Discrimination with Cross-Modal Agreement
    Morgado, Pedro
    Vasconcelos, Nuno
    Misra, Ishan
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 12470 - 12481
  • [6] Cross-Modal learning for Audio-Visual Video Parsing
    Lamba, Jatin
    Abhishek
    Akula, Jayaprakash
    Dabral, Rishabh
    Jyothi, Preethi
    Ramakrishnan, Ganesh
    INTERSPEECH 2021, 2021, : 1937 - 1941
  • [7] Variational Autoencoder with CCA for Audio-Visual Cross-modal Retrieval
    Zhang, Jiwei
    Yu, Yi
    Tang, Suhua
    Wu, Jianming
    Li, Wei
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (03)
  • [8] Temporal Cross-Modal Attention for Audio-Visual Event Localization
    Nagasaki, Y.
    Hayashi, M.
    Kaneko, N.
    Aoki, Y.
    Seimitsu Kogaku Kaishi/Journal of the Japan Society for Precision Engineering, 2022, 88 (03): 263 - 268
  • [9] Effect of Uncertainty in Audio-Visual Cross-Modal Statistical Learning
    Nagy, Marton
    Reguly, Helga
    Markus, Benjamin
    Fiser, Jozsef
    PERCEPTION, 2019, 48 : 109 - 109
  • [10] Audio-Visual Segmentation by Exploring Cross-Modal Mutual Semantics
    Liu, Chen
    Li, Peike Patrick
    Qi, Xingqun
    Zhang, Hu
    Li, Lincheng
    Wang, Dadong
    Yu, Xin
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 7590 - 7598