Online Cross-Modal Adaptation for Audio-Visual Person Identification With Wearable Cameras

Cited by: 13
Authors
Brutti, Alessio [1 ]
Cavallaro, Andrea [1 ]
Affiliations
[1] Queen Mary Univ London, Ctr Intelligent Sensing, London E1 4NS, England
Keywords
Model adaptation; multimedia systems; person identification; wearable cameras; FACE RECOGNITION; SPEECH; FUSION; REIDENTIFICATION; VIDEO;
DOI
10.1109/THMS.2016.2620110
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
We propose an audio-visual target identification approach for egocentric data with cross-modal model adaptation. The proposed approach blindly and iteratively adapts the time-dependent models of each modality to varying target appearance and environmental conditions using the posterior of the other modality. The adaptation is unsupervised and performed online; thus, models improve as new unlabeled data become available. In particular, accurate models do not deteriorate when a modality is underperforming, thanks to an appropriate selection of the adaptation parameters. Importantly, unlike traditional audio-visual integration methods, the proposed approach is also useful during temporal intervals in which only one modality is available, or when different modalities are used for different tasks. We evaluate the proposed method in an end-to-end multimodal person identification application on two challenging real-world datasets and show that it successfully adapts models in the presence of mild mismatch. We also show that the proposed approach benefits other multimodal score-fusion algorithms.
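The core idea in the abstract, updating one modality's model from new unlabeled data only when the *other* modality's posterior is confident, can be illustrated with a minimal sketch. All names and the specific interpolation rule below are illustrative assumptions, not the paper's exact algorithm:

```python
def cross_modal_adapt(mean_a, obs_a, posterior_b, alpha=0.9, tau=0.7):
    """Sketch of one cross-modal adaptation step.

    mean_a:      current model parameter of modality A (e.g. an audio
                 model mean for the target identity)
    obs_a:       new unlabeled observation from modality A
    posterior_b: target posterior produced by modality B (e.g. video)
    alpha:       memory factor (how much of the old model is retained)
    tau:         confidence gate on the other modality's posterior
    """
    # Gate: if the other modality is not confident, leave the model
    # untouched, so an underperforming modality cannot corrupt an
    # accurate one (the "do not deteriorate" property in the abstract).
    if posterior_b < tau:
        return mean_a
    # Confidence-weighted interpolation: the more confident modality B
    # is, the further modality A's model moves toward the observation.
    w = alpha + (1.0 - alpha) * (1.0 - posterior_b)
    return w * mean_a + (1.0 - w) * obs_a
```

In the paper's online setting such an update would run per segment and symmetrically in both directions (audio adapted from the video posterior and vice versa); the memory factor and confidence gate play the role of the "appropriate selection of the adaptation parameters" mentioned in the abstract.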
Pages: 40-51
Page count: 12