Online Cross-Modal Adaptation for Audio-Visual Person Identification With Wearable Cameras

Cited by: 13
Authors
Brutti, Alessio [1]
Cavallaro, Andrea [1]
Affiliations
[1] Queen Mary Univ London, Ctr Intelligent Sensing, London E1 4NS, England
Keywords
Model adaptation; multimedia systems; person identification; wearable cameras; FACE RECOGNITION; SPEECH; FUSION; REIDENTIFICATION; VIDEO
DOI
10.1109/THMS.2016.2620110
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose an audio-visual target identification approach for egocentric data with cross-modal model adaptation. The proposed approach blindly and iteratively adapts the time-dependent models of each modality to varying target appearance and environmental conditions using the posterior of the other modality. The adaptation is unsupervised and performed online; thus, models can be improved as new unlabeled data become available. In particular, accurate models do not deteriorate when a modality is underperforming thanks to an appropriate selection of the parameters in the adaptation. Importantly, unlike traditional audio-visual integration methods, the proposed approach is also useful for temporal intervals during which only one modality is available or when different modalities are used for different tasks. We evaluate the proposed method in an end-to-end multimodal person identification application with two challenging real-world datasets and show that the proposed approach successfully adapts models in the presence of mild mismatch. We also show that the proposed approach is beneficial to other multimodal score fusion algorithms.
Pages: 40-51
Number of pages: 12
Related papers
50 in total
  • [31] Attribute-Guided Cross-Modal Interaction and Enhancement for Audio-Visual Matching
    Wang, Jiaxiang
    Zheng, Aihua
    Yan, Yan
    He, Ran
    Tang, Jin
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2024, 19 : 4986 - 4998
  • [32] Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition
    Hu, Yuchen
    Li, Ruizhe
    Chen, Chen
    Zou, Heqing
    Zhu, Qiushi
    Chng, Eng Siong
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 5076 - 5084
  • [33] Unified Cross-Modal Attention: Robust Audio-Visual Speech Recognition and Beyond
    Li, Jiahong
    Li, Chenda
    Wu, Yifei
    Qian, Yanmin
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 1941 - 1953
  • [34] Cross-Modal Attention Network for Temporal Inconsistent Audio-Visual Event Localization
    Xuan, Hanyu
    Zhang, Zhenyu
    Chen, Shuo
    Yang, Jian
    Yan, Yan
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 279 - 286
  • [35] Temporal and Cross-modal Attention for Audio-Visual Zero-Shot Learning
    Mercea, Otniel-Bogdan
    Hummel, Thomas
    Koepke, A. Sophia
    Akata, Zeynep
    COMPUTER VISION, ECCV 2022, PT XX, 2022, 13680 : 488 - 505
  • [36] Complete Cross-triplet Loss in Label Space for Audio-visual Cross-modal Retrieval
    Zeng, Donghuo
    Wang, Yanan
    Wu, Jianming
    Ikeda, Kazushi
    2022 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM), 2022, : 1 - 9
  • [37] Audio-Visual Activity Guided Cross-Modal Identity Association for Active Speaker Detection
    Sharma, Rahul
    Narayanan, Shrikanth
    IEEE OPEN JOURNAL OF SIGNAL PROCESSING, 2023, 4 : 225 - 232
  • [38] Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and Language
    Mercea, Otniel-Bogdan
    Riesch, Lukas
    Koepke, A. Sophia
    Akata, Zeynep
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 10543 - 10553
  • [39] Auditory cross-modal reorganization in cochlear implant users indicates audio-visual integration
    Stropahl, Maren
    Debener, Stefan
    NEUROIMAGE-CLINICAL, 2017, 16 : 514 - 523
  • [40] Learning Explicit and Implicit Dual Common Subspaces for Audio-visual Cross-modal Retrieval
    Zeng, Donghuo
    Wu, Jianming
    Hattori, Gen
    Xu, Rong
    Yu, Yi
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (02)