Online Cross-Modal Adaptation for Audio-Visual Person Identification With Wearable Cameras

Cited by: 13
Authors
Brutti, Alessio [1 ]
Cavallaro, Andrea [1 ]
Affiliations
[1] Queen Mary Univ London, Ctr Intelligent Sensing, London E1 4NS, England
Keywords
Model adaptation; multimedia systems; person identification; wearable cameras; FACE RECOGNITION; SPEECH; FUSION; REIDENTIFICATION; VIDEO;
DOI
10.1109/THMS.2016.2620110
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
We propose an audio-visual target identification approach for egocentric data with cross-modal model adaptation. The proposed approach blindly and iteratively adapts the time-dependent models of each modality to varying target appearance and environmental conditions using the posterior of the other modality. The adaptation is unsupervised and performed online; thus, models improve as new unlabeled data become available. In particular, accurate models do not deteriorate when a modality is underperforming, thanks to an appropriate selection of the adaptation parameters. Importantly, unlike traditional audio-visual integration methods, the proposed approach is also useful during temporal intervals in which only one modality is available, or when different modalities are used for different tasks. We evaluate the proposed method in an end-to-end multimodal person identification application on two challenging real-world datasets and show that it successfully adapts models in the presence of mild mismatch. We also show that the proposed approach benefits other multimodal score-fusion algorithms.
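The core idea in the abstract, updating one modality's model from new unlabeled data only when the *other* modality's posterior is confident, can be illustrated with a minimal sketch. All names and the specific interpolation rule below are illustrative assumptions, not the paper's exact algorithm:

```python
def cross_modal_adapt(mean_a, obs_a, posterior_b, alpha=0.9, tau=0.7):
    """Sketch of one cross-modal adaptation step.

    mean_a:      current model parameter of modality A (e.g. an audio
                 model mean for the target identity)
    obs_a:       new unlabeled observation from modality A
    posterior_b: target posterior produced by modality B (e.g. video)
    alpha:       memory factor (how much of the old model is retained)
    tau:         confidence gate on the other modality's posterior
    """
    # Gate: if the other modality is not confident, leave the model
    # untouched, so an underperforming modality cannot corrupt an
    # accurate one (the "do not deteriorate" property in the abstract).
    if posterior_b < tau:
        return mean_a
    # Confidence-weighted interpolation: the more confident modality B
    # is, the further modality A's model moves toward the observation.
    w = alpha + (1.0 - alpha) * (1.0 - posterior_b)
    return w * mean_a + (1.0 - w) * obs_a
```

In the paper's online setting such an update would run per segment and symmetrically in both directions (audio adapted from the video posterior and vice versa); the memory factor and confidence gate play the role of the "appropriate selection of the adaptation parameters" mentioned in the abstract.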
Pages: 40-51
Page count: 12