Online Cross-Modal Adaptation for Audio-Visual Person Identification With Wearable Cameras

Cited by: 13
Authors
Brutti, Alessio [1 ]
Cavallaro, Andrea [1 ]
Affiliations
[1] Queen Mary Univ London, Ctr Intelligent Sensing, London E1 4NS, England
Keywords
Model adaptation; multimedia systems; person identification; wearable cameras; FACE RECOGNITION; SPEECH; FUSION; REIDENTIFICATION; VIDEO;
DOI
10.1109/THMS.2016.2620110
Chinese Library Classification: TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes: 081104; 0812; 0835; 1405
Abstract
We propose an audio-visual target identification approach for egocentric data with cross-modal model adaptation. The proposed approach blindly and iteratively adapts the time-dependent models of each modality to varying target appearance and environmental conditions using the posterior of the other modality. The adaptation is unsupervised and performed online; thus, models can be improved as new unlabeled data become available. In particular, accurate models do not deteriorate when a modality is underperforming, thanks to an appropriate selection of the adaptation parameters. Importantly, unlike traditional audio-visual integration methods, the proposed approach is also useful during temporal intervals in which only one modality is available, or when different modalities are used for different tasks. We evaluate the proposed method in an end-to-end multimodal person identification application on two challenging real-world datasets and show that the proposed approach successfully adapts models in the presence of mild mismatch. We also show that the proposed approach is beneficial to other multimodal score fusion algorithms.
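The cross-modal adaptation summarized in the abstract can be sketched in a few lines. The following Python snippet is a minimal, hypothetical illustration, not the authors' implementation: each modality keeps per-identity templates, computes a pseudo-posterior over identities, and is adapted online using the other modality's posterior as a soft, unsupervised label; a confidence gate keeps an accurate model from being corrupted when the other modality underperforms. All names and parameters here (ModalityModel, alpha, conf_threshold) are assumptions made for illustration only.

```python
# Illustrative sketch of online cross-modal adaptation (toy models, not the paper's).
import numpy as np

def softmax(scores):
    e = np.exp(scores - scores.max())
    return e / e.sum()

class ModalityModel:
    """Toy per-identity template model; 'alpha' controls the adaptation speed."""
    def __init__(self, templates, alpha=0.1):
        self.templates = np.asarray(templates, dtype=float)  # shape: (n_identities, dim)
        self.alpha = alpha

    def posterior(self, feature):
        # Distance-based scores turned into a pseudo-posterior over identities.
        scores = -np.linalg.norm(self.templates - feature, axis=1)
        return softmax(scores)

    def adapt(self, feature, cross_posterior, conf_threshold=0.6):
        # Adapt only when the other modality is confident enough, so an
        # underperforming modality cannot degrade an accurate model.
        if cross_posterior.max() < conf_threshold:
            return
        w = self.alpha * cross_posterior[:, None]           # per-identity weights
        self.templates = (1 - w) * self.templates + w * feature

# Hypothetical usage with random audio/visual feature streams.
rng = np.random.default_rng(0)
audio = ModalityModel(rng.normal(size=(3, 16)))
video = ModalityModel(rng.normal(size=(3, 16)))
for _ in range(100):
    fa, fv = rng.normal(size=16), rng.normal(size=16)
    pa, pv = audio.posterior(fa), video.posterior(fv)
    audio.adapt(fa, cross_posterior=pv)   # audio model adapted by the video posterior
    video.adapt(fv, cross_posterior=pa)   # video model adapted by the audio posterior
```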
Pages: 40 - 51
Page count: 12
Related Papers (50 in total)
  • [1] Unsupervised cross-modal deep-model adaptation for audio-visual re-identification with wearable cameras
    Brutti, Alessio
    Cavallaro, Andrea
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2017), 2017, : 438 - 445
  • [2] Deep Cross-Modal Audio-Visual Generation
    Chen, Lele
    Srivastava, Sudhanshu
    Duan, Zhiyao
    Xu, Chenliang
    PROCEEDINGS OF THE THEMATIC WORKSHOPS OF ACM MULTIMEDIA 2017 (THEMATIC WORKSHOPS'17), 2017, : 349 - 357
  • [3] Cross-modal prediction in audio-visual communication
    Rao, RR
    Chen, TH
    1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 2056 - 2059
  • [4] Cross-Modal Analysis of Audio-Visual Film Montage
    Zeppelzauer, Matthias
    Mitrovic, Dalibor
    Breiteneder, Christian
    2011 20TH INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATIONS AND NETWORKS (ICCCN), 2011,
  • [5] Audio-Visual Instance Discrimination with Cross-Modal Agreement
    Morgado, Pedro
    Vasconcelos, Nuno
    Misra, Ishan
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 12470 - 12481
  • [6] Cross-Modal learning for Audio-Visual Video Parsing
    Lamba, Jatin
    Abhishek
    Akula, Jayaprakash
    Dabral, Rishabh
    Jyothi, Preethi
    Ramakrishnan, Ganesh
    INTERSPEECH 2021, 2021, : 1937 - 1941
  • [7] Variational Autoencoder with CCA for Audio-Visual Cross-modal Retrieval
    Zhang, Jiwei
    Yu, Yi
    Tang, Suhua
    Wu, Jianming
    Li, Wei
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (03)
  • [8] Temporal Cross-Modal Attention for Audio-Visual Event Localization
    Nagasaki, Y.
    Hayashi, M.
    Kaneko, N.
    Aoki, Y.
    Seimitsu Kogaku Kaishi/Journal of the Japan Society for Precision Engineering, 2022, 88 (03): 263 - 268
  • [9] Effect of Uncertainty in Audio-Visual Cross-Modal Statistical Learning
    Nagy, Marton
    Reguly, Helga
    Markus, Benjamin
    Fiser, Jozsef
    PERCEPTION, 2019, 48 : 109 - 109
  • [10] Audio-Visual Segmentation by Exploring Cross-Modal Mutual Semantics
    Liu, Chen
    Li, Peike Patrick
    Qi, Xingqun
    Zhang, Hu
    Li, Lincheng
    Wang, Dadong
    Yu, Xin
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 7590 - 7598