50 entries in total
- [2] A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022: 2485-2494
- [3] CASA-Net: Cross-attention and Self-attention for End-to-End Audio-visual Speaker Diarization [J]. 2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023: 102-106
- [4] Audio-Visual Fusion for Emotion Recognition in the Valence-Arousal Space Using Joint Cross-Attention [J]. IEEE TRANSACTIONS ON BIOMETRICS, BEHAVIOR, AND IDENTITY SCIENCE, 2023, 5 (03): 360-373
- [5] Dynamic visual features for audio-visual speaker verification [J]. COMPUTER SPEECH AND LANGUAGE, 2010, 24 (02): 136-149
- [6] Multi-scale network with shared cross-attention for audio-visual correlation learning [J]. NEURAL COMPUTING & APPLICATIONS, 2023, 35 (27): 20173-20187
- [7] A Multi-View Approach to Audio-Visual Speaker Verification [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021: 6194-6198
- [8] Audio-Visual Speaker Localization via Weighted Clustering [J]. 2014 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2014