50 items in total
- [1] Audio-Visual Speaker Verification via Joint Cross-Attention [J]. SPEECH AND COMPUTER, SPECOM 2023, PT II, 2023, 14339 : 18 - 31
- [2] Multi-scale network with shared cross-attention for audio-visual correlation learning [J]. NEURAL COMPUTING & APPLICATIONS, 2023, 35 (27): 20173 - 20187
- [3] CASA-Net: Cross-attention and Self-attention for End-to-End Audio-visual Speaker Diarization [J]. 2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023: 102 - 106
- [4] A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022: 2485 - 2494
- [5] Audio-visual Speaker Recognition with a Cross-modal Discriminative Network [J]. INTERSPEECH 2020, 2020: 2242 - 2246
- [6] Multi-Modal Perception Attention Network with Self-Supervised Learning for Audio-Visual Speaker Tracking [J]. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022: 1456 - 1463
- [7] Audio-visual speaker tracking with importance particle filters [J]. 2003 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOL 3, PROCEEDINGS, 2003: 25 - 28
- [8] Audio-Visual Saliency Network with Audio Attention Module [J]. PROCEEDINGS OF 2021 2ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND INFORMATION SYSTEMS (ICAIIS '21), 2021
- [9] Neural Speaker Extraction with Speaker-Speech Cross-Attention Network [J]. INTERSPEECH 2021, 2021: 3535 - 3539
- [10] Speaker Tracking Based on Audio-Visual Fusion with Unknown Noise [J]. PROCEEDINGS OF 2013 CHINESE INTELLIGENT AUTOMATION CONFERENCE: INTELLIGENT INFORMATION PROCESSING, 2013, 256 : 215 - 226