50 entries in total
- [1] A Pre-trained Audio-Visual Transformer for Emotion Recognition [J]. 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022: 4698-4702
- [2] PRISM: Pre-trained Indeterminate Speaker Representation Model for Speaker Diarization and Speaker Verification [J]. INTERSPEECH 2022, 2022: 1431-1435
- [3] Self-Supervised Learning for Audio-Visual Speaker Diarization [J]. 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2020: 4367-4371
- [5] Speaker Diarization based on Audio-Visual Integration for Smart Posterboard [J]. 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014
- [7] AVA-AVD: Audio-Visual Speaker Diarization in the Wild [J]. Proceedings of the 30th ACM International Conference on Multimedia, MM 2022, 2022: 3838-3847
- [9] Audio-visual speaker diarization using Fisher linear semi-discriminant analysis [J]. Multimedia Tools and Applications, 2016, 75: 115-130
- [10] DyViSE: Dynamic Vision-Guided Speaker Embedding for Audio-Visual Speaker Diarization [J]. 2022 IEEE 24th International Workshop on Multimedia Signal Processing (MMSP), 2022