50 items in total
- [1] Multi-Modal Perception Attention Network with Self-Supervised Learning for Audio-Visual Speaker Tracking. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022: 1456-1463
- [2] Multi-Modal Multi-Correlation Learning for Audio-Visual Speech Separation. INTERSPEECH 2022, 2022: 886-890
- [5] Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition. PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022: 4491-4503
- [6] SELF-SUPERVISED AUDIO-VISUAL CO-SEGMENTATION. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019: 2357-2361
- [7] SELF-SUPERVISED LEARNING FOR AUDIO-VISUAL SPEAKER DIARIZATION. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020: 4367-4371
- [8] Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity. THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 8, 2023: 9723-9732
- [9] Multi-modal Grouping Network for Weakly-Supervised Audio-Visual Video Parsing. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
- [10] Audio-Visual Predictive Coding for Self-Supervised Visual Representation Learning. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021: 9912-9919