50 items in total
- [22] Self-Supervised Audio-Visual Representation Learning for in-the-wild Videos. 2020 IEEE International Conference on Big Data (Big Data), 2020, pp. 5671-5672.
- [25] Audio-Visual Grouping Network for Sound Localization from Mixtures. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 10565-10574.
- [26] A Closer Look at Weakly-Supervised Audio-Visual Source Localization. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022.
- [27] Multi-Modal Perception Attention Network with Self-Supervised Learning for Audio-Visual Speaker Tracking. Thirty-Sixth AAAI Conference on Artificial Intelligence / Thirty-Fourth Conference on Innovative Applications of Artificial Intelligence / The Twelfth Symposium on Educational Advances in Artificial Intelligence, 2022, pp. 1456-1463.
- [28] Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning. MM '20: Proceedings of the 28th ACM International Conference on Multimedia, 2020, pp. 3884-3892.
- [29] Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022), Vol. 1 (Long Papers), 2022, pp. 4491-4503.