50 entries in total
- [41] Interactive Co-Learning with Cross-Modal Transformer for Audio-Visual Emotion Recognition [J]. INTERSPEECH 2022, 2022: 4740-4744
- [42] Looking into Your Speech: Learning Cross-modal Affinity for Audio-visual Speech Separation [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 1336-1345
- [44] Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity [J]. THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 8, 2023: 9723-9732
- [45] Learning Event-Specific Localization Preferences for Audio-Visual Event Localization [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023: 3446-3454
- [46] Audio-Visual Embedding for Cross-Modal Music Video Retrieval through Supervised Deep CCA [J]. 2018 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM 2018), 2018: 143-150
- [48] Improving Audio-Visual Speech Recognition Performance with Cross-Modal Student-Teacher Training [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019: 6560-6564