50 results in total
- [1] Cascaded Multilingual Audio-Visual Learning from Videos [J]. INTERSPEECH 2021, 2021: 3006-3010
- [2] Learning Representations from Audio-Visual Spatial Alignment [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
- [3] Learning Contextually Fused Audio-Visual Representations for Audio-Visual Speech Recognition [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022: 1346-1350
- [4] Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 14866-14876
- [5] SpeechIndexer: A Flexible Software for Audio-Visual Language Learning [J]. ICEIC 2011/ IRE&PS 2011: INTERNATIONAL CONFERENCE ON EDUCATION, INFORMATICS, AND CYBERNETICS/ INTERNATIONAL SYMPOSIUM ON INTEGRATING RESEARCH, EDUCATION, AND PROBLEM SOLVING, 2011: 79-82
- [7] Audio-Visual Event Localization in Unconstrained Videos [J]. COMPUTER VISION - ECCV 2018, PT II, 2018, 11206: 252-268
- [8] Support System for Making Audio-Visual Material for Learning Language [J]. 2006 7TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY BASED HIGHER EDUCATION AND TRAINING, VOLS 1 AND 2, 2006: 199-202
- [9] Learning Self-supervised Audio-Visual Representations for Sound Recommendations [J]. ADVANCES IN VISUAL COMPUTING (ISVC 2021), PT II, 2021, 13018: 124-138
- [10] Learning Better Representations for Audio-Visual Emotion Recognition with Common Information [J]. APPLIED SCIENCES-BASEL, 2020, 10(20): 1-23