50 entries in total
- [1] Learning Representations from Audio-Visual Spatial Alignment [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
- [2] LEARNING CONTEXTUALLY FUSED AUDIO-VISUAL REPRESENTATIONS FOR AUDIO-VISUAL SPEECH RECOGNITION [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022: 1346-1350
- [3] Indexing audio-visual sequences by joint audio and video processing [J]. VSMM98: FUTUREFUSION - APPLICATION REALITIES FOR THE VIRTUAL AGE, VOLS 1 AND 2, 1998: 686-691
- [4] AVLnet: Learning Audio-Visual Language Representations from Instructional Videos [J]. INTERSPEECH 2021, 2021: 1584-1588
- [5] Audio-Visual Biometric Recognition Via Joint Sparse Representations [J]. 2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016: 3031-3035
- [6] Identification of story units in audio-visual sequences by joint audio and video processing [J]. 1998 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING - PROCEEDINGS, VOL 1, 1998: 363-367
- [9] A JOINT AUDIO-VISUAL APPROACH TO AUDIO LOCALIZATION [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015: 454-458