50 entries in total
- [1] End-to-end audio-visual speech recognition for overlapping speech [J]. INTERSPEECH 2021, 2021: 3016-3020
- [2] END-TO-END AUDIO-VISUAL SPEECH RECOGNITION WITH CONFORMERS [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021: 7613-7617
- [3] FUSING INFORMATION STREAMS IN END-TO-END AUDIO-VISUAL SPEECH RECOGNITION [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021: 3430-3434
- [4] Investigating the Lombard Effect Influence on End-to-End Audio-Visual Speech Recognition [J]. INTERSPEECH 2019, 2019: 4090-4094
- [5] Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition [J]. INTERSPEECH 2022, 2022: 2838-2842
- [7] End-to-End Bloody Video Recognition by Audio-Visual Feature Fusion [J]. PATTERN RECOGNITION AND COMPUTER VISION (PRCV 2018), PT I, 2018, 11256: 501-510
- [9] END-TO-END MULTI-PERSON AUDIO/VISUAL AUTOMATIC SPEECH RECOGNITION [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020: 6994-6998
- [10] TRIGGERED ATTENTION FOR END-TO-END SPEECH RECOGNITION [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019: 5666-5670