50 entries in total
- [31] An Attention Based Speaker-Independent Audio-Visual Deep Learning Model for Speech Enhancement [J]. MULTIMEDIA MODELING (MMM 2020), PT II, 2020, 11962 : 722 - 728
- [32] Speaker-Targeted Audio-Visual Models for Speech Recognition in Cocktail-Party Environments [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2120 - 2124
- [34] A Closer Look at Audio-Visual Multi-Person Speech Recognition and Active Speaker Selection [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6863 - 6867
- [37] Building a data corpus for audio-visual speech recognition [J]. EUROMEDIA '2007, 2007, : 88 - 92
- [38] Audio-visual fuzzy fusion for robust speech recognition [J]. 2013 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2013,
- [40] Audio-Visual Automatic Speech Recognition for Connected Digits [J]. 2008 INTERNATIONAL SYMPOSIUM ON INTELLIGENT INFORMATION TECHNOLOGY APPLICATION, VOL III, PROCEEDINGS, 2008, : 328 - +