50 entries in total
- [41] An iVector Extractor Using Pre-trained Neural Networks for Speaker Verification [J]. 2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014: 73 - 77
- [42] A CONDITIONAL RANDOM FIELD APPROACH FOR AUDIO-VISUAL PEOPLE DIARIZATION [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
- [44] Tracking the Active Speaker Based on a Joint Audio-Visual Observation Model [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOP (ICCVW), 2015: 702 - 708
- [46] CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023: 2863 - 2874
- [47] Multimodal Emotion Recognition using Physiological and Audio-Visual Features [J]. PROCEEDINGS OF THE 2018 ACM INTERNATIONAL JOINT CONFERENCE ON PERVASIVE AND UBIQUITOUS COMPUTING AND PROCEEDINGS OF THE 2018 ACM INTERNATIONAL SYMPOSIUM ON WEARABLE COMPUTERS (UBICOMP/ISWC'18 ADJUNCT), 2018: 946 - 951
- [48] Classification of Respiration Sounds Using Deep Pre-trained Audio Embeddings [J]. 2021 IEEE LATIN AMERICAN CONFERENCE ON COMPUTATIONAL INTELLIGENCE (LA-CCI), 2021
- [50] Audio-visual speaker identification based on the use of dynamic audio and visual features [J]. AUDIO-BASED AND VIDEO-BASED BIOMETRIC PERSON AUTHENTICATION, PROCEEDINGS, 2003, 2688 : 743 - 751