50 entries in total
- [1] Unified Speech-Text Pre-training for Speech Translation and Recognition [J]. PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022 : 1488 - 1499
- [2] MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023 : 15689 - 15699
- [3] Joint Speech-Text Embeddings for Multitask Speech Processing [J]. IEEE Access, 2024, 12 : 145955 - 145967
- [4] STEPs-RL: Speech-Text Entanglement for Phonetically Sound Representation Learning [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2021, PT III, 2021, 12714 : 55 - 66
- [5] MAESTRO-U: LEVERAGING JOINT SPEECH-TEXT REPRESENTATION LEARNING FOR ZERO SUPERVISED SPEECH ASR [J]. 2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022 : 68 - 75
- [6] AN ANALYSIS OF SEMANTICALLY-ALIGNED SPEECH-TEXT EMBEDDINGS [J]. 2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022 : 747 - 754
- [7] Multimodal fusion: A study on speech-text emotion recognition with the integration of deep learning [J]. INTELLIGENT SYSTEMS WITH APPLICATIONS, 2024, 24
- [8] Joint Speech-Text Embeddings with Disentangled Speaker Features [J]. 2023 34TH IRISH SIGNALS AND SYSTEMS CONFERENCE, ISSC, 2023
- [9] Self-learning speaker identification for enhanced speech recognition [J]. COMPUTER SPEECH AND LANGUAGE, 2012, 26 (03) : 210 - 227
- [10] Self-learning Vector Quantization for Pattern Discovery from Speech [J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009 : 848 - 851