共 50 条
- [2] Visually grounded learning of keyword prediction from untranscribed speech [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3677 - 3681
- [3] Representations of language in a model of visually grounded speech signal [J]. PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 1, 2017, : 613 - 622
- [4] TRILINGUAL SEMANTIC EMBEDDINGS OF VISUALLY GROUNDED SPEECH WITH SELF-ATTENTION MECHANISMS [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 4352 - 4356
- [5] VISION AS AN INTERLINGUA: LEARNING MULTILINGUAL SEMANTIC EMBEDDINGS OF UNTRANSCRIBED SPEECH [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4969 - 4973
- [6] SPEECH-TO-SPEECH TRANSLATION BETWEEN UNTRANSCRIBED UNKNOWN LANGUAGES [J]. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 593 - 600
- [7] Learning to Recognise Words using Visually Grounded Speech [J]. 2021 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2021,
- [9] Pair Expansion for Learning Multilingual Semantic Embeddings using Disjoint Visually-grounded Speech Audio Datasets [J]. INTERSPEECH 2020, 2020, : 1486 - 1490