Unsupervised Automatic Speech Recognition: A review

Cited by: 22
Authors
Aldarmaki, Hanan [1]
Ullah, Asad [2]
Ram, Sreepratha [1]
Zaki, Nazar [1]
Affiliations
[1] UAE Univ, Comp Sci & Software Engn Dept, Al Ain, U Arab Emirates
[2] Natl Univ Sci & Technol, Dept Comp Engn, Islamabad, Pakistan
Keywords
Unsupervised ASR; Survey; Speech segmentation; Cross-modal mapping; LEARNING WORD EMBEDDINGS; PHONEME RECOGNITION; NEURAL-NETWORKS; SEGMENTATION; REPRESENTATIONS; FRAMEWORK; MODELS; ALGORITHMS; EFFICIENT; FEATURES
DOI
10.1016/j.specom.2022.02.005
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Automatic Speech Recognition (ASR) systems can be trained to achieve remarkable performance given large amounts of manually transcribed speech, but large labeled data sets can be difficult or expensive to acquire for all languages of interest. In this paper, we review the research literature to identify models and ideas that could lead to fully unsupervised ASR, including unsupervised sub-word and word modeling, unsupervised segmentation of the speech signal, and unsupervised mapping from speech segments to text. The objective of the study is to identify the limitations of what can be learned from speech data alone and to understand the minimum requirements for speech recognition. Identifying these limitations would help optimize the resources and efforts in ASR development for low-resource languages.
Pages: 76-91
Page count: 16
Related papers
50 in total
  • [1] Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition
    Ni, Junrui
    Wang, Liming
    Gao, Heting
    Qian, Kaizhi
    Zhang, Yang
    Chang, Shiyu
    Hasegawa-Johnson, Mark
    [J]. INTERSPEECH 2022, 2022, : 461 - 465
  • [2] Almost Unsupervised Text to Speech and Automatic Speech Recognition
    Ren, Yi
    Tan, Xu
    Qin, Tao
    Zhao, Sheng
    Zhao, Zhou
    Liu, Tie-Yan
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [3] Automatic speech recognition: A review
    Haton, JP
    [J]. ENTERPRISE INFORMATION SYSTEMS V, 2004, : 6 - 11
  • [4] Automatic speech recognition and speech variability: A review
    Benzeghiba, M.
    De Mori, R.
    Deroo, O.
    Dupont, S.
    Erbes, T.
    Jouvet, D.
    Fissore, L.
    Laface, P.
    Mertins, A.
    Ris, C.
    Rose, R.
    Tyagi, V.
    Wellekens, C.
    [J]. SPEECH COMMUNICATION, 2007, 49 (10-11) : 763 - 786
  • [5] Unsupervised speech/non-speech detection for automatic speech recognition in meeting rooms
    Maganti, Hari Krishna
    Motlicek, Petr
    Gatica-Perez, Daniel
    [J]. 2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 1037 - +
  • [6] Unsupervised and active learning in automatic speech recognition for call classification
    Hakkani-Tür, D
    Tur, G
    Rahim, M
    Riccardi, G
    [J]. 2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 429 - 432
  • [7] AN UNSUPERVISED VOCABULARY SELECTION TECHNIQUE FOR CHINESE AUTOMATIC SPEECH RECOGNITION
    Zhang, Yike
    Zhang, Pengyuan
    Li, Ta
    Yan, Yonghong
    [J]. 2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016, : 420 - 425
  • [8] Unsupervised Speech Recognition
    Baevski, Alexei
    Hsu, Wei-Ning
    Conneau, Alexis
    Auli, Michael
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021,
  • [9] Automatic speech recognition fusion approach to unsupervised speaker clustering and labeling
    Lawson, A. D.
    Huggins, M. C.
    Grieco, J. J.
    Galligan, S. A.
    Harris, D. M.
    [J]. 2006 IEEE AEROSPACE CONFERENCE, VOLS 1-9, 2006, : 3280 - 3285
  • [10] UNSUPERVISED LEARNING APPROACH TO FEATURE ANALYSIS FOR AUTOMATIC SPEECH EMOTION RECOGNITION
    Eskimez, Sefik Emre
    Duan, Zhiyao
    Heinzelman, Wendi
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5099 - 5103