Data Selection Based on Phoneme Affinity Matrix for Electrolarynx Speech Recognition

Authors
Hsieh, I-Ting [1]
Wu, Chung-Hsien [1]
Tsai, Shu-Wei [2]
Affiliations
[1] Natl Cheng Kung Univ, Grad Program Multimedia Syst & Intelligent Comp, Tainan, Taiwan
[2] Natl Cheng Kung Univ Hosp, Dept Otolaryngol, Tainan, Taiwan
DOI
10.1109/APSIPAASC58517.2023.10317555
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
An electrolarynx (EL) is a communication aid that lets patients produce speech after laryngectomy. Because EL speech has low intelligibility and is accompanied by loud device noise, listeners still struggle to understand it even when the patient is proficient with the device. It is therefore important to develop tools that offer additional means of communication, and automatic speech recognition (ASR) of EL speech is one method worth considering. However, the scarcity of EL speech data severely degrades recognition performance. Data augmentation is a viable way to address under-resourced speech data, but even with an enlarged healthy-speech training corpus the improvement in EL speech recognition may not be satisfactory, because the characteristics of EL speech still differ significantly from those of healthy speech. This paper proposes a data selection method that uses a phoneme affinity matrix to prioritize healthy speech that closely resembles EL speech for data augmentation. The affinity between two phonemes is defined as the similarity of the phone posteriorgrams (PPGs) of the two phonemes, considering the phoneme models. Experimental results show that data selection based on the phoneme affinity matrix outperforms both the baseline and random sampling of the augmented healthy-speech corpus.
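The abstract does not spell out how the affinity is computed or how utterances are ranked, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes that each phoneme is summarized by the mean of its PPG frames, that "similarity" is cosine similarity, and that a healthy utterance is scored by its average same-phoneme affinity to the EL data. The function names (`mean_vector`, `cosine`, `affinity_matrix`, `rank_utterances`) and the dictionary layout are hypothetical.

```python
import math


def mean_vector(frames):
    """Average a list of equal-length posterior vectors (one per frame)."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def affinity_matrix(ppg_a, ppg_b):
    """Assumed affinity: cosine similarity between mean PPGs.

    ppg_a and ppg_b map a phoneme label to its list of PPG frames.
    Returns a nested dict with result[p][q] for p in ppg_a, q in ppg_b.
    """
    mean_a = {p: mean_vector(frames) for p, frames in ppg_a.items()}
    mean_b = {q: mean_vector(frames) for q, frames in ppg_b.items()}
    return {p: {q: cosine(u, v) for q, v in mean_b.items()}
            for p, u in mean_a.items()}


def rank_utterances(el_ppg, healthy_utts):
    """Rank healthy utterances by mean same-phoneme affinity to EL speech.

    healthy_utts maps an utterance id to its own phoneme -> frames dict.
    The scoring rule is an assumption made for this sketch.
    """
    scored = []
    for utt_id, utt_ppg in healthy_utts.items():
        aff = affinity_matrix(el_ppg, utt_ppg)
        shared = [p for p in utt_ppg if p in el_ppg]
        score = sum(aff[p][p] for p in shared) / len(shared) if shared else 0.0
        scored.append((score, utt_id))
    # Highest score first; ties broken by utterance id for determinism.
    return [utt_id for score, utt_id in sorted(scored, key=lambda t: (-t[0], t[1]))]
```

Under these assumptions, the top-ranked healthy utterances would be added to the augmentation set first, in place of a random sample.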
Pages: 2196-2202
Page count: 7