Data Selection Based on Phoneme Affinity Matrix for Electrolarynx Speech Recognition

Cited by: 0
Authors
Hsieh, I-Ting [1]
Wu, Chung-Hsien [1]
Tsai, Shu-Wei [2]
Affiliations
[1] Natl Cheng Kung Univ, Grad Program Multimedia Syst & Intelligent Comp, Tainan, Taiwan
[2] Natl Cheng Kung Univ Hosp, Dept Otolaryngol, Tainan, Taiwan
DOI: 10.1109/APSIPAASC58517.2023.10317555
Chinese Library Classification: TP18 [Artificial intelligence theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
An electrolarynx (EL) is a communication aid that enables patients who have undergone laryngectomy to produce speech. Because EL speech has low intelligibility and is accompanied by loud noise, listeners still find its content difficult to understand, even when the patient is proficient with the EL device. It is therefore important to develop tools that offer additional means of communication, and automatic speech recognition (ASR) of EL speech is an approach worth considering in this regard. However, the scarcity of EL speech data dramatically degrades recognition performance. Data augmentation is one viable way to address under-resourced speech data, yet even with a larger training corpus of healthy speech, the improvement in EL speech recognition may not be satisfactory, because the characteristics of EL speech still differ significantly from those of healthy speech. This paper proposes a data selection method that uses a phoneme affinity matrix to prioritize, for data augmentation, healthy speech that closely resembles EL speech. The affinity between two phonemes is defined as the similarity between their phone posteriorgrams (PPGs), taking the phoneme models into account. Experimental results show that data selection based on the phoneme affinity matrix outperforms both the baseline and a method that selects the augmented healthy speech corpus by random sampling.
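The abstract does not specify the exact similarity measure or how individual healthy utterances are scored, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes frame-level phoneme alignments, takes the affinity of phonemes i and j to be the cosine similarity of their mean PPG vectors, and scores each healthy utterance by how closely its own affinities match those measured on EL speech. All function names (`mean_ppgs`, `affinity_matrix`, `resemblance`, `select`, `random_baseline`) and the toy numbers are hypothetical.

```python
"""Hypothetical sketch: PPG-based phoneme affinity matrix and affinity-driven data selection."""
import random
from math import sqrt
from typing import Dict, List, Sequence, Tuple

Frame = Tuple[int, Sequence[float]]  # (aligned phoneme id, PPG posterior vector); assumed input format
Affinity = Dict[Tuple[int, int], float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mean_ppgs(frames: List[Frame]) -> Dict[int, List[float]]:
    """Average the PPG vectors of all frames aligned to each phoneme."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for phone, ppg in frames:
        if phone not in sums:
            sums[phone] = [0.0] * len(ppg)
            counts[phone] = 0
        sums[phone] = [s + p for s, p in zip(sums[phone], ppg)]
        counts[phone] += 1
    return {ph: [s / counts[ph] for s in vec] for ph, vec in sums.items()}


def affinity_matrix(frames: List[Frame]) -> Affinity:
    """Assumed definition: affinity(i, j) = cosine similarity of the mean PPGs of i and j."""
    means = mean_ppgs(frames)
    return {(i, j): cosine(means[i], means[j]) for i in means for j in means}


def resemblance(target: Affinity, utt_frames: List[Frame]) -> float:
    """Hypothetical score: negative mean absolute difference between the utterance's
    off-diagonal affinities and the target (EL) affinities; higher = more EL-like."""
    utt = affinity_matrix(utt_frames)
    shared = [p for p in utt if p[0] != p[1] and p in target]
    if not shared:
        return float("-inf")  # no comparable phoneme pairs in this utterance
    return -sum(abs(utt[p] - target[p]) for p in shared) / len(shared)


def select(target: Affinity, pool: Dict[str, List[Frame]], k: int) -> List[str]:
    """Return the ids of the k pool utterances that most resemble the target affinities."""
    ranked = sorted(pool, key=lambda uid: resemblance(target, pool[uid]), reverse=True)
    return ranked[:k]


def random_baseline(pool: Dict[str, List[Frame]], k: int, seed: int = 0) -> List[str]:
    """Random-sampling selection, the kind of baseline the abstract compares against."""
    return random.Random(seed).sample(sorted(pool), k)


# Toy example with made-up numbers: in EL speech, phonemes 0 and 1 are blurred together.
el_frames = [(0, [0.5, 0.5, 0.0]), (1, [0.4, 0.6, 0.0]), (2, [0.0, 0.0, 1.0])]
pool = {
    "clear": [(0, [1.0, 0.0, 0.0]), (1, [0.0, 1.0, 0.0]), (2, [0.0, 0.0, 1.0])],
    "blurred": [(0, [0.6, 0.4, 0.0]), (1, [0.5, 0.5, 0.0]), (2, [0.0, 0.0, 1.0])],
}
target = affinity_matrix(el_frames)
print(select(target, pool, k=1))  # prints ['blurred']
```

In this toy setup the "blurred" healthy utterance reproduces the high 0/1 affinity seen in the EL data, so it is ranked ahead of the "clear" one, whereas `random_baseline` ignores that structure entirely. Excluding diagonal pairs in `resemblance` is a design choice of this sketch: every phoneme has affinity close to 1 with itself, so those pairs carry no information about resemblance.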
Pages: 2196-2202
Page count: 7