Data Selection Based on Phoneme Affinity Matrix for Electrolarynx Speech Recognition

Authors
Hsieh, I-Ting [1]
Wu, Chung-Hsien [1]
Tsai, Shu-Wei [2]
Affiliations
[1] Natl Cheng Kung Univ, Grad Program Multimedia Syst & Intelligent Comp, Tainan, Taiwan
[2] Natl Cheng Kung Univ Hosp, Dept Otolaryngol, Tainan, Taiwan
DOI
10.1109/APSIPAASC58517.2023.10317555
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
An electrolarynx (EL) is a communication aid that allows patients to produce speech after laryngectomy. Because EL speech has low intelligibility and is accompanied by loud device noise, listeners still find it hard to understand, even when the patient is proficient with the device. It is therefore important to develop tools that offer additional means of communication, and automatic speech recognition (ASR) of EL speech is one method worth considering. However, the scarcity of EL speech data dramatically degrades recognition performance. Data augmentation is a viable remedy for under-resourced speech data, but even with an enlarged corpus of healthy speech for training, the improvement in EL speech recognition may not be satisfactory, because the characteristics of EL speech still differ significantly from those of healthy speech. This paper proposes a data selection method that uses a phoneme affinity matrix to prioritize healthy speech that closely resembles EL speech for data augmentation. The affinity between two phonemes is defined as the similarity of the phone posteriorgrams (PPGs) of the two phonemes, taking the phoneme models into account. Experimental results show that data selection based on the phoneme affinity matrix outperforms both the baseline and random sampling of the augmented healthy speech corpus.
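The selection idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state which similarity measure is used or how candidate utterances are scored, so everything below is an assumption made for illustration: cosine similarity between per-phoneme mean PPG vectors stands in for the affinity, an utterance is scored by its mean same-phoneme affinity to the EL data, and the function names (mean_vector, cosine, affinity_matrix, rank_utterances) are hypothetical.

```python
import math


def mean_vector(frames):
    """Average a list of equal-length posterior vectors, dimension by dimension."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def affinity_matrix(ppgs_a, ppgs_b):
    """Map each phoneme pair (p, q) to the similarity of their mean PPGs.

    ppgs_a, ppgs_b: dict of phoneme label -> list of PPG frames.
    ASSUMPTION: cosine similarity of mean PPGs is used as the affinity.
    """
    means_a = {p: mean_vector(f) for p, f in ppgs_a.items()}
    means_b = {q: mean_vector(f) for q, f in ppgs_b.items()}
    return {
        (p, q): cosine(ma, mb)
        for p, ma in means_a.items()
        for q, mb in means_b.items()
    }


def rank_utterances(el_ppgs, healthy_utts):
    """Rank healthy utterances from most to least EL-like.

    healthy_utts: dict of utterance id -> (dict of phoneme -> PPG frames).
    ASSUMPTION: the score is the mean affinity between matching phonemes.
    """
    scores = {}
    for utt_id, ppgs in healthy_utts.items():
        aff = affinity_matrix(el_ppgs, ppgs)
        shared = [aff[(p, p)] for p in ppgs if p in el_ppgs]
        scores[utt_id] = sum(shared) / len(shared) if shared else 0.0
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: two-dimensional "PPGs" for illustration only.
el_ppgs = {"a": [[0.9, 0.1], [0.8, 0.2]], "i": [[0.2, 0.8]]}
healthy_utts = {
    "utt1": {"a": [[0.9, 0.1]]},
    "utt2": {"a": [[0.1, 0.9]]},
}
ranking = rank_utterances(el_ppgs, healthy_utts)
```

In this toy setting, the healthy utterances at the top of the ranking would be added to the training corpus first, in contrast to random sampling, which ignores how closely a candidate resembles EL speech.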
Pages: 2196-2202
Number of pages: 7