Computing nasalance with MFCCs and Convolutional Neural Networks

Authors
Lozano, Andres [1]
Nava, Enrique [1]
Garcia Mendez, Maria Dolores [2]
Moreno-Torres, Ignacio [2]
Affiliations
[1] Univ Malaga, Dept Commun Engn, Malaga, Spain
[2] Univ Malaga, Dept Spanish Philol, Malaga, Spain
Source
PLOS ONE | 2024 / Vol. 19 / Issue 12
Keywords
SPEECH; RESONANCE; SCORES; HYPERNASALITY; RECOGNITION; NASALITY; CHILDREN; RATINGS;
DOI
10.1371/journal.pone.0315452
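Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09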
Abstract
Nasalance is a valuable clinical biomarker for hypernasality. It is computed as the ratio of the acoustic energy emitted through the nose to the total energy emitted through the mouth and nose (eNasalance). A new approach is proposed that computes nasalance using Convolutional Neural Networks (CNNs) trained with Mel-Frequency Cepstral Coefficients (mfccNasalance). mfccNasalance is evaluated by examining its accuracy: 1) when the training and test data come from the same or from different dialects; 2) with test data that differ in dynamicity (e.g. rapidly produced diadochokinetic syllables versus short words); and 3) using multiple CNN configurations (i.e. kernel shape and use of a 1 x 1 pointwise convolution). Dual-channel Nasometer speech data were recorded from healthy speakers of different dialects: Costa Rica (more nasal, +nasal) and Spain and Chile (less nasal, -nasal). The inputs to the CNN models were sequences of 39 MFCC vectors computed from 250 ms moving windows. The test data were recorded in Spain and included short words (-dynamic), sentences (+dynamic), and diadochokinetic syllables (+dynamic). The accuracy of a CNN model was defined as the Spearman correlation between the mfccNasalance for that model and the perceptual nasality scores of human experts. In the same-dialect condition, mfccNasalance was more accurate than eNasalance regardless of the CNN configuration; using a 1 x 1 kernel increased accuracy for +dynamic utterances (p < .000), though not for -dynamic utterances. The kernel shape had a significant impact only for -dynamic utterances (p < .000). In the different-dialect condition, the scores were significantly less accurate than in the same-dialect condition, particularly for models trained on Costa Rica data. We conclude that mfccNasalance is a flexible and useful alternative to eNasalance. Future studies should explore how to optimize mfccNasalance by selecting the most adequate CNN model as a function of the dynamicity of the target speech data.
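The two quantities the abstract defines in words, the energy ratio behind eNasalance and the Spearman correlation used as the accuracy measure, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `e_nasalance`, `average_ranks` and `spearman` are hypothetical, the channels are assumed to be plain lists of samples, and the energy is taken as a simple sum of squares with no band filtering, which may differ from what a Nasometer does.

```python
import math


def e_nasalance(nasal, oral):
    """Energy ratio from the abstract: nasal / (nasal + oral), as a fraction.

    Assumption: energy is the sum of squared samples of each channel.
    """
    nasal_energy = sum(x * x for x in nasal)
    oral_energy = sum(x * x for x in oral)
    total = nasal_energy + oral_energy
    if total == 0:
        raise ValueError("both channels are silent")
    return nasal_energy / total


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)
```

Under these assumptions, a model's accuracy in the sense of the abstract would be `spearman(model_scores, expert_scores)`, where both arguments are hypothetical per-utterance lists of nasalance estimates and perceptual ratings.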
Pages: 18