Computing nasalance with MFCCs and Convolutional Neural Networks

Authors
Lozano, Andres [1]
Nava, Enrique [1]
Garcia Mendez, Maria Dolores [2]
Moreno-Torres, Ignacio [2]
Affiliations
[1] Univ Malaga, Dept Commun Engn, Malaga, Spain
[2] Univ Malaga, Dept Spanish Philol, Malaga, Spain
Source
PLOS ONE | 2024 / Vol. 19 / Issue 12
Keywords
SPEECH; RESONANCE; SCORES; HYPERNASALITY; RECOGNITION; NASALITY; CHILDREN; RATINGS;
DOI
10.1371/journal.pone.0315452
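Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy, Earth Sciences]; Q [Biological Sciences]; N [Natural Sciences, General]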
Subject classification codes
07; 0710; 09
Abstract
Nasalance is a valuable clinical biomarker for hypernasality. It is conventionally computed as the ratio of the acoustic energy emitted through the nose to the total energy emitted through the mouth and nose (eNasalance). A new approach is proposed that computes nasalance with Convolutional Neural Networks (CNNs) trained on Mel-Frequency Cepstral Coefficients (mfccNasalance). mfccNasalance is evaluated by examining its accuracy: 1) when the training and test data come from the same or from different dialects; 2) with test data that differ in dynamicity (e.g., rapidly produced diadochokinetic syllables versus short words); and 3) across multiple CNN configurations (i.e., kernel shape and the use of 1 × 1 pointwise convolution). Dual-channel Nasometer speech data were recorded from healthy speakers of three dialects: Costa Rican, which is more (+) nasal, and Spanish and Chilean, which are less (−) nasal. The input to the CNN models was a sequence of 39 MFCC vectors computed from 250 ms moving windows. The test data were recorded in Spain and included short words (−dynamic), sentences (+dynamic), and diadochokinetic syllables (+dynamic). The accuracy of a CNN model was defined as the Spearman correlation between that model's mfccNasalance scores and the perceptual nasality scores assigned by human experts. In the same-dialect condition, mfccNasalance was more accurate than eNasalance regardless of the CNN configuration; using a 1 × 1 kernel increased accuracy for +dynamic utterances (p < .001) but not for −dynamic utterances. Kernel shape had a significant effect only for −dynamic utterances (p < .001). In the different-dialect condition, the scores were significantly less accurate than in the same-dialect condition, particularly for models trained on Costa Rican data. We conclude that mfccNasalance is a flexible and useful alternative to eNasalance. Future studies should explore how to optimize mfccNasalance by selecting the most suitable CNN model for the dynamicity of the target speech data.
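The two quantities the abstract relies on, energy-based nasalance (nasal energy over total nasal plus oral energy) and model accuracy (Spearman correlation against perceptual ratings), can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes time-aligned nasal and oral sample windows from a dual-channel recording, measures energy as a plain sum of squared amplitudes with no band-pass filtering, and the function names are hypothetical. The mfccNasalance CNN itself is not shown.

```python
import math
from typing import Sequence


def energy(samples: Sequence[float]) -> float:
    """Energy of one analysis window as the sum of squared amplitudes (an assumption)."""
    return sum(x * x for x in samples)


def e_nasalance(nasal: Sequence[float], oral: Sequence[float]) -> float:
    """Energy-based nasalance in percent: 100 * nasal / (nasal + oral).

    Hypothetical helper; `nasal` and `oral` are assumed to be time-aligned
    windows from the nasal and oral channels of a dual-channel recording.
    """
    n, o = energy(nasal), energy(oral)
    total = n + o
    if total == 0.0:
        raise ValueError("silent window: nasalance is undefined")
    return 100.0 * n / total


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0.0 or vy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    # Nasal energy 2.0, oral energy 18.0, so nasalance is 10.0 percent.
    print(e_nasalance([1.0, -1.0], [3.0, -3.0]))
    # Hypothetical per-utterance nasalance scores vs. expert ratings.
    print(spearman([10.0, 35.0, 22.0, 50.0], [1, 3, 2, 4]))
```

Spearman's rho is used here, as in the abstract, because it compares only the rank order of the scores, so a model whose nasalance values are monotonically but nonlinearly related to the perceptual ratings is not penalized.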
Pages: 18