Computing nasalance with MFCCs and Convolutional Neural Networks

Cited: 0
Authors
Lozano, Andres [1]
Nava, Enrique [1]
Garcia Mendez, Maria Dolores [2]
Moreno-Torres, Ignacio [2]
Affiliations
[1] Univ Malaga, Dept Commun Engn, Malaga, Spain
[2] Univ Malaga, Dept Spanish Philol, Malaga, Spain
Source
PLOS ONE | 2024, Vol. 19, No. 12
Keywords
SPEECH; RESONANCE; SCORES; HYPERNASALITY; RECOGNITION; NASALITY; CHILDREN; RATINGS;
DOI
10.1371/journal.pone.0315452
CLC Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline Codes
07; 0710; 09
Abstract
Nasalance is a valuable clinical biomarker for hypernasality. It is computed as the ratio of acoustic energy emitted through the nose to the total energy emitted through the mouth and nose (eNasalance). A new approach is proposed to compute nasalance using Convolutional Neural Networks (CNNs) trained with Mel-Frequency Cepstral Coefficients (mfccNasalance). mfccNasalance is evaluated by examining its accuracy: 1) when the train and test data are from the same or from different dialects; 2) with test data that differs in dynamicity (e.g. rapidly produced diadochokinetic syllables versus short words); and 3) using multiple CNN configurations (i.e. kernel shape and use of 1 x 1 pointwise convolution). Dual-channel Nasometer speech data were recorded from healthy speakers of different dialects: Costa Rica (more (+) nasal) and Spain and Chile (less (-) nasal). The input to the CNN models was sequences of 39 MFCC vectors computed from 250 ms moving windows. The test data were recorded in Spain and included short words (-dynamic), sentences (+dynamic), and diadochokinetic syllables (+dynamic). The accuracy of a CNN model was defined as the Spearman correlation between the mfccNasalance for that model and the perceptual nasality scores of human experts. In the same-dialect condition, mfccNasalance was more accurate than eNasalance independently of the CNN configuration; using a 1 x 1 kernel resulted in increased accuracy for +dynamic utterances (p < .000), though not for -dynamic utterances. The kernel shape had a significant impact for -dynamic utterances (p < .000) exclusively. In the different-dialect condition, the scores were significantly less accurate than in the same-dialect condition, particularly for Costa Rica-trained models. We conclude that mfccNasalance is a flexible and useful alternative to eNasalance. Future studies should explore how to optimize mfccNasalance by selecting the most suitable CNN model as a function of the dynamicity of the target speech data.
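The eNasalance baseline described in the abstract reduces to a simple energy ratio over the two Nasometer channels. A minimal sketch of that ratio follows; the function name `e_nasalance` and the toy sinusoid signals are illustrative assumptions, not from the paper, and real Nasometer scoring additionally involves band-limited filtering and frame-wise averaging:

```python
import numpy as np

def e_nasalance(nasal: np.ndarray, oral: np.ndarray) -> float:
    """Energy-based nasalance: nasal acoustic energy as a percentage
    of total (nasal + oral) energy."""
    e_nasal = float(np.sum(np.asarray(nasal, dtype=float) ** 2))
    e_oral = float(np.sum(np.asarray(oral, dtype=float) ** 2))
    return 100.0 * e_nasal / (e_nasal + e_oral)

# Toy dual-channel example: two equal-energy channels give 50% nasalance.
t = np.linspace(0.0, 1.0, 8000, endpoint=False)
nasal = np.sin(2 * np.pi * 220 * t)
oral = np.cos(2 * np.pi * 220 * t)
print(round(e_nasalance(nasal, oral)))  # → 50
```

The mfccNasalance approach replaces this single scalar with a CNN operating on sequences of 39 MFCC vectors per 250 ms window, which is what lets the score adapt to the dynamicity of the utterance.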
Pages: 18