Classification of Voice Disorders Using a One-Dimensional Convolutional Neural Network

Cited by: 18
Authors
Fujimura, Shintaro [1]
Kojima, Tsuyoshi [2]
Okanoue, Yusuke [2]
Shoji, Kazuhiko [2]
Inoue, Masato [3]
Omori, Koichi [1]
Hori, Ryusuke [2]
Affiliations
[1] Kyoto Univ, Grad Sch Med, Dept Otolaryngol Head & Neck Surg, Kyoto, Japan
[2] Tenri Hosp, Dept Otolaryngol, Tenri, Nara, Japan
[3] Waseda Univ, Sch Adv Sci & Engn, Dept Elect Engn & Biosci, Shinjuku Ku, Tokyo, Japan
Keywords
Auditory perceptual voice analysis; GRBAS scale; Voice disorder; Deep learning; One-dimensional convolutional neural network; QUALITY; GRBAS; RELIABILITY; INDEX
DOI
10.1016/j.jvoice.2020.02.009
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
Objectives. Auditory-perceptual voice analysis is a standard method for quantifying pathological voice quality, but perceptual ratings are based on subjective evaluations and therefore may vary among examiners. Although many acoustic metrics have been studied for potential use in the objective evaluation of pathological voices, the interpretation of acoustic metrics in individual cases is difficult and the technique is not widely used by clinicians. The aim of this study was to establish standardized methods to discriminate grade, roughness, breathiness, asthenia, strain (GRBAS) scale scores of pathological voices directly using one-dimensional convolutional neural network (1D-CNN) models.

Methods. We constructed an original dataset utilizing 1,377 voice samples of sustained phonation of the vowel /a/. Each voice sample was rated by three experts according to the GRBAS scale and the median values were used as the correct answer label. We designed an end-to-end 1D-CNN model with a raw voice waveform input having a frame width of 9,600 samples. The models were trained with our original dataset for each GRBAS category individually and the model performance was tested by the five-fold cross validation method.

Results. The accuracy, F1 score, and quadratic weighted Cohen's kappa for the testing dataset were determined. The metrics for the G scale showed the most balanced model performance, with high accuracy (0.771) and substantial agreement (kappa = 0.710). The model for the R scale had relatively high accuracy (0.765) and F1 score (0.743) with moderate agreement (kappa = 0.536). The accuracy (0.883) and the F1 score (0.865) for the S scale were the highest among the five categories, whereas the Cohen's kappa was the lowest (0.190).

Conclusions. The end-to-end 1D-CNN models can evaluate overall pathological voice quality with a reliability comparable to human evaluations. The efficiency with which the machine learning models can be trained and evaluated is closely related to the dataset quality.
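The labeling rule and the agreement metric named in the abstract can be illustrated with a minimal sketch. This is not the authors' code. It assumes the usual four-point GRBAS scoring (0 to 3), which the abstract does not state explicitly, and it uses invented toy labels. The function names `median_label`, `accuracy`, and `quadratic_weighted_kappa` are hypothetical. The toy data are deliberately imbalanced to show how accuracy can be high while quadratic weighted kappa stays low, the pattern reported for the S scale.

```python
from statistics import median


def median_label(ratings):
    """Median of an odd number of integer ratings (e.g. three raters)."""
    return int(median(ratings))


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted score equals the reference score."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def quadratic_weighted_kappa(y_true, y_pred, n_classes=4):
    """Cohen's kappa with quadratic disagreement weights.

    n_classes=4 is an assumption (scores 0..3). Returns NaN when the
    chance-expected disagreement is zero and kappa is undefined.
    """
    n = len(y_true)
    # Observed confusion matrix: rows = reference, columns = prediction.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    row = [sum(r) for r in observed]
    col = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            w = (i - j) ** 2 / (n_classes - 1) ** 2  # quadratic penalty
            weighted_observed += w * observed[i][j]
            weighted_expected += w * row[i] * col[j] / n
    if weighted_expected == 0.0:
        return float("nan")
    return 1.0 - weighted_observed / weighted_expected


# Toy, invented data: 18 of 20 samples have reference score 0, and a
# degenerate model predicts 0 for everything.
y_true = [0] * 18 + [1] * 2
y_pred = [0] * 20

toy_accuracy = accuracy(y_true, y_pred)
toy_kappa = quadratic_weighted_kappa(y_true, y_pred)
```

On this toy set the accuracy is 0.9, yet kappa is approximately 0, because always predicting the majority score agrees with the reference no better than chance. A gap like this may partly explain why a category with a skewed score distribution can show the highest accuracy and the lowest kappa at the same time.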
Pages: 15-20
Number of pages: 6