End-to-end deep learning classification of vocal pathology using stacked vowels

Cited by: 2
Authors
Liu, George S. [1, 2]
Hodges, Jordan M. [3]
Yu, Jingzhi [4]
Sung, C. Kwang [1, 2]
Erickson-DiRenzo, Elizabeth [1, 2]
Doyle, Philip C. [1, 2, 5]
Affiliations
[1] Stanford Univ, Dept Otolaryngol Head & Neck Surg, Stanford Sch Med, Stanford, CA 94305 USA
[2] Stanford Univ, Sch Med, Div Laryngol, Stanford, CA 94305 USA
[3] Stanford Univ, Sch Engn, Comp Sci Dept, Stanford, CA 94305 USA
[4] Stanford Univ, Dept Biomed Data Sci, Biomed Informat, Sch Med, Stanford, CA 94305 USA
[5] Stanford Univ, Sch Med, Div Laryngol, Otolaryngol Head & Neck Surg, 801 Welch Rd, Stanford, CA 94035 USA
Source
Keywords
artificial intelligence; deep learning; voice classification; voice disorders; voice pathology; NEURAL-NETWORKS; VOICE QUALITY; FRAMEWORK; DATABASE;
DOI
10.1002/lio2.1144
Chinese Library Classification (CLC) number
R76 [Otorhinolaryngology]
Discipline classification code
100213
Abstract
Objectives: Advances in artificial intelligence (AI) technology have increased the feasibility of classifying voice disorders using voice recordings as a screening tool. This work develops upon previous models that take in single vowel recordings by analyzing multiple vowel recordings simultaneously to enhance prediction of vocal pathology.
Methods: Voice samples from the Saarbruecken Voice Database, including three sustained vowels (/a/, /i/, /u/) from 687 healthy human participants and 334 dysphonic patients, were used to train 1-dimensional convolutional neural network models for multiclass classification of healthy, hyperfunctional dysphonia, and laryngitis voice recordings. Three models were trained: (1) a baseline model that analyzed individual vowels in isolation, (2) a stacked vowel model that analyzed three vowels (/a/, /i/, /u/) in the neutral pitch simultaneously, and (3) a stacked pitch model that analyzed the /a/ vowel in three pitches (low, neutral, and high) simultaneously.
Results: For multiclass classification of healthy, hyperfunctional dysphonia, and laryngitis voice recordings, the stacked vowel model demonstrated higher performance compared with the baseline and stacked pitch models (F1 score 0.81 vs. 0.77 and 0.78, respectively). Specifically, the stacked vowel model achieved higher performance for class-specific classification of hyperfunctional dysphonia voice samples compared with the baseline and stacked pitch models (F1 score 0.56 vs. 0.49 and 0.50, respectively).
Conclusions: This study demonstrates the feasibility and potential of analyzing multiple sustained vowel recordings simultaneously to improve AI-driven screening and classification of vocal pathology. The stacked vowel model architecture in particular offers promise to enhance such an approach.
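Illustrative sketch (not the authors' implementation): the abstract does not say how the three recordings are combined, so the Python below assumes that "stacking" means placing each recording in its own input channel of a 1-dimensional convolution. The helper names `stack_recordings` and `conv1d`, the toy waveforms, and the kernel values are all hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from typing import List, Sequence

Signal = List[float]


def stack_recordings(recordings: Sequence[Signal]) -> List[Signal]:
    """Stack recordings into a channels-by-time input.

    Assumption: recordings are truncated to the shortest length so that
    every channel has the same number of samples.
    """
    length = min(len(r) for r in recordings)
    return [list(r[:length]) for r in recordings]


def conv1d(x: List[Signal], kernel: List[Signal], bias: float = 0.0) -> Signal:
    """Valid-mode 1-D convolution for a single output filter.

    Implemented as cross-correlation (no kernel flip), which is what most
    deep learning libraries call convolution. `kernel` has shape
    (channels, width), so each output sample mixes all stacked channels.
    """
    channels, width = len(kernel), len(kernel[0])
    if len(x) != channels:
        raise ValueError("kernel and input must have the same channel count")
    steps = len(x[0]) - width + 1
    out: Signal = []
    for t in range(steps):
        acc = bias
        for c in range(channels):
            for k in range(width):
                acc += x[c][t + k] * kernel[c][k]
        out.append(acc)
    return out


# Toy stand-ins for the /a/, /i/, /u/ waveforms (hypothetical values).
vowel_a = [0.0, 1.0, 0.0, -1.0, 0.0]
vowel_i = [1.0, 1.0, 1.0, 1.0, 1.0]
vowel_u = [0.5, 0.0, -0.5, 0.0, 0.5, 0.0]

stacked = stack_recordings([vowel_a, vowel_i, vowel_u])  # 3 channels x 5 samples
kernel = [[1.0, -1.0], [0.5, 0.5], [0.0, 2.0]]           # one filter, width 2
features = conv1d(stacked, kernel)
```

Under this assumption, the baseline model would correspond to a single-channel input (one vowel at a time), while the stacked vowel and stacked pitch models would differ only in which three recordings fill the channels.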
Pages: 1312-1318
Page count: 7