Voice pathology detection and classification from speech signals and EGG signals based on a multimodal fusion method

Cited by: 3
Authors
Geng, Lei [1, 2]
Shan, Hongfeng [2, 3]
Xiao, Zhitao [1, 2]
Wang, Wei [4, 5, 6, 7, 8]
Wei, Mei [4, 5, 6, 7, 8]
Affiliations
[1] Tiangong Univ, Sch Life Sci, Tianjin 300387, Peoples R China
[2] Tianjin Key Lab Optoelect Detect Technol & Syst, Tianjin 300387, Peoples R China
[3] Tiangong Univ, Sch Elect & Informat Engn, Tianjin, Peoples R China
[4] Tianjin First Cent Hosp, Dept Otorhinolaryngol Head & Neck Surg, Tianjin 300192, Peoples R China
[5] Inst Otolaryngol Tianjin, Tianjin, Peoples R China
[6] Key Lab Auditory Speech & Balance Med, Tianjin, Peoples R China
[7] Key Clin Discipline Tianjin Otolaryngol, Tianjin, Peoples R China
[8] Otolaryngol Clin Qual Control Ctr, Tianjin, Peoples R China
Keywords
EGG; LSTM; multi-modal; residual network; voice pathology detection
DOI
10.1515/bmt-2021-0112
Chinese Library Classification (CLC) number
R318 [Biomedical engineering]
Discipline classification code
0831
Abstract
Automatic voice pathology detection and classification play an important role in the diagnosis and prevention of voice disorders. To describe the pronunciation characteristics of patients with dysarthria more accurately and to improve pathological voice detection, this study proposes a detection method based on a multi-modal network structure. First, speech signals and electroglottography (EGG) signals are mapped from the time domain to frequency-domain spectrograms via a short-time Fourier transform (STFT). A Mel filter bank is then applied to the spectrograms to enhance the signals' harmonics and reduce noise. Second, a pre-trained convolutional neural network (CNN) is used as the backbone network to extract sound state features and vocal cord vibration features from the two signals. To improve classification, the fused features are fed into a long short-term memory (LSTM) network for voice feature selection and enhancement. On the Saarbrucken Voice Database (SVD), the proposed system achieves an accuracy of 95.73%, an F1-score of 96.10%, and a recall of 96.73%, offering a new method for pathological speech detection.
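The preprocessing step described in the abstract (STFT followed by a Mel filter bank) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the frame length, hop size, number of Mel bands, sampling rate, and the synthetic sine waves standing in for speech and EGG recordings are all assumptions chosen for illustration, and the CNN backbone and LSTM stages are omitted.

```python
import numpy as np


def hz_to_mel(f):
    # Common HTK-style Hz-to-Mel conversion.
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def stft_power(signal, n_fft=512, hop=128):
    """Power spectrogram of shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def mel_filter_bank(sr, n_fft=512, n_mels=40):
    """Triangular filters of shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        left, center, right = hz_points[m : m + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def log_mel_spectrogram(signal, sr, n_fft=512, hop=128, n_mels=40):
    """STFT power spectrogram projected onto Mel bands, then log-compressed."""
    power = stft_power(signal, n_fft, hop)
    mel = power @ mel_filter_bank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10)  # small offset avoids log(0)


# Synthetic stand-ins for one second of speech and EGG (assumed 16 kHz).
sr = 16000
t = np.arange(sr) / sr
speech = np.sin(2 * np.pi * 220 * t)
egg = np.sin(2 * np.pi * 220 * t + 0.5)

speech_mel = log_mel_spectrogram(speech, sr)
egg_mel = log_mel_spectrogram(egg, sr)
print(speech_mel.shape, egg_mel.shape)
```

In a pipeline like the one the abstract describes, each of the two spectrograms would then be passed to a CNN feature extractor before the features are fused and fed to the LSTM.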
Pages: 613-625
Page count: 13