A depthwise separable CNN-based interpretable feature extraction network for automatic pathological voice detection

Cited by: 8
Authors
Zhao, Denghuang [1]
Qiu, Zhixin [1]
Jiang, Yujie [1]
Zhu, Xincheng [1]
Zhang, Xiaojun [1]
Tao, Zhi [1]
Affiliations
[1] Soochow Univ, 1 Shizi St, Suzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Pathological voice detection; Deep learning; Interpretability; Depthwise separable CNN; CLASSIFICATION; INFORMATION; CEPSTRUM; VOWEL;
DOI
10.1016/j.bspc.2023.105624
Chinese Library Classification (CLC) number
R318 [Biomedical Engineering];
Discipline classification code
0831;
Abstract
In recent years, deep learning methods have achieved satisfactory results in automatic pathological voice detection (APVD). However, most deep learning methods in APVD cannot explain their performance, and interpretability is crucial for deep learning methods applied in the medical field. A lack of interpretability makes it hard for existing methods to generalize better than meaningful feature-based methods in practical applications. This paper proposes an interpretable neural network architecture, the Interpretable Multi-band Feature Extraction Network (IMBFN), built on clear feature extraction logic and a comprehensive result judgment method, to improve the effectiveness and generalization performance of APVD. An amplitude-trainable SincNet (AT-SincNet) filter bank is introduced in IMBFN and applied as the front-end frequency division network. In addition, IMBFN uses a purpose-designed two-path one-dimensional depthwise separable convolutional neural network (CNN)-based feature extractor to extract meaningful voice features. The classification results of the individual voice frames are combined to judge whether the voice is pathological. Comparative experiments were conducted using data from the MEEI, SVD, and HUPA databases. The best improvements in accuracy, F1-score, and Matthews correlation coefficient (MCC) reached 0.1705, 0.1977, and 0.4463, respectively. Blind tests were also carried out on participants from the First Affiliated Hospital of Soochow University, yielding an accuracy, F1-score, and MCC of 0.7594, 0.8491, and 0.2981, respectively. The results demonstrate that IMBFN provides meaningful explanations, good APVD performance, and better generalization than existing methods.
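The three metrics reported in the abstract can be computed from a binary confusion matrix using their standard definitions. The sketch below is illustrative only: the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper. It shows how, on an imbalanced test set, a high F1-score can coexist with a much lower MCC.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, F1-score, MCC) from binary confusion-matrix counts.

    Standard textbook definitions; a zero denominator yields 0.0 for F1 and MCC.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total

    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return accuracy, f1, mcc


# Hypothetical, imbalanced counts (NOT the paper's data): 90 pathological
# and 30 healthy voices, with "pathological" as the positive class.
acc, f1, mcc = binary_metrics(tp=80, fp=20, tn=10, fn=10)
print(f"accuracy={acc:.4f}  F1={f1:.4f}  MCC={mcc:.4f}")
```

Because MCC uses all four cells of the confusion matrix, it penalizes poor performance on the minority class, which accuracy and F1 can mask.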
Pages: 17
Related papers
50 in total
  • [1] Voice Spoofing Detection Through Residual Network, Max Feature Map, and Depthwise Separable Convolution
    Kwak, Il-Youp
    Kwag, Sungsu
    Lee, Junhee
    Jeon, Youngbae
    Hwang, Jeonghwan
    Choi, Hyo-Jung
    Yang, Jong-Hoon
    Han, So-Yul
    Huh, Jun Ho
    Lee, Choong-Hoon
    Yoon, Ji Won
    IEEE ACCESS, 2023, 11 : 49140 - 49152
  • [2] A CNN-based automatic vulnerability detection
    Jung Hyun An
    Zhan Wang
    Inwhee Joe
    EURASIP Journal on Wireless Communications and Networking, 2023
  • [3] A CNN-based automatic vulnerability detection
    An, Jung Hyun
    Wang, Zhan
    Joe, Inwhee
    EURASIP JOURNAL ON WIRELESS COMMUNICATIONS AND NETWORKING, 2023, 2023 (01)
  • [4] Efficient quantum feature extraction for CNN-based
    Dou, Tong
    Zhang, Guofeng
    Cui, Wei
    JOURNAL OF THE FRANKLIN INSTITUTE-ENGINEERING AND APPLIED MATHEMATICS, 2023, 360 (11): : 7438 - 7456
  • [5] Unsupervised Feature Extraction - A CNN-Based Approach
    Trosten, Daniel J.
    Sharma, Puneet
    IMAGE ANALYSIS, 2019, 11482 : 197 - 208
  • [6] EEGWaveNet: Multiscale CNN-Based Spatiotemporal Feature Extraction for EEG Seizure Detection
    Thuwajit, Punnawish
    Rangpong, Phurin
    Sawangjai, Phattarapong
    Autthasan, Phairot
    Chaisaen, Rattanaphon
    Banluesombatkul, Nannapas
    Boonchit, Puttaranun
    Tatsaringkansakul, Nattasate
    Sudhawiyangkul, Thapanun
    Wilaiprasitporn, Theerawit
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2022, 18 (08) : 5547 - 5557
  • [7] Deep CNN-based feature extraction with optimised LSTM for enhanced diabetic retinopathy detection
    Bansode, Balbhim Narhari
    Bakwad, K. M.
    Dildar, Ajij Sayyad
    Sable, G. S.
    COMPUTER METHODS IN BIOMECHANICS AND BIOMEDICAL ENGINEERING-IMAGING AND VISUALIZATION, 2023, 11 (03): : 960 - 975
  • [8] A Survey of CNN-Based Network Intrusion Detection
    Mohammadpour, Leila
    Ling, Teck Chaw
    Liew, Chee Sun
    Aryanfar, Alihossein
    APPLIED SCIENCES-BASEL, 2022, 12 (16):
  • [9] MSDFEN: Multi-scale dynamic feature extraction network for pathological voice detection
    Dai, Zhiyuan
    Jiang, Yuyang
    Cao, Laiyuan
    Zhang, Xiaojun
    Tao, Zhi
    APPLIED ACOUSTICS, 2025, 230
  • [10] CNN-Based Voice Emotion Classification Model for Risk Detection
    Yoo, Hyun
    Baek, Ji-Won
    Chung, Kyungyong
    INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2021, 29 (02): : 319 - 334