Multi-Scale Spatial and Temporal Speech Associations to Swallowing for Dysphagia Screening

Cited by: 5
Authors
He, Fei [1]
Hu, Xiaoyi [2, 3]
Zhu, Ce [1]
Li, Ying [2, 3]
Liu, Yipeng [1]
Affiliations
[1] Univ Elect Sci & Technol China UESTC, Sch Informat & Commun Engn, Chengdu 611731, Peoples R China
[2] Sichuan Univ, Ctr Gerontol & Geriatr, Natl Clin Res Ctr Geriatr, Chengdu 610041, Peoples R China
[3] Sichuan Univ, West China Hosp, Chengdu 610041, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Speech processing; Vibrations; Spectrogram; Trajectory; Pipelines; Hospitals; Dysphagia; multi-scale speech analysis; quantitative feature selection; spatial spectrogram contours; throat signal; AUTOMATIC DETECTION; VOICE; SCHIZOPHRENIA; DYSARTHRIA;
DOI
10.1109/TASLP.2022.3203235
Chinese Library Classification (CLC)
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
Dysphagia is a common symptom of many neurological diseases. It often occurs in older adults and increases the risk of aspiration pneumonia. Existing diagnostic systems for dysphagia are either invasive or require patients to swallow liquids, which makes them costly and harmful to patients. In this work, we propose an early screening system for dysphagia based on two kinds of throat signals: vowels and sentences. For the vowels, two new speech feature sets are developed: PET (pitch/energy trajectory) and FS-Conts (full spectrogram contours). PET focuses on the prominent resonance energy of speech to track pitch and energy fluctuations, and it can reflect the stability of the vocal cords during speech production. The FS-Conts feature set emphasizes the spatial details of formants using three-dimensional contours. For the sentences, three categories of speech features are proposed: LSSDL (log symmetric spectral difference level), C-coes (crucial energy coefficients), and LDF (local dynamic features). These three features explore the speech representations of dysphagia from global variations to local associations. LSSDL highlights global spectral differences in the frequency region of interest, while C-coes and LDF locate local speech differences in specific frequency regions and time intervals. In addition, a new feature selection algorithm is developed to search for distinguishing features. In the experiments, an SVM classifier is adopted and the dysphagia detection accuracy reaches 95.07%. Comparative experiments indicate that our system performs better than existing methods.
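As a rough illustration of what a frame-wise pitch and energy trajectory can look like, the sketch below tracks both quantities over a synthetic tone. It is not the authors' PET feature set: the autocorrelation pitch estimator, the frame length, the hop size, the 75-400 Hz search range, and all function names are assumptions chosen only to make the idea concrete. A steady voice would yield a flat trajectory with low spread, while an unstable one would fluctuate.

```python
import math

def frame_signal(x, frame_len, hop):
    """Split a sequence into overlapping frames (trailing partial frame dropped)."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]

def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)

def frame_pitch(frame, sr, fmin=75.0, fmax=400.0):
    """Estimate pitch in Hz from the largest autocorrelation value
    among lags corresponding to the range [fmin, fmax] (assumed range)."""
    lag_min = int(sr / fmax)
    lag_max = min(int(sr / fmin), len(frame) - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        val = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    # Return 0.0 when no positive autocorrelation peak was found.
    return sr / best_lag if best_lag else 0.0

def trajectories(x, sr, frame_len=400, hop=200):
    """Return per-frame pitch and energy trajectories (hypothetical helper)."""
    frames = frame_signal(x, frame_len, hop)
    pitch = [frame_pitch(f, sr) for f in frames]
    energy = [frame_energy(f) for f in frames]
    return pitch, energy

def spread(values):
    """Population standard deviation, used here as a simple stability measure."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

# Synthetic stand-in for a sustained vowel: a 200 Hz sine, 0.5 s at 8 kHz.
sr = 8000
tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(sr // 2)]
pitch, energy = trajectories(tone, sr)
pitch_spread = spread(pitch)
```

On real recordings, per-frame statistics such as `spread(pitch)` and `spread(energy)` could be passed to a classifier as features; which statistics the paper actually uses is not stated in the abstract.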
Pages: 2888-2899
Page count: 12