Multi-Scale Spatial and Temporal Speech Associations to Swallowing for Dysphagia Screening

Cited by: 5

Authors:

He, Fei ^{[1]}

Hu, Xiaoyi ^{[2,3]}

Zhu, Ce ^{[1]}

Li, Ying ^{[2,3]}

Liu, Yipeng ^{[1]}

Affiliations:

[1] Univ Elect Sci & Technol China UESTC, Sch Informat & Commun Engn, Chengdu 611731, Peoples R China

[2] Sichuan Univ, Ctr Gerontol & Geriatr, Natl Clin Res Ctr Geriatr, Chengdu 610041, Peoples R China

[3] Sichuan Univ, West China Hosp, Chengdu 610041, Peoples R China

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2022 / Vol. 30

Funding:

National Natural Science Foundation of China;

Keywords:

Feature extraction; Speech processing; Vibrations; Spectrogram; Trajectory; Pipelines; Hospitals; Dysphagia; multi-scale speech analysis; quantitative feature selection; spatial spectrogram contours; throat signal; AUTOMATIC DETECTION; VOICE; SCHIZOPHRENIA; DYSARTHRIA;

DOI:

10.1109/TASLP.2022.3203235

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

Dysphagia is a common symptom of many neurological diseases. It often occurs in older adults and increases the risk of aspiration pneumonia. Existing diagnostic systems for dysphagia are invasive or require patients to swallow liquids, making them costly and harmful to patients. In this work, we propose an early screening system for dysphagia based on two kinds of throat signals, i.e., vowels and sentences. Based on the vowels, two new speech feature sets are developed: PET (pitch/energy trajectory) and FS-Conts (full spectrogram contours). PET focuses on the prominent resonance energy of speech to track pitch and energy fluctuations, which can reflect the stability of the vocal cords during speech production. The FS-Conts feature set emphasizes the spatial details of formants based on three-dimensional contours. For the sentences, three categories of speech features are proposed: LSSDL (log symmetric spectral difference level), C-coes (crucial energy coefficients), and LDF (local dynamic features). These three features explore the speech representations of dysphagia from global variations to local associations. LSSDL highlights the global spectral differences in the frequency region of interest. C-coes and LDF locate local speech differences in specific frequency regions and time durations. In addition, a new feature selection algorithm is developed to search for distinguishing features. In the experiments, an SVM classifier is adopted and the dysphagia detection accuracy reaches 95.07%. Comparative experiments indicate that our system performs better than existing methods.


Pages: 2888-2899

Number of pages: 12

