Multi-Scale Spatial and Temporal Speech Associations to Swallowing for Dysphagia Screening

Cited by: 5

Authors:

He, Fei [1]

Hu, Xiaoyi [2,3]

Zhu, Ce [1]

Li, Ying [2,3]

Liu, Yipeng [1]

Affiliations:

[1] Univ Elect Sci & Technol China UESTC, Sch Informat & Commun Engn, Chengdu 611731, Peoples R China

[2] Sichuan Univ, Ctr Gerontol & Geriatr, Natl Clin Res Ctr Geriatr, Chengdu 610041, Peoples R China

[3] Sichuan Univ, West China Hosp, Chengdu 610041, Peoples R China

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2022 / Vol. 30

Funding:

National Natural Science Foundation of China;

Keywords:

Feature extraction; Speech processing; Vibrations; Spectrogram; Trajectory; Pipelines; Hospitals; Dysphagia; multi-scale speech analysis; quantitative feature selection; spatial spectrogram contours; throat signal; AUTOMATIC DETECTION; VOICE; SCHIZOPHRENIA; DYSARTHRIA;

DOI:

10.1109/TASLP.2022.3203235

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Dysphagia is a common symptom of many neurological diseases. It often occurs in older adults and increases the risk of aspiration pneumonia. Existing diagnostic systems for dysphagia are invasive or require patients to swallow liquids, which makes them costly and harmful to patients. In this work, we propose an early screening system for dysphagia based on two kinds of throat signals, i.e., vowels and sentences. Based on the vowels, two new speech feature sets are developed: PET (pitch/energy trajectory) and FS-Conts (full spectrogram contours). PET focuses on the prominent resonance energy of speech to track pitch and energy fluctuations, and it can reflect the stability of the vocal cords during speech generation. The FS-Conts feature set emphasizes the spatial details of formants based on three-dimensional contours. For the sentences, three categories of speech features are proposed: LSSDL (log symmetric spectral difference level), C-coes (crucial energy coefficients), and LDF (local dynamic features). These three features explore the speech representations of dysphagia from global variations to local associations. LSSDL highlights the global spectral differences in the frequency region of interest. C-coes and LDF locate local speech differences in specific frequency regions and time durations. In addition, a new feature selection algorithm is developed to search for distinguishing features. In the experiments, an SVM classifier is adopted, and the dysphagia detection accuracy reaches 95.07%. The results of comparative experiments indicate that our system performs better than existing methods.


Pages: 2888 - 2899

Page count: 12
