Multi-Scale Spatial and Temporal Speech Associations to Swallowing for Dysphagia Screening

Cited by: 5
Authors
He, Fei [1]
Hu, Xiaoyi [2, 3]
Zhu, Ce [1]
Li, Ying [2, 3]
Liu, Yipeng [1]
Affiliations
[1] Univ Elect Sci & Technol China UESTC, Sch Informat & Commun Engn, Chengdu 611731, Peoples R China
[2] Sichuan Univ, Ctr Gerontol & Geriatr, Natl Clin Res Ctr Geriatr, Chengdu 610041, Peoples R China
[3] Sichuan Univ, West China Hosp, Chengdu 610041, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Speech processing; Vibrations; Spectrogram; Trajectory; Pipelines; Hospitals; Dysphagia; multi-scale speech analysis; quantitative feature selection; spatial spectrogram contours; throat signal; AUTOMATIC DETECTION; VOICE; SCHIZOPHRENIA; DYSARTHRIA;
DOI
10.1109/TASLP.2022.3203235
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Dysphagia is a common symptom of many neurological diseases. It often occurs in older adults and increases the risk of aspiration pneumonia. Existing diagnostic systems for dysphagia are invasive or require patients to swallow liquids, which makes them costly and harmful to patients. In this work, we propose an early screening system for dysphagia based on two kinds of throat signals: vowels and sentences. Based on the vowels, two new speech feature sets are developed: PET (pitch/energy trajectory) and FS-Conts (full spectrogram contours). PET focuses on the prominent resonance energy of speech to track pitch and energy fluctuations, and can reflect the stability of the vocal cords during speech generation. The FS-Conts feature set emphasizes the spatial details of formants using three-dimensional contours. For the sentences, three categories of speech features are proposed: LSSDL (log symmetric spectral difference level), C-coes (crucial energy coefficients), and LDF (local dynamic features). These three features explore the speech representations of dysphagia from global variations to local associations. LSSDL highlights global spectral differences in the frequency region of interest. C-coes and LDF locate local speech differences in specific frequency regions and time intervals. In addition, a new feature selection algorithm is developed to search for distinguishing features. In the experiments, an SVM classifier is adopted and the dysphagia detection accuracy reaches 95.07%. Comparative experiments indicate that the proposed system outperforms existing methods.
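The abstract describes PET only at a high level, so the following is a minimal sketch of the general idea of a frame-wise pitch/energy trajectory, not the authors' feature set. It assumes a plain autocorrelation pitch estimator (a standard technique swapped in for illustration) and synthetic tones in place of real throat signals. All function names, frame sizes, and the 75-400 Hz search range are hypothetical choices.

```python
import math

def frame_signal(x, frame_len, hop):
    """Split a list of samples into overlapping frames (the ragged tail is dropped)."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]

def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)

def frame_pitch(frame, sr, fmin=75.0, fmax=400.0):
    """Estimate pitch in Hz as sr / lag, where lag maximizes the autocorrelation.

    Assumption: the search range [fmin, fmax] is an illustrative choice,
    not a value taken from the paper.
    """
    lag_min = int(sr / fmax)
    lag_max = min(int(sr / fmin), len(frame) - 1)
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        val = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sr / best_lag

def pitch_energy_trajectory(x, sr, frame_len=400, hop=200):
    """Return (pitch per frame, energy per frame) for a sampled signal."""
    frames = frame_signal(x, frame_len, hop)
    return [frame_pitch(f, sr) for f in frames], [frame_energy(f) for f in frames]

def std(values):
    """Population standard deviation, used here as a crude stability measure."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

sr = 8000  # sampling rate in Hz (hypothetical)

# Steady 200 Hz tone: stands in for a stable sustained vowel.
steady = [math.sin(2 * math.pi * 200 * n / sr) for n in range(sr // 2)]

# Tone whose frequency drifts between about 170 and 230 Hz: stands in for an unstable voice.
phase, wobbly = 0.0, []
for n in range(sr // 2):
    f = 200.0 + 30.0 * math.sin(2 * math.pi * 3.0 * n / sr)
    phase += 2 * math.pi * f / sr
    wobbly.append(math.sin(phase))

steady_pitch, steady_energy = pitch_energy_trajectory(steady, sr)
wobbly_pitch, wobbly_energy = pitch_energy_trajectory(wobbly, sr)

print("pitch std, steady tone:", std(steady_pitch))
print("pitch std, wobbly tone:", std(wobbly_pitch))
```

In this toy setting the spread of the pitch trajectory is larger for the drifting tone than for the steady one, which is the kind of stability cue the abstract attributes to PET. Statistics of such trajectories could then feed a feature selection step and a classifier such as an SVM.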
Pages: 2888-2899
Page count: 12
Related papers
50 in total
  • [21] MULTI-SCALE SPATIAL-TEMPORAL NETWORK FOR PERSON RE-IDENTIFICATION
    Wang, Zhikang
    He, Lihuo
    Gao, Xinbo
    Huang, Yuanfei
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 2052 - 2056
  • [22] Multi-scale scenarios of spatial-temporal dynamics in the European livestock sector
    Neumann, Kathleen
    Verburg, Peter H.
    Elbersen, Berien
    Stehfest, Elke
    Woltjer, Geert B.
    AGRICULTURE ECOSYSTEMS & ENVIRONMENT, 2011, 140 (1-2) : 88 - 101
  • [23] Multi-scale cross-correlation analysis of temporal and spatial seismic data
    Min Lin
    Jiaxin Qin
    Gang Wang
    The European Physical Journal B, 2020, 93
  • [24] MTTPRE: A Multi-Scale Spatial-Temporal Model for Travel Time Prediction
    Wan, Feng
    Li, Linsen
    Wang, Ke
    Chen, Lu
    Gao, Yunjun
    Jiang, Weihao
    Pu, Shiliang
    30TH ACM SIGSPATIAL INTERNATIONAL CONFERENCE ON ADVANCES IN GEOGRAPHIC INFORMATION SYSTEMS, ACM SIGSPATIAL GIS 2022, 2022, : 384 - 393
  • [25] Multi-scale temporal variability in biological-physical associations in the NE Chukchi Sea
    Gonzalez, Silvana
    Horne, John K.
    Danielson, Seth L.
    POLAR BIOLOGY, 2021, 44 (04) : 837 - 855
  • [26] Multi-scale temporal variability in biological-physical associations in the NE Chukchi Sea
    Silvana Gonzalez
    John K. Horne
    Seth L. Danielson
    Polar Biology, 2021, 44 : 837 - 855
  • [27] Multi-Scale TCN: Exploring Better Temporal DNN Model for Causal Speech Enhancement
    Zhang, Lu
    Wang, Mingjiang
    INTERSPEECH 2020, 2020, : 2672 - 2676
  • [28] MULTI-SCALE TEMPORAL FREQUENCY CONVOLUTIONAL NETWORK WITH AXIAL ATTENTION FOR MULTI-CHANNEL SPEECH ENHANCEMENT
    Zhang, Guochang
    Wang, Chunliang
    Yu, Libiao
    Wei, Jianqiang
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 9206 - 9210
  • [29] Spatial Index Technology for Multi-scale and Large Scale Spatial Data
    Liu, Yuanyuan
    Liu, Gang
    He, Zhenwen
    2010 18TH INTERNATIONAL CONFERENCE ON GEOINFORMATICS, 2010,
  • [30] Multi-Scale Spatial-Temporal Integration Convolutional Tube for Human Action Recognition
    Wu, Haoze
    Liu, Jiawei
    Zhu, Xierong
    Wang, Meng
    Zha, Zheng-Jun
    PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 753 - 759