Automatic Speech Disfluency Detection Using wav2vec2.0 for Different Languages with Variable Lengths

Cited by: 2
Authors
Liu, Jiajun [1,2]
Wumaier, Aishan [2,3]
Wei, Dongping [2,3]
Guo, Shen [2,3]
Affiliations
[1] Xinjiang Univ, Coll Software, Urumqi 830046, Peoples R China
[2] Key Lab Multilingual Informat Technol Xinjiang Uyg, Urumqi 830046, Peoples R China
[3] Xinjiang Univ, Coll Informat Sci & Engn, Urumqi 830046, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 13
Keywords
speech disfluency detection; stuttering; limited data; wav2vec2.0; entropy invariance; CLASSIFICATION; DYSFLUENCIES
DOI
10.3390/app13137579
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Speech is critical for interpersonal communication, but not everyone speaks fluently. Speech disfluency, including stuttering and interruptions, affects both the emotional expression and the clarity of expression of people who stutter. Existing methods for detecting speech disfluency rely heavily on annotated data, which is costly to obtain. These methods also do not consider variable-length disfluent speech, which limits their scalability. To address these limitations, this paper proposes an automated method for detecting speech disfluency. The method detects four types of disfluency features using single-task detection and uses embeddings from the pre-trained wav2vec2.0 model, together with convolutional neural network (CNN) and Transformer models, for feature extraction. To handle variable-length disfluent speech, the model is modified based on the entropy invariance of attention mechanisms, which improves its scalability. The proposed method has the potential to help individuals overcome speech disfluency and improve their communication skills, and to help therapists track the progress of stuttering patients. The model's scalability across languages and lengths also enhances its practical applicability. Experiments show that the model outperforms baseline models on both English and Chinese datasets, supporting its universality and scalability in real-world applications.
Pages: 25
Related papers
50 in total
  • [1] Enhancing Stuttering Detection and Classification using Wav2Vec2.0
    Sen, Madhurima
    Das, Pradip K.
    2024 4TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND SIGNAL PROCESSING, AISP, 2024,
  • [2] Keyword spotting for dialectal speech and Introduction of wav2vec2.0
    Ariga, Tomohiro
    Minakawa, Reo
    Kojima, Kazunori
    Lee, Shi-Wook
    Itoh, Yoshiaki
    APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024, 2024,
  • [3] Kazakh Speech Recognition: Wav2vec2.0 vs. Whisper
    Kozhirbayev, Zhanibek
    JOURNAL OF ADVANCES IN INFORMATION TECHNOLOGY, 2023, 14 (06) : 1382 - 1389
  • [4] Transfer Ability of Monolingual Wav2vec2.0 for Low-resource Speech Recognition
    Yi, Cheng
    Wang, Jianzong
    Cheng, Ning
    Zhou, Shiyu
    Xu, Bo
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [5] Detection of Prosodic Boundaries in Speech Using Wav2Vec 2.0
    Kunesova, Marie
    Rezackova, Marketa
    TEXT, SPEECH, AND DIALOGUE (TSD 2022), 2022, 13502 : 377 - 388
  • [6] Speech emotion recognition using fine-tuned Wav2vec2.0 and neural controlled differential equations classifier
    Wang, Ni
    Yang, Danyu
    PLOS ONE, 2025, 20 (02):
  • [7] A study on fine-tuning wav2vec2.0 Model for the task of Mispronunciation Detection and Diagnosis
    Peng, Linkai
    Fu, Kaiqi
    Lin, Binghuai
    Ke, Dengfeng
    Zhan, Jinsong
    INTERSPEECH 2021, 2021, : 4448 - 4452
  • [8] Damage localization method using ultrasonic lamb waves and Wav2Vec2.0 neural network
    Qian, Lubin
    Liu, Sihao
    Fan, Guopeng
    Liu, Xinlong
    Zhang, Hui
    Mei, Yaohua
    Xing, Yuhui
    Wang, Zhiqiang
    FRONTIERS IN MATERIALS, 2023, 10
  • [9] Brazilian Portuguese Speech Recognition Using Wav2vec 2.0
    Stefanel Gris, Lucas Rafael
    Casanova, Edresson
    de Oliveira, Frederico Santos
    Soares, Anderson da Silva
    Candido Junior, Arnaldo
    COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2022, 2022, 13208 : 333 - 343
  • [10] The Graph feature fusion technique for speaker recognition based on wav2vec2.0 framework
    Ge, Zirui
    Guo, Haiyan
    Wang, Tingting
    Yang, Zhen
    arXiv, 2023,