Automatic Voice Disorder Detection Using Self-Supervised Representations

Cited by: 7
Authors
Ribas, Dayana [1 ]
Pastor, Miguel A. [1 ]
Miguel, Antonio [1 ]
Martinez, David
Ortega, Alfonso [1 ,2 ]
Lleida, Eduardo [1 ]
Affiliations
[1] Univ Zaragoza, Aragon Inst Engn Res I3A, ViVoLab, Zaragoza 50018, Spain
[2] Lumenvox, D-81379 Munich, Germany
Funding
EU Horizon 2020;
Keywords
Voice disorder; pathological speech; Saarbruecken voice database; advanced voice function assessment database; self-supervised; class token; transformer; deep neural networks; PATHOLOGY DETECTION; PREVALENCE;
DOI
10.1109/ACCESS.2023.3243986
Chinese Library Classification (CLC)
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Many speech features and models, including Deep Neural Networks (DNN), have been used to classify healthy versus pathological speech on the Saarbruecken Voice Database (SVD). However, accuracy values of 80.71% for phrases and 82.8% for the vowels /a,i,u/ are the highest reported for SVD audio samples when the evaluation covers the wide range of pathologies in the database rather than a selection of a few. This paper targets this top performance among state-of-the-art Automatic Voice Disorder Detection (AVDD) systems. Within a DNN-based AVDD framework, we study the capability of Self-Supervised (SS) representation learning to describe discriminative cues between healthy and pathological speech. The system processes the SS temporal feature sequence with a single feed-forward layer and a Class-Token (CT) Transformer to obtain the classification between healthy and pathological speech. Furthermore, a suitable extension of the training set with out-of-domain data is evaluated to deal with the low availability of data for DNN-based models in voice pathology detection. Experimental results on the SVD phrase audio samples, including all available pathologies, show classification accuracy values of up to 93.36%. The proposed AVDD system thus improves accuracy over the baseline system by 4.1% without the training data extension and by 15.62% with it. Beyond the novelty of using SS representations for AVDD, obtaining accuracies above 90% under these conditions, using the whole set of pathologies in the SVD, is a milestone for voice disorder-related research. Furthermore, the study of how the amount of in-domain training data relates to system performance provides guidance for the data preparation stage. Lessons learned in this work suggest guidelines for taking advantage of DNNs to boost performance when developing automatic systems for the diagnosis, treatment, and monitoring of voice pathologies.
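As a concrete illustration of the pipeline described in the abstract, the following is a minimal, hypothetical PyTorch sketch: the SS feature sequence is projected by a single feed-forward layer, a learnable class token is prepended, and the Transformer encoder's class-token output feeds a linear head for the healthy/pathological decision. The dimensions, layer counts, and the wav2vec 2.0-style 768-dimensional SS front-end are illustrative assumptions, not the authors' exact configuration.

```python
# Hypothetical sketch (not the authors' released code): a Class-Token (CT)
# Transformer classifier over a sequence of self-supervised (SS) speech features.
import torch
import torch.nn as nn


class CTTransformerAVDD(nn.Module):
    """Healthy-vs-pathological classifier: feed-forward projection of the SS
    feature sequence, a learnable class token, a Transformer encoder, and a
    linear head applied to the class-token output."""

    def __init__(self, ss_dim: int = 768, d_model: int = 256,
                 n_heads: int = 4, n_layers: int = 2, n_classes: int = 2):
        super().__init__()
        self.proj = nn.Linear(ss_dim, d_model)            # single feed-forward layer
        self.cls_token = nn.Parameter(torch.zeros(1, 1, d_model))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, ss_features: torch.Tensor) -> torch.Tensor:
        # ss_features: (batch, frames, ss_dim), e.g. frame-level SS representations
        x = self.proj(ss_features)
        cls = self.cls_token.expand(x.size(0), -1, -1)    # one class token per utterance
        x = torch.cat([cls, x], dim=1)
        x = self.encoder(x)
        return self.head(x[:, 0])                         # classify from the class token


if __name__ == "__main__":
    model = CTTransformerAVDD()
    dummy = torch.randn(2, 120, 768)   # 2 utterances, 120 frames of SS features
    print(model(dummy).shape)          # torch.Size([2, 2]): healthy vs. pathological logits
```

In this sketch the utterance-level decision is read from the class-token position, so variable-length phrase recordings can be handled without pooling the frame sequence explicitly.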
Pages: 14915-14927
Page count: 13