Enhancing Children's Short Utterance Based ASV Using Data Augmentation Techniques and Feature Concatenation Approach

Times cited: 0

Authors:

Aziz, Shahid [1]

Shahnawazuddin, Syed [1]

Affiliation:

[1] Natl Inst Technol Patna, Patna 800005, Bihar, India

Source:

SPEECH AND COMPUTER, SPECOM 2023, PT II | 2023 / Vol. 14339

Keywords:

Automatic speaker verification; In-domain data augmentation; Out-of-domain data augmentation; Mel-frequency cepstral coefficients; Inverse-Mel-frequency cepstral coefficients; Feature concatenation; SPEAKER VERIFICATION; LIMITED DATA; RECOGNITION; SPEECH; SYSTEM;

DOI:

10.1007/978-3-031-48312-7_31

Chinese Library Classification number:

O42 [Acoustics];

Subject classification code:

070206 ; 082403 ;

Abstract:

Developing an automatic speaker verification (ASV) system for children's speech is challenging for a number of reasons, one of which is the dearth of domain-specific data. The challenge intensifies further when only short utterances are available, a relatively unexplored setting in children's ASV. To circumvent the data-scarcity issue, this paper extensively explores various in-domain and out-of-domain data augmentation techniques and proposes a data augmentation approach that combines both. The out-of-domain data come from adult speakers, whose acoustic attributes are known to be in stark contrast to those of child speakers. Consequently, techniques such as prosody modification, formant modification and voice conversion are employed to modify the adult acoustic features and render them acoustically similar to children's speech before augmentation. The in-domain data augmentation, on the other hand, involves speed perturbation of children's speech. The proposed approach not only increases the amount of training data but also effectively captures the missing target attributes, which boosts verification performance, as attested by a relative improvement of 43.91% in equal error rate (EER) over the baseline system. Furthermore, the commonly used Mel-frequency cepstral coefficients (MFCC) average out the higher-frequency components, because the filters in the higher-frequency region of the Mel filterbank have larger bandwidths. Effectively preserving the higher-frequency content of children's speech is therefore another challenge that must be tackled appropriately to develop a reliable and robust children's ASV system.
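The in-domain augmentation step can be illustrated with a minimal speed-perturbation sketch. It is not the authors' implementation: the perturbation factors (0.9 and 1.1, following the common Kaldi 3-way convention) are an assumption, since the abstract does not state them, and resampling is done by plain linear interpolation without an anti-aliasing filter, whereas toolkits typically rely on SoX's `speed` effect. The helper names `speed_perturb` and `augment_with_speed_perturbation` are hypothetical. Changing the playback speed in this way alters duration and pitch together.

```python
import numpy as np


def speed_perturb(signal, factor):
    """Play `signal` `factor` times faster at the same sample rate (hypothetical helper).

    factor > 1 shortens the utterance and raises its pitch; factor < 1 lengthens
    it and lowers its pitch. Linear interpolation without an anti-aliasing filter
    is used, so this is an illustration rather than a production resampler.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    signal = np.asarray(signal, dtype=float)
    n_out = int(round(len(signal) / factor))
    # Fractional position in the original signal read by each output sample;
    # np.interp holds the last sample for positions past the end.
    positions = np.arange(n_out) * factor
    return np.interp(positions, np.arange(len(signal)), signal)


def augment_with_speed_perturbation(utterances, factors=(0.9, 1.1)):
    """Return the original utterances plus one speed-perturbed copy per factor.

    The default factors follow the common Kaldi 3-way convention (assumption:
    the paper's actual factors are not given in the abstract).
    """
    augmented = [np.asarray(u, dtype=float) for u in utterances]
    for factor in factors:
        augmented.extend(speed_perturb(u, factor) for u in utterances)
    return augmented


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr  # one second at 16 kHz
    tone = np.sin(2 * np.pi * 220.0 * t)
    batch = augment_with_speed_perturbation([tone])
    print([len(x) for x in batch])  # → [16000, 17778, 14545]
```

Whether perturbed copies keep the original speaker label or are treated as new speakers is a design choice the abstract does not specify; the sketch returns bare waveforms and leaves labelling to the caller.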
MFCC and inverse-Mel-frequency cepstral coefficients (IMFCC) are concatenated with the sole intention of effectively preserving the higher-frequency content of children's speech. When combined with the proposed data augmentation, the feature concatenation approach further improves verification performance, yielding an overall relative reduction of 48.51% in EER.
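The MFCC/IMFCC concatenation can be sketched in NumPy as below. This is an illustrative assumption, not the paper's front end: the 25 ms frames, 10 ms hop, 512-point FFT, 40 filters and 20 coefficients per stream are placeholder choices for 16 kHz audio, and the inverse-Mel filterbank is built by flipping the Mel filterbank in frequency, one common IMFCC construction that places the narrow filters at high frequencies. All function names are hypothetical. Both streams are computed from the same power spectrum and stacked frame by frame.

```python
import numpy as np

# Placeholder front-end settings for 16 kHz audio (assumptions, not from the paper).
SR, FRAME_LEN, HOP, N_FFT, N_FILTERS, N_CEPS = 16000, 400, 160, 512, 40, 20


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr=SR, n_fft=N_FFT, n_filters=N_FILTERS):
    """Triangular filters equally spaced on the Mel scale: narrow at low, wide at high frequencies."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):  # rising slope
            fb[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):  # falling slope
            fb[i - 1, k] = (right - k) / max(right - centre, 1)
    return fb


def inverse_mel_filterbank(mel_fb):
    """Flip the Mel filterbank in frequency so the narrow filters sit at high frequencies."""
    return mel_fb[::-1, ::-1]


def power_spectrum(signal, frame_len=FRAME_LEN, hop=HOP, n_fft=N_FFT):
    """Hamming-windowed short-time power spectrum, shape (n_frames, n_fft // 2 + 1)."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    return np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft


def dct_ii_matrix(n_out, n_in):
    """Orthonormal DCT-II basis; row k yields cepstral coefficient k."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in)) * np.sqrt(2.0 / n_in)
    basis[0] /= np.sqrt(2.0)
    return basis


def cepstra(power, fb, n_ceps=N_CEPS):
    """Log filterbank energies followed by a DCT."""
    log_energies = np.log(power @ fb.T + 1e-10)
    return log_energies @ dct_ii_matrix(n_ceps, fb.shape[0]).T


def mfcc_imfcc_concat(signal):
    """Frame-wise concatenation [MFCC | IMFCC], shape (n_frames, 2 * N_CEPS)."""
    power = power_spectrum(signal)
    mel_fb = mel_filterbank()
    mfcc = cepstra(power, mel_fb)
    imfcc = cepstra(power, inverse_mel_filterbank(mel_fb))
    return np.concatenate([mfcc, imfcc], axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = mfcc_imfcc_concat(rng.standard_normal(SR))  # one second of noise
    print(features.shape)  # → (98, 40)
```

Concatenating along the feature axis keeps the two streams frame-synchronous, so the combined vectors can replace MFCCs alone at the input of a frame-level back end without any extra alignment step.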

Pages: 380-394

Number of pages: 15

50 items in total

[21] Enhancing aspect-based sentiment analysis using data augmentation based on back-translation
Taheri, Alireza
Zamanifar, Azadeh
Farhadi, Amirfarhad
[J]. INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS, 2024
[22] DATA AUGMENTATION BASED ON VOWEL STRETCH FOR IMPROVING CHILDREN'S SPEECH RECOGNITION
Nagano, Tohru
Fukuda, Takashi
Suzuki, Masayuki
Kurata, Gakuto
[J]. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019: 502 - 508
[23] Using Children's Books as an Approach to Enhancing Our Understanding of Disability
Pardeck, John T.
[J]. JOURNAL OF SOCIAL WORK IN DISABILITY & REHABILITATION, 2005, 4 (1-2) : 77 - 85
[24] A new hybrid approach for grapevine leaves recognition based on ESRGAN data augmentation and GASVM feature selection
Dogan, Guerkan
Imak, Andac
Ergen, Burhan
Sengur, Abdulkadir
[J]. NEURAL COMPUTING & APPLICATIONS, 2024, 36 (14): 7669 - 7683
[26] Bearing Prognostics: An Instance-Based Learning Approach with Feature Engineering, Data Augmentation, and Similarity Evaluation
Sun, Jun
Sun, Qiao
[J]. SIGNALS, 2021, 2 (04): 662 - 687
[27] Enhancing Pavement Distress Detection Using a Morphological Constraints-Based Data Augmentation Method
Xu, Zhengchao
Dai, Zhe
Sun, Zhaoyun
Zuo, Chen
Song, Huansheng
Yuan, Changwei
[J]. COATINGS, 2023, 13 (04)
[28] Transformer-Based Multilingual Speech Emotion Recognition Using Data Augmentation and Feature Fusion
Al-onazi, Badriyya B.
Nauman, Muhammad Asif
Jahangir, Rashid
Malik, Muhmmad Mohsin
Alkhammash, Eman H.
Elshewey, Ahmed M.
[J]. APPLIED SCIENCES-BASEL, 2022, 12 (18)
[29] Improving deep learning-based polyp detection using feature extraction and data augmentation
Chou, Yung-Chien
Chen, Chao-Chun
[J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (11) : 16817 - 16837
