Enhancing Children's Short Utterance Based ASV Using Data Augmentation Techniques and Feature Concatenation Approach

Authors
Aziz, Shahid [1]
Shahnawazuddin, Syed [1]
Affiliation
[1] Natl Inst Technol Patna, Patna 800005, Bihar, India
Keywords
Automatic speaker verification; In-domain data augmentation; Out-of-domain data augmentation; Mel-frequency cepstral coefficients; Inverse-Mel-frequency cepstral coefficients; Feature concatenation; SPEAKER VERIFICATION; LIMITED DATA; RECOGNITION; SPEECH; SYSTEM
DOI
10.1007/978-3-031-48312-7_31
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Developing an automatic speaker verification (ASV) system for children's speech is challenging for several reasons, one of which is the dearth of domain-specific data. The challenge intensifies with short utterances, a relatively unexplored domain in children's ASV. To address data scarcity, this paper extensively explores in-domain and out-of-domain data augmentation techniques and proposes an augmentation approach that combines both. The out-of-domain data come from adult speakers, whose acoustic attributes are known to contrast starkly with those of child speakers. Consequently, prosody modification, formant modification, and voice conversion are employed to modify the adult speech and render it acoustically similar to children's speech prior to augmentation. The in-domain data augmentation, on the other hand, involves speed perturbation of children's speech. The proposed approach not only increases the amount of training data but also effectively captures the missing target attributes, which boosts verification performance: the equal error rate (EER) improves by 43.91% relative to the baseline system. Furthermore, the commonly used Mel-frequency cepstral coefficients (MFCC) average out the higher-frequency components because of the larger bandwidth of the filter-bank at those frequencies. Effective preservation of the higher-frequency content in children's speech is therefore another challenge that must be tackled to develop a reliable and robust children's ASV system. MFCC are concatenated with inverse-Mel-frequency cepstral coefficients (IMFCC) with the sole intention of preserving the higher-frequency content of the children's speech data. The feature concatenation approach, when combined with the proposed data augmentation, further improves verification performance and yields an overall relative reduction of 48.51% in EER.
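The frame-level concatenation of MFCC and IMFCC mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that the IMFCC filterbank is obtained by mirroring the Mel filterbank along the frequency axis (one common construction), and every function name and parameter value below (16 kHz sampling rate, 26 filters, 13 coefficients, 512-point FFT, 160-sample hop) is a hypothetical choice made only for illustration.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced uniformly on the Mel scale (wide at high frequencies)."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    hz_pts = mel_to_hz(mel_pts)
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def inverse_mel_filterbank(n_filters, n_fft, sr):
    """Assumed IMFCC filterbank: the Mel filterbank mirrored in frequency,
    so the narrow filters sit at the high-frequency end."""
    return mel_filterbank(n_filters, n_fft, sr)[::-1, ::-1]

def power_spectrogram(signal, n_fft, hop):
    """Hamming-windowed frames followed by the squared magnitude of the real FFT."""
    window = np.hamming(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def dct_matrix(n_ceps, n_filters):
    """Unnormalised DCT-II basis, shape (n_ceps, n_filters)."""
    n = np.arange(n_filters)
    k = np.arange(n_ceps)[:, None]
    return np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))

def cepstra(power, fb, n_ceps):
    """Log filterbank energies followed by a DCT, one row per frame."""
    log_energy = np.log(power @ fb.T + 1e-10)
    return log_energy @ dct_matrix(n_ceps, fb.shape[0]).T

def mfcc_imfcc(signal, sr=16000, n_filters=26, n_ceps=13, n_fft=512, hop=160):
    """Concatenate MFCC and IMFCC along the feature axis for every frame."""
    power = power_spectrogram(signal, n_fft, hop)
    mfcc = cepstra(power, mel_filterbank(n_filters, n_fft, sr), n_ceps)
    imfcc = cepstra(power, inverse_mel_filterbank(n_filters, n_fft, sr), n_ceps)
    return np.concatenate([mfcc, imfcc], axis=1)

# One second of synthetic noise stands in for a real utterance.
rng = np.random.default_rng(0)
features = mfcc_imfcc(rng.standard_normal(16000))
print(features.shape)  # (n_frames, 2 * n_ceps)
```

In this sketch each frame carries both feature sets side by side, so a downstream model sees the fine low-frequency resolution of MFCC together with the fine high-frequency resolution of the mirrored filterbank.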
Pages: 380-394
Number of pages: 15