Variable frame rate-based data augmentation to handle speaking-style variability for automatic speaker verification

Cited by: 3
Authors
Afshan, Amber [1]
Guo, Jinxi [1]
Park, Soo Jin [1]
Ravi, Vijay [1]
McCree, Alan [2]
Alwan, Abeer [1]
Affiliations
[1] Univ Calif Los Angeles, Dept Elect & Comp Engn, Los Angeles, CA 90095 USA
[2] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD USA
Source
Keywords
automatic speaker verification; speaking style; data augmentation; multicondition training; short utterances
DOI
10.21437/Interspeech.2020-3006
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
The effects of speaking-style variability on automatic speaker verification were investigated using the UCLA Speaker Variability database, which comprises multiple speaking styles per speaker. An x-vector/PLDA (probabilistic linear discriminant analysis) system was trained with the SRE and Switchboard databases with standard augmentation techniques and evaluated with utterances from the UCLA database. The equal error rate (EER) was low when enrollment and test utterances were of the same style (e.g., 0.98% and 0.57% for read and conversational speech, respectively), but it increased substantially when styles were mismatched between enrollment and test utterances. For instance, when enrolled with conversation utterances, the EER increased to 3.03%, 2.96%, and 22.12% when tested on read, narrative, and pet-directed speech, respectively. To reduce the effect of style mismatch, we propose an entropy-based variable frame rate technique to artificially generate style-normalized representations for PLDA adaptation. The proposed system significantly improved performance. In the aforementioned conditions, the EERs improved to 2.69% (conversation - read), 2.27% (conversation - narrative), and 18.75% (pet-directed - read). Overall, the proposed technique performed comparably to multi-style PLDA adaptation without the need for training data in different speaking styles per speaker.
Pages: 4318-4322
Page count: 5
Related papers
2 items in total
  • [1] Attention-based conditioning methods using variable frame rate for style-robust speaker verification
    Afshan, Amber
    Alwan, Abeer
    [J]. INTERSPEECH 2022, 2022, : 2333 - 2337
  • [2] A Variable Frame Length and Rate Algorithm based on the Spectral Kurtosis Measure for Speaker Verification
    Jung, Chi-Sang
    Han, Kyu J.
    Seo, Hyunson
    Narayanan, Shrikanth S.
    Kang, Hong-Goo
    [J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2762 - +