Joint domain adaptation and speech bandwidth extension using time-domain GANs for speaker verification

Cited by: 1
Authors
Kataria, Saurabh [1, 2]
Villalba, Jesus [1, 2]
Moro-Velazquez, Laureano [1]
Dehak, Najim [1, 2]
Affiliations
[1] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
[2] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA
Source
Keywords
domain adaptation; speech bandwidth extension; time-domain GAN; non-parallel learning; joint learning
DOI
10.21437/Interspeech.2022-10900
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speech systems developed for a particular acoustic domain and sampling frequency do not transfer easily to others. The usual practice is to learn domain adaptation and bandwidth extension models independently. In contrast, we propose to learn both tasks together. In particular, we learn to map narrowband conversational telephone speech to wideband microphone speech. We develop parallel and non-parallel learning solutions that use both paired and unpaired data. We first discuss joint and disjoint training of multiple generative models for our tasks. Then, we propose a two-stage learning solution that uses a pre-trained domain adaptation system for pre-processing in bandwidth extension training. We evaluate our schemes on a speaker verification downstream task using the JHU-MIT experimental setup for NIST SRE21, which comprises SRE16, SRE-CTS Superset, and SRE21. Our results indicate that learning both tasks is better than learning just one. On SRE16, our best system achieves a 22% relative improvement in Equal Error Rate (EER) with respect to a direct learning baseline and 8% with respect to a strong bandwidth extension system.
Pages: 615-619
Number of pages: 5