Joint domain adaptation and speech bandwidth extension using time-domain GANs for speaker verification

Times cited: 1
Authors
Kataria, Saurabh [1, 2]
Villalba, Jesus [1, 2]
Moro-Velazquez, Laureano [1]
Dehak, Najim [1, 2]
Affiliations
[1] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
[2] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA
Source
Interspeech 2022
Keywords
domain adaptation; speech bandwidth extension; time-domain GAN; non-parallel learning; joint learning
DOI
10.21437/Interspeech.2022-10900
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Speech systems developed for a particular acoustic domain and sampling frequency do not translate easily to others. The usual practice is to learn domain adaptation and bandwidth extension models independently. In contrast, we propose to learn both tasks together. In particular, we learn to map narrowband conversational telephone speech to wideband microphone speech. We developed parallel and non-parallel learning solutions that utilize both paired and unpaired data. We first discuss joint and disjoint training of multiple generative models for our tasks. Then, we propose a two-stage learning solution that uses a pre-trained domain adaptation system as a pre-processing step in bandwidth extension training. We evaluated our schemes on a downstream speaker verification task, using the JHU-MIT experimental setup for NIST SRE21, which comprises SRE16, the SRE-CTS Superset, and SRE21. Our results show that learning both tasks is better than learning just one. On SRE16, our best system achieves a 22% relative improvement in Equal Error Rate (EER) with respect to a direct learning baseline and 8% with respect to a strong bandwidth extension system.
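The abstract reports its gains as relative reductions in Equal Error Rate (EER), the operating point at which the false-acceptance and false-rejection rates of a verification system are equal. The Python sketch below only illustrates these two quantities. The record does not describe the authors' scoring tools, so the simple threshold sweep, the function names `equal_error_rate` and `relative_improvement`, and all scores and EER values used here are hypothetical.

```python
from typing import Sequence


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate the EER by sweeping thresholds over the observed scores.

    labels: 1 = target trial (same speaker), 0 = non-target trial.
    A trial is accepted when its score is >= the threshold.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    targets = [s for s, y in zip(scores, labels) if y == 1]
    nontargets = [s for s, y in zip(scores, labels) if y == 0]
    if not targets or not nontargets:
        raise ValueError("need at least one target and one non-target trial")

    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(scores)):
        # False acceptance: non-target trials scored at or above the threshold.
        far = sum(s >= t for s in nontargets) / len(nontargets)
        # False rejection: target trials scored below the threshold.
        frr = sum(s < t for s in targets) / len(targets)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


def relative_improvement(baseline: float, system: float) -> float:
    """Relative reduction of an error rate: (baseline - system) / baseline."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - system) / baseline


if __name__ == "__main__":
    # Hypothetical trial scores: four target and four non-target trials.
    scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    print(f"EER = {equal_error_rate(scores, labels):.1%}")  # prints EER = 25.0%

    # Hypothetical EERs (in %) for a baseline and an improved system.
    print(f"relative improvement = {relative_improvement(10.0, 7.8):.0%}")  # prints 22%
```

With these hypothetical numbers, a drop from 10.0% to 7.8% EER is a 22% relative improvement but only a 2.2-point absolute one, which is how relative figures such as those in the abstract should be read. Larger evaluations typically interpolate along the ROC curve (for example, from scikit-learn's `roc_curve`) rather than taking the closest discrete threshold as this sketch does.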
Pages: 615-619
Page count: 5