Joint domain adaptation and speech bandwidth extension using time-domain GANs for speaker verification

Cited by: 1
Authors
Kataria, Saurabh [1, 2]
Villalba, Jesus [1, 2]
Moro-Velazquez, Laureano [1]
Dehak, Najim [1, 2]
Affiliations
[1] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
[2] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA
Source
Keywords
domain adaptation; speech bandwidth extension; time-domain GAN; non-parallel learning; joint learning;
DOI
10.21437/Interspeech.2022-10900
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
Speech systems developed for a particular choice of acoustic domain and sampling frequency do not translate easily to others. The usual practice is to learn domain adaptation and bandwidth extension models independently. Contrary to this, we propose to learn both tasks together. Particularly, we learn to map narrow-band conversational telephone speech to wideband microphone speech. We developed parallel and non-parallel learning solutions which utilize both paired and unpaired data. We first discuss joint and disjoint training of multiple generative models for our tasks. Then, we propose a two-stage learning solution using a pre-trained domain adaptation system for pre-processing in bandwidth extension training. We evaluated our schemes on a Speaker Verification downstream task. We used the JHU-MIT experimental setup for NIST SRE21, which comprises SRE16, SRE-CTS Superset, and SRE21. Our results prove that learning both tasks is better than learning just one. On SRE16, our best system achieves 22% relative improvement in Equal Error Rate w.r.t. a direct learning baseline and 8% w.r.t. a strong bandwidth expansion system.
Pages: 615-619
Page count: 5
Related papers
50 in total
  • [21] UNSUPERVISED DOMAIN ADAPTATION OF NEURAL PLDA USING SEGMENT PAIRS FOR SPEAKER VERIFICATION
    Ulgen, I. Rasim
    Arslan, Levent M.
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 571 - 576
  • [22] An Extension of the Time-Domain Friis Equation
    Stumpf, Martin
    Vandenbosch, Guy A. E.
    2015 9th European Conference on Antennas and Propagation (EuCAP), 2015,
  • [23] Extension of time-domain matrix method
    Zeng, Jianping
    Lin, Du
    Lu, Zaide
    Zidonghua Xuebao/Acta Automatica Sinica, 1996, 22 (05): : 606 - 610
  • [24] DT-SV: A Transformer-based Time-domain Approach for Speaker Verification
    Zhang, Nan
    Wang, Jianzong
    Hong, Zhenhou
    Zhao, Chendong
    Qu, Xiaoyang
    Xiao, Jing
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [25] Time-domain structural analysis of speech
    Ekstein, K
    Moucek, R
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, PROCEEDINGS, 2003, 2588 : 506 - 510
  • [26] Time-domain auditory processing of speech
    de Cheveigné, A
    JOURNAL OF PHONETICS, 2003, 31 (3-4) : 547 - 561
  • [27] AN ONLINE SPEAKER-AWARE SPEECH SEPARATION APPROACH BASED ON TIME-DOMAIN REPRESENTATION
    Wang, Hui
    Song, Yan
    Li, Zeng-Xi
    McLoughlin, Ian
    Dai, Li-Rong
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6379 - 6383
  • [28] Unsupervised Discriminative Training of PLDA for Domain Adaptation in Speaker Verification
    Wang, Qiongqiong
    Koshinaka, Takafumi
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3727 - 3731
  • [29] Multi-Source Domain Adaptation and Fusion for Speaker Verification
    Zhu, Donghui
    Chen, Ning
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 2103 - 2116
  • [30] TIME-DOMAIN BANDWIDTH-COMPRESSION SYSTEM
    STOVER, WR
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1967, 42 (02): : 348 - &