Joint domain adaptation and speech bandwidth extension using time-domain GANs for speaker verification

Cited by: 1
Authors
Kataria, Saurabh [1, 2]
Villalba, Jesus [1, 2]
Moro-Velazquez, Laureano [1]
Dehak, Najim [1, 2]
Affiliations
[1] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
[2] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA
Keywords
domain adaptation; speech bandwidth extension; time-domain GAN; non-parallel learning; joint learning
DOI
10.21437/Interspeech.2022-10900
Chinese Library Classification (CLC)
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speech systems developed for a particular acoustic domain and sampling frequency do not transfer easily to others. The usual practice is to learn domain adaptation and bandwidth extension models independently. In contrast, we propose to learn both tasks together. In particular, we learn to map narrowband conversational telephone speech to wideband microphone speech. We develop parallel and non-parallel learning solutions that utilize both paired and unpaired data. We first discuss joint and disjoint training of multiple generative models for our tasks. Then, we propose a two-stage learning solution that uses a pre-trained domain adaptation system for pre-processing in bandwidth extension training. We evaluate our schemes on a speaker verification downstream task using the JHU-MIT experimental setup for NIST SRE21, which comprises SRE16, SRE-CTS Superset, and SRE21. Our results show that learning both tasks is better than learning just one. On SRE16, our best system achieves a 22% relative improvement in Equal Error Rate (EER) with respect to a direct learning baseline and 8% with respect to a strong bandwidth extension system.
Pages: 615-619
Page count: 5