Unsupervised Training of a DNN-based Formant Tracker

Cited by: 2
Authors
Lilley, Jason [1]
Bunnell, H. Timothy [1]
Affiliations
[1] Nemours Biomed Res, Wilmington, DE 19803 USA
Source
Keywords
speech analysis; formant estimation; formant tracking; deep learning; acoustic models of speech; speech
DOI
10.21437/Interspeech.2021-1690
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
Phonetic analysis often requires reliable estimation of formants, but estimates provided by popular programs can be unreliable. Recently, Dissen et al. [1] described DNN-based formant trackers that produced more accurate frequency estimates than several others, but that require manually corrected formant data for training. Here we describe a novel unsupervised training method for corpus-based DNN formant parameter estimation and tracking with accuracy similar to [1]. Frame-wise spectral envelopes serve as the input. The outputs are estimates of the frequencies and bandwidths, plus amplitude adjustments, for a prespecified number of poles and zeros, hereafter referred to as "formant parameters." A custom loss measure, based on the difference between the input envelope and one generated from the estimated formant parameters, is calculated and backpropagated through the network to establish the gradients with respect to the formant parameters. The approach is similar to that of autoencoders, in that the model is trained to reproduce its input in order to discover latent features, in this case the formant parameters. Our results demonstrate that a reliable formant tracker can be constructed for a speech corpus without the need for hand-corrected training data.
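The reconstruction idea in the abstract can be illustrated with a small sketch. The abstract does not give the exact filter model or loss, so everything below is an assumption made for illustration: the envelope is generated from an all-pole cascade only (zeros and amplitude adjustments are omitted), the loss is a plain mean squared error in dB, and the function names `envelope_from_formants` and `reconstruction_loss`, the 10 kHz sample rate, and the 257-bin grid are hypothetical. NumPy provides no gradients; in actual training the same computation would presumably be written in an automatic-differentiation framework so that the loss can be backpropagated to the formant parameters.

```python
import numpy as np


def envelope_from_formants(freqs_hz, bws_hz, sample_rate=10000.0, n_bins=257):
    """Log-magnitude (dB) envelope of an all-pole cascade.

    Assumption: one conjugate pole pair per formant; zeros and
    amplitude adjustments mentioned in the abstract are left out.
    """
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    bws_hz = np.asarray(bws_hz, dtype=float)
    # Evaluation grid from 0 Hz up to the Nyquist frequency.
    grid_hz = np.linspace(0.0, sample_rate / 2.0, n_bins)
    z_inv = np.exp(-1j * 2.0 * np.pi * grid_hz / sample_rate)  # e^{-jw}
    # Pole radius follows from the bandwidth, pole angle from the frequency.
    radius = np.exp(-np.pi * bws_hz / sample_rate)[:, None]
    cos_theta = np.cos(2.0 * np.pi * freqs_hz / sample_rate)[:, None]
    # Second-order section denominator: 1 - 2 r cos(theta) z^-1 + r^2 z^-2.
    denom = (1.0
             - 2.0 * radius * cos_theta * z_inv[None, :]
             + radius ** 2 * z_inv[None, :] ** 2)
    # Summing per-formant log magnitudes gives the cascade's log magnitude.
    env_db = -20.0 * np.log10(np.abs(denom)).sum(axis=0)
    return grid_hz, env_db


def reconstruction_loss(input_env_db, freqs_hz, bws_hz, **kwargs):
    """Mean squared dB difference between the input envelope and the
    envelope regenerated from the estimated formant parameters."""
    _, est_env_db = envelope_from_formants(freqs_hz, bws_hz, **kwargs)
    return float(np.mean((np.asarray(input_env_db) - est_env_db) ** 2))


# Usage: build a synthetic "input" envelope, then score two parameter guesses.
grid, target = envelope_from_formants([500, 1500, 2500], [80, 100, 120])
print(reconstruction_loss(target, [500, 1500, 2500], [80, 100, 120]))
print(reconstruction_loss(target, [600, 1400, 2600], [80, 100, 120]))
```

The first guess regenerates the target exactly and so scores zero, while the second, with shifted formant frequencies, scores higher. No labelled formant values enter the loss, only the input envelope itself, which is what makes the training unsupervised.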
Pages: 1189-1193
Number of pages: 5