Unsupervised Training of a DNN-based Formant Tracker

Cited by: 2
Authors:
Lilley, Jason [1]
Bunnell, H. Timothy [1]
Affiliations:
[1] Nemours Biomed Res, Wilmington, DE 19803 USA
Keywords:
speech analysis; formant estimation; formant tracking; deep learning; acoustic models of speech; SPEECH
DOI:
10.21437/Interspeech.2021-1690
CLC classification:
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline codes:
100104; 100213
Abstract
Phonetic analysis often requires reliable estimation of formants, but the estimates provided by popular programs can be unreliable. Recently, Dissen et al. [1] described DNN-based formant trackers that produced more accurate frequency estimates than several others, but required manually corrected formant data for training. Here we describe a novel unsupervised training method for corpus-based DNN formant parameter estimation and tracking with accuracy similar to [1]. Frame-wise spectral envelopes serve as the input. The output consists of estimates of the frequencies and bandwidths, plus amplitude adjustments, for a prespecified number of poles and zeros, hereafter referred to as "formant parameters." A custom loss measure based on the difference between the input envelope and one generated from the estimated formant parameters is calculated and backpropagated through the network to establish the gradients with respect to the formant parameters. The approach is similar to that of autoencoders, in that the model is trained to reproduce its input in order to discover latent features, in this case, the formant parameters. Our results demonstrate that a reliable formant tracker can be constructed for a speech corpus without the need for hand-corrected training data.
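The envelope-reconstruction loss described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names, the use of mean-squared error in the log-magnitude domain, the omission of zeros (poles only), and the treatment of each amplitude adjustment as an additive log-gain are all assumptions. In the paper, this computation would run inside an autodiff framework so the loss can be backpropagated to the network's formant-parameter outputs.

```python
import numpy as np

def resonance_log_mag(freqs_hz, F, B, fs=16000.0):
    """Log-magnitude response of one complex-conjugate pole pair
    (one formant) with center frequency F (Hz) and bandwidth B (Hz),
    evaluated at the analysis frequencies freqs_hz."""
    r = np.exp(-np.pi * B / fs)        # pole radius from bandwidth
    theta = 2.0 * np.pi * F / fs       # pole angle from center frequency
    w = 2.0 * np.pi * freqs_hz / fs
    pole = r * np.exp(1j * theta)
    # all-pole term: -log|1 - p e^{-jw}| - log|1 - p* e^{-jw}|
    d1 = np.abs(1.0 - pole * np.exp(-1j * w))
    d2 = np.abs(1.0 - np.conj(pole) * np.exp(-1j * w))
    return -np.log(d1) - np.log(d2)

def model_envelope(params, freqs_hz, fs=16000.0):
    """Regenerate a log-spectral envelope from estimated formant
    parameters. params: iterable of (F, B, amp) triples, where amp
    is a per-formant additive log-gain (a simplifying assumption;
    the paper also models zeros, which are omitted here)."""
    env = np.zeros_like(freqs_hz, dtype=float)
    for F, B, amp in params:
        env += resonance_log_mag(freqs_hz, F, B, fs) + amp
    return env

def envelope_loss(params, target_env, freqs_hz, fs=16000.0):
    """Mean-squared error between the input envelope and the one
    regenerated from the estimated formant parameters."""
    diff = model_envelope(params, freqs_hz, fs) - target_env
    return float(np.mean(diff ** 2))
```

A perfectly reconstructed envelope yields zero loss, while mis-estimated formant frequencies raise it, which is the signal that drives the unsupervised training.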
Pages: 1189-1193
Page count: 5