Unsupervised Training of a DNN-based Formant Tracker

Authors
Lilley, Jason [1]
Bunnell, H. Timothy [1]
Affiliation
[1] Nemours Biomed Res, Wilmington, DE 19803 USA
Keywords
speech analysis; formant estimation; formant tracking; deep learning; acoustic models of speech; speech
DOI
10.21437/Interspeech.2021-1690
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
Phonetic analysis often requires reliable estimation of formants, but estimates provided by popular programs can be unreliable. Recently, Dissen et al. [1] described DNN-based formant trackers that produced more accurate frequency estimates than several others but required manually corrected formant data for training. Here we describe a novel unsupervised training method for corpus-based DNN formant parameter estimation and tracking with accuracy similar to that of [1]. Frame-wise spectral envelopes serve as the input. The output consists of estimates of the frequencies and bandwidths, plus amplitude adjustments, for a prespecified number of poles and zeros, hereafter referred to as "formant parameters." A custom loss measure, based on the difference between the input envelope and one generated from the estimated formant parameters, is calculated and backpropagated through the network to establish the gradients with respect to the formant parameters. The approach is similar to that of autoencoders in that the model is trained to reproduce its input in order to discover latent features, in this case the formant parameters. Our results demonstrate that a reliable formant tracker can be constructed for a speech corpus without the need for hand-corrected training data.
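The reconstruction loss described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a Klatt-style two-pole digital resonator for each pole, its inverse for each zero, a single gain offset in dB, and a mean squared error in dB as a stand-in for the paper's custom loss. The sampling rate, frequency grid, and all function names are hypothetical. Only the forward computation is shown; in actual training the same computation would be written in an automatic-differentiation framework so the loss can be backpropagated.

```python
import cmath
import math

FS = 16000.0  # sampling rate in Hz (assumed for illustration)


def resonance_db(freq_hz, bw_hz, grid_hz, fs=FS):
    """dB response of one two-pole resonator normalized to unity gain at 0 Hz."""
    r = math.exp(-math.pi * bw_hz / fs)               # pole radius from bandwidth
    b = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / fs)
    c = -r * r
    a = 1.0 - b - c                                   # makes H(z) equal 1 at z = 1
    response = []
    for f in grid_hz:
        z_inv = cmath.exp(-2j * math.pi * f / fs)     # z^-1 on the unit circle
        h = a / (1.0 - b * z_inv - c * z_inv * z_inv)
        response.append(20.0 * math.log10(abs(h)))
    return response


def envelope_db(poles, zeros, gain_db, grid_hz):
    """Envelope generated from (frequency, bandwidth) pairs plus a gain offset."""
    env = [gain_db] * len(grid_hz)
    for freq, bw in poles:                            # poles add resonances
        env = [e + r for e, r in zip(env, resonance_db(freq, bw, grid_hz))]
    for freq, bw in zeros:                            # zeros subtract them
        env = [e - r for e, r in zip(env, resonance_db(freq, bw, grid_hz))]
    return env


def envelope_loss(target_db, estimate_db):
    """Mean squared difference in dB (assumed stand-in for the custom loss)."""
    return sum((t - e) ** 2 for t, e in zip(target_db, estimate_db)) / len(target_db)


grid = [50.0 * k for k in range(1, 160)]              # 50 Hz to 7950 Hz
true_poles = [(500.0, 80.0), (1500.0, 100.0), (2500.0, 120.0)]
shifted_poles = [(650.0, 80.0), (1500.0, 100.0), (2500.0, 120.0)]

target = envelope_db(true_poles, [], 0.0, grid)       # plays the role of the input envelope
exact = envelope_loss(target, envelope_db(true_poles, [], 0.0, grid))
shifted = envelope_loss(target, envelope_db(shifted_poles, [], 0.0, grid))
```

In this sketch the loss is zero when the estimated parameters regenerate the input envelope and grows when a formant frequency is misplaced, which is the signal an unsupervised tracker of this kind would learn from.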
Pages: 1189-1193
Number of pages: 5