Unsupervised Training of a DNN-based Formant Tracker

Times cited: 2

Authors:

Lilley, Jason [1]

Bunnell, H. Timothy [1]

Affiliation:

[1] Nemours Biomed Res, Wilmington, DE 19803 USA

Source:

INTERSPEECH 2021 | 2021

Keywords:

speech analysis; formant estimation; formant tracking; deep learning; acoustic models of speech; SPEECH;

DOI:

10.21437/Interspeech.2021-1690

Chinese Library Classification (CLC) numbers:

R36 [Pathology]; R76 [Otorhinolaryngology];

Subject classification codes:

100104; 100213;

Abstract:

Phonetic analysis often requires reliable estimates of formants, but the estimates provided by popular programs can be unreliable. Recently, Dissen et al. [1] described DNN-based formant trackers that produced more accurate frequency estimates than several other trackers, but these require manually corrected formant data for training. Here we describe a novel unsupervised training method for corpus-based DNN formant parameter estimation and tracking, with accuracy similar to that of [1]. Frame-wise spectral envelopes serve as the input. The outputs are estimates of the frequencies and bandwidths, plus amplitude adjustments, for a prespecified number of poles and zeros, hereafter referred to as "formant parameters." A custom loss measure, based on the difference between the input envelope and an envelope generated from the estimated formant parameters, is calculated and backpropagated through the network to obtain the gradients with respect to the formant parameters. The approach is similar to that of autoencoders in that the model is trained to reproduce its input in order to discover latent features, in this case the formant parameters. Our results demonstrate that a reliable formant tracker can be constructed for a speech corpus without the need for hand-corrected training data.
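The loss described above compares an input spectral envelope with one synthesized from estimated pole and zero parameters. A minimal sketch of that synthesize-and-compare step follows, using only the Python standard library. It rests on assumptions the abstract does not specify: each pole or zero is modeled as a second-order digital resonator normalized to 0 dB at DC, the "amplitude adjustments" are collapsed into a single overall gain in dB, and the loss is a mean squared dB difference. The names `resonator_db`, `synth_envelope_db`, `envelope_loss`, and `fit_first_formant` are hypothetical. Instead of training a DNN by backpropagation, the sketch adjusts F1 directly with fixed-size steps against a finite-difference gradient, which only illustrates how such a loss can pull the parameters toward the input envelope.

```python
import cmath
import math

# Hypothetical sketch of an analysis-by-synthesis envelope loss.
# Assumptions (not from the paper): DC-normalized 2nd-order resonators,
# one overall gain in dB, and a mean squared dB error.


def resonator_db(f_hz, freq_hz, bw_hz, fs):
    """Log-magnitude (dB) at f_hz of a 2nd-order resonator normalized to 0 dB at DC."""
    r = math.exp(-math.pi * bw_hz / fs)                # pole radius set by bandwidth
    pole = cmath.rect(r, 2 * math.pi * freq_hz / fs)   # pole angle set by centre frequency
    z_inv = cmath.exp(-2j * math.pi * f_hz / fs)       # z^-1 on the unit circle
    denom = (1 - pole * z_inv) * (1 - pole.conjugate() * z_inv)
    dc_gain = abs((1 - pole) * (1 - pole.conjugate()))  # |denominator| at z = 1
    return 20 * math.log10(dc_gain / abs(denom))


def synth_envelope_db(freqs_hz, poles, zeros, gain_db, fs):
    """Add pole (resonance) contributions and subtract zero (antiresonance) ones, in dB."""
    env = []
    for f in freqs_hz:
        level = gain_db
        level += sum(resonator_db(f, fc, bw, fs) for fc, bw in poles)
        level -= sum(resonator_db(f, fc, bw, fs) for fc, bw in zeros)
        env.append(level)
    return env


def envelope_loss(target_db, freqs_hz, poles, zeros, gain_db, fs):
    """Mean squared dB difference between the input envelope and the synthesized one."""
    synth = synth_envelope_db(freqs_hz, poles, zeros, gain_db, fs)
    return sum((t - s) ** 2 for t, s in zip(target_db, synth)) / len(target_db)


def fit_first_formant(target_db, freqs_hz, poles, zeros, gain_db, fs,
                      steps=30, step_hz=5.0, eps_hz=0.5):
    """Move F1 by fixed-size steps opposite a central finite-difference gradient.

    A stand-in for backpropagation through a network; only the loss idea is shared.
    """
    poles = list(poles)
    for _ in range(steps):
        f1, b1 = poles[0]
        up = [(f1 + eps_hz, b1)] + poles[1:]
        down = [(f1 - eps_hz, b1)] + poles[1:]
        grad = (envelope_loss(target_db, freqs_hz, up, zeros, gain_db, fs)
                - envelope_loss(target_db, freqs_hz, down, zeros, gain_db, fs)) / (2 * eps_hz)
        if grad > 0:
            poles[0] = (f1 - step_hz, b1)
        elif grad < 0:
            poles[0] = (f1 + step_hz, b1)
    return poles


# Usage: build a "target" envelope from known parameters, then start F1 too high.
fs = 16000
freqs = [25.0 * k for k in range(161)]  # 0 to 4000 Hz grid
true_poles = [(500.0, 80.0), (1500.0, 100.0), (2500.0, 120.0)]
target = synth_envelope_db(freqs, true_poles, [], 0.0, fs)

start = [(600.0, 80.0)] + true_poles[1:]
before = envelope_loss(target, freqs, start, [], 0.0, fs)
fitted = fit_first_formant(target, freqs, start, [], 0.0, fs)
after = envelope_loss(target, freqs, fitted, [], 0.0, fs)
print(round(fitted[0][0]), before > after)
```

Because the loss is zero when the synthesized envelope matches the input, an estimator trained against it needs no hand-labeled formant values, which is the property the abstract relies on. In the paper, the gradients of such a loss flow back through the network weights rather than being applied to the parameters directly as in this sketch.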


Pages: 1189-1193

Number of pages: 5
