Noise Robust Fundamental Frequency Estimation of Speech using CNN-based discriminative modeling

Cited by: 0
Authors
Kawamura, Tomonori [1]
Kai, Atsuhiko [1]
Nakagawa, Seiichi [2]
Affiliations
[1] Shizuoka Univ, Grad Sch Integrated Sci & Technol, Hamamatsu, Shizuoka, Japan
[2] Chubu Univ, Kasugai, Aichi, Japan
Keywords
Speech processing; Fundamental frequency estimation; convolutional neural network
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The fundamental frequency (F0) is a quantity representing the pitch of a periodic signal, and estimating it for time-varying, quasi-periodic acoustic signals is a common problem in speech processing research. Accurate F0 estimation contributes to improving speech processing systems such as prosody analysis, text-to-speech synthesis, and speech recognition. Although many proposed algorithms perform excellently in clean environments, F0 estimation remains very difficult in noisy environments. Machine learning approaches are generally known to be effective as discriminative models for handling noise-corrupted data. In this paper, we propose a robust F0 estimation method for noisy speech signals using a convolutional neural network (CNN), a type of deep neural network (DNN). In the proposed method, the convolution and pooling layers serve as an approximator of autocorrelation analysis and are followed by discriminative modeling that classifies quantized F0 states. This process yields a discriminator that extracts noise-robust F0 features. Experimental results showed that our method outperforms conventional methods based on autocorrelation analysis, as well as their combination with DNN modeling.
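The abstract contrasts the proposed CNN with conventional autocorrelation-based F0 estimation and treats F0 estimation as classification over quantized F0 states. The Python/NumPy sketch below illustrates only those two ingredients, not the authors' network: a textbook autocorrelation pitch picker, and a mapping from F0 in Hz to discrete class indices that a discriminative model could be trained to predict. The 60-400 Hz search range, the 64 log-spaced bins, class 0 for unvoiced frames, the 0.3 voicing threshold, and the frame/hop sizes are illustrative assumptions; the abstract does not state the paper's settings.

```python
import numpy as np

# Hypothetical settings; the abstract does not give the paper's values.
F0_MIN, F0_MAX = 60.0, 400.0   # assumed F0 search range in Hz
N_BINS = 64                    # assumed number of voiced F0 states
VOICING_THRESHOLD = 0.3        # assumed normalized-autocorrelation threshold


def autocorr_f0(frame, fs):
    """Conventional baseline: F0 from the strongest autocorrelation peak.

    Returns 0.0 when the frame is judged unvoiced.
    """
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()
    # Autocorrelation for non-negative lags 0 .. len(x) - 1.
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    if r[0] <= 0.0:
        return 0.0  # all-zero (silent) frame
    r = r / r[0]  # normalize so that r[0] == 1
    lag_min = int(np.floor(fs / F0_MAX))
    lag_max = min(int(np.ceil(fs / F0_MIN)), len(r) - 1)
    lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    if r[lag] < VOICING_THRESHOLD:
        return 0.0  # weak periodicity: treat as unvoiced
    return fs / lag


def quantize_f0(f0):
    """Map F0 in Hz to a class index: 0 = unvoiced, 1..N_BINS = log-spaced bins."""
    if f0 <= 0.0:
        return 0
    f0 = min(max(f0, F0_MIN), F0_MAX)
    pos = np.log(f0 / F0_MIN) / np.log(F0_MAX / F0_MIN)  # 0..1 on a log scale
    return 1 + min(int(pos * N_BINS), N_BINS - 1)


def dequantize_f0(state):
    """Map a class index back to the log-centre frequency of its bin."""
    if state == 0:
        return 0.0
    centre = (state - 1 + 0.5) / N_BINS
    return F0_MIN * (F0_MAX / F0_MIN) ** centre


def frame_targets(signal, fs, frame_len=1024, hop=256):
    """Per-frame quantized F0 states, e.g. training targets for a classifier."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    return [quantize_f0(autocorr_f0(signal[s:s + frame_len], fs)) for s in starts]


if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    voiced = np.sin(2 * np.pi * 200.0 * t)   # synthetic 200 Hz tone
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(fs)          # aperiodic white noise
    print(autocorr_f0(voiced[:1024], fs))
    print(autocorr_f0(noise[:1024], fs))
    states = frame_targets(voiced, fs)
    print(states[:5], dequantize_f0(states[0]))
```

Quantizing on a log-frequency scale keeps the relative resolution roughly constant across the F0 range; a linear grid would be an equally plausible assumption. In the proposed method, the hand-crafted autocorrelation step is what the CNN's convolution and pooling layers are said to approximate, while the class indices are what its discriminative output layer predicts.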
Pages: 60-65
Number of pages: 6