PHASE RECONSTRUCTION FROM AMPLITUDE SPECTROGRAMS BASED ON VON-MISES-DISTRIBUTION DEEP NEURAL NETWORK

Cited by: 0
Authors
Takamichi, Shinnosuke [1]
Saito, Yuki [1]
Takamune, Norihiro [1]
Kitamura, Daichi [2]
Saruwatari, Hiroshi [1]
Affiliations
[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Tokyo, Japan
[2] Kagawa Coll, Natl Inst Technol, Dept Elect & Comp Engn, Takamatsu, Kagawa, Japan
Keywords
speech analysis; phase reconstruction; deep neural network; von Mises distribution; group delay
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper presents a deep neural network (DNN)-based method for reconstructing phase from amplitude spectrograms. In audio and speech signal processing, the amplitude spectrogram is often the quantity that is processed, and the corresponding phase spectrogram is then reconstructed from it with the Griffin-Lim method. However, the Griffin-Lim method causes unnatural artifacts in synthetic speech. To address this problem, we introduce a von-Mises-distribution DNN for phase reconstruction. This DNN is a generative model based on the von Mises distribution, which can model the distribution of a periodic variable such as phase, and its model parameters are estimated under the maximum likelihood criterion. Furthermore, we propose a group-delay loss for DNN training that brings the predicted group delay close to the natural group delay. The experimental results demonstrate that 1) the trained DNN predicts group delay more accurately than the phases themselves, and 2) our phase reconstruction methods achieve better speech quality than the conventional Griffin-Lim method.
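The two training criteria named in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes a von Mises likelihood with a fixed concentration parameter, in which case the negative log-likelihood reduces (up to constants) to the negative cosine of the phase error, and it assumes group delay is approximated by the negative first difference of phase along the frequency axis. The function names `von_mises_phase_loss`, `group_delay`, and `group_delay_loss` are hypothetical, and each input is a single frame given as a plain list of phase values in radians.

```python
import math


def von_mises_phase_loss(target, predicted):
    """Mean of -cos(target - predicted) over frequency bins.

    Assumption: with a fixed concentration parameter, the von Mises
    negative log-likelihood equals this quantity up to constants.
    The cosine makes the loss insensitive to 2*pi phase wrapping.
    """
    total = 0.0
    for t, p in zip(target, predicted):
        total += -math.cos(t - p)
    return total / len(target)


def group_delay(phase):
    """Approximate group delay as the negative first difference of
    phase along frequency (an assumed discretisation)."""
    return [-(phase[f + 1] - phase[f]) for f in range(len(phase) - 1)]


def group_delay_loss(target, predicted):
    """Apply the same cosine criterion to group delays instead of
    raw phases, so a constant phase offset is not penalised."""
    return von_mises_phase_loss(group_delay(target), group_delay(predicted))


# Usage: a prediction that differs from the target only by a constant offset.
target = [0.0, 1.0, 2.0]
shifted = [t + 0.5 for t in target]
phase_err = von_mises_phase_loss(target, shifted)  # penalises the offset
gd_err = group_delay_loss(target, shifted)         # attains its minimum
```

In this sketch a constant phase offset raises the phase loss but leaves the group-delay loss at its minimum, which shows one way the two criteria differ.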
Pages: 286-290
Number of pages: 5
Related papers
50 in total
  • [1] Phase reconstruction from amplitude spectrograms based on directional-statistics deep neural networks
    Takamichi, Shinnosuke
    Saito, Yuki
    Takamune, Norihiro
    Kitamura, Daichi
    Saruwatari, Hiroshi
    SIGNAL PROCESSING, 2020, 169
  • [2] Sound Source Localization Based on von-Mises-Bernoulli Deep Neural Network
    Nakadai, Kazuhiro
    Masaki, Shungo
    Kojima, Ryosuke
    Sugiyama, Osamu
    Itoyama, Katsutoshi
    Nishida, Kenji
    2020 IEEE/SICE INTERNATIONAL SYMPOSIUM ON SYSTEM INTEGRATION (SII), 2020, : 658 - 663
  • [3] SAR interferogram phase filtering based on the Von Mises distribution
    Huber, R
    Dutra, LV
    Freitas, CD
    IGARSS 2001: SCANNING THE PRESENT AND RESOLVING THE FUTURE, VOLS 1-7, PROCEEDINGS, 2001, : 2816 - 2818
  • [4] Assessment of von Mises-Bernoulli Deep Neural Network in Sound Source Localization
    Itoyama, Katsutoshi
    Morimoto, Yoshiya
    Masaki, Shungo
    Kojima, Ryosuke
    Nishida, Kenji
    Nakadai, Kazuhiro
    INTERSPEECH 2021, 2021, : 2152 - 2156
  • [5] Displacement-based Reconstruction of Elasticity Distribution with Deep Neural Network
    Zhang, Xiao
    Wang, Rui
    Wei, Xingyue
    Luo, Jianwen
    Peng, Bo
    2022 IEEE INTERNATIONAL ULTRASONICS SYMPOSIUM (IEEE IUS), 2022,
  • [6] Speech Emotion Recognition from Spectrograms with Deep Convolutional Neural Network
    Badshah, Abdul Malik
    Ahmad, Jamil
    Rahim, Nasir
    Baik, Sung Wook
    2017 INTERNATIONAL CONFERENCE ON PLATFORM TECHNOLOGY AND SERVICE (PLATCON), 2017, : 125 - 129
  • [7] Two-stage phase reconstruction using DNN and von Mises distribution-based maximum likelihood
    Nguyen Binh Thien
    Wakabayashi, Yukoh
    Iwai, Kenta
    Nishiura, Takanobu
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 995 - 999
  • [8] Weighted Von Mises Distribution-based Loss Function for Real-time STFT Phase Reconstruction Using DNN
    Thien, Nguyen Binh
    Wakabayashi, Yukoh
    Geng Yuting
    Iwai, Kenta
    Nishiura, Takanobu
    INTERSPEECH 2023, 2023, : 3864 - 3868
  • [9] Von Mises Mixture Model-based DNN for Sign Indetermination Problem in Phase Reconstruction
    Thien, Nguyen Binh
    Wakabayashi, Yukoh
    Yuting, Geng
    Iwai, Kenta
    Nishiura, Takanobu
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 957 - 961
  • [10] Singer Gender Classification using Feature-based and Spectrograms with Deep Convolutional Neural Network
    Jitendra, Mukkamala S. N., V
    Radhika, Y.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2021, 12 (02) : 135 - 144