Singing voice synthesis based on deep neural networks

Cited by: 55

Authors:

Nishimura, Masanari [1]

Hashimoto, Kei [1]

Oura, Keiichiro [1]

Nankaku, Yoshihiko [1]

Tokuda, Keiichi [1]

Affiliation:

[1] Nagoya Inst Technol, Dept Sci & Engn Simulat, Nagoya, Aichi, Japan

Source:

17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016

Funding:

Japan Science and Technology Agency;

Keywords:

Singing voice synthesis; Neural network; DNN; Acoustic model; HMM;

DOI:

10.21437/Interspeech.2016-1027

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Singing voice synthesis techniques have been proposed based on a hidden Markov model (HMM). In these approaches, the spectrum, excitation, and duration of singing voices are simultaneously modeled with context-dependent HMMs and waveforms are generated from the HMMs themselves. However, the quality of the synthesized singing voices still has not reached that of natural singing voices. Deep neural networks (DNNs) have largely improved on conventional approaches in various research areas including speech recognition, image recognition, speech synthesis, etc. The DNN-based text-to-speech (TTS) synthesis can synthesize high quality speech. In the DNN-based TTS system, a DNN is trained to represent the mapping function from contextual features to acoustic features, which are modeled by decision tree-clustered context dependent HMMs in the HMM-based TTS system. In this paper, we propose singing voice synthesis based on a DNN and evaluate its effectiveness. The relationship between the musical score and its acoustic features is modeled in frames by a DNN. For the sparseness of pitch context in a database, a musical-note-level pitch normalization and linear-interpolation techniques are used to prepare the excitation features. Subjective experimental results show that the DNN-based system outperformed the HMM-based system in terms of naturalness.
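The abstract names two preprocessing steps for the excitation features, musical-note-level pitch normalization and linear interpolation, without giving details. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that normalization means subtracting the log frequency of the score's note from the observed log-F0 in each frame, and that interpolation fills unvoiced frames between voiced ones. The function names, the use of MIDI note numbers, and the `None` marker for unvoiced frames are all hypothetical choices made for illustration.

```python
import math


def midi_to_hz(note):
    """Convert a MIDI note number to frequency in Hz (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def interpolate_unvoiced(log_f0):
    """Fill unvoiced frames (None) by linear interpolation between voiced frames.

    Unvoiced frames before the first or after the last voiced frame are
    filled with the nearest voiced value (an assumption of this sketch).
    """
    voiced = [i for i, v in enumerate(log_f0) if v is not None]
    if not voiced:
        raise ValueError("no voiced frames to interpolate from")
    out = list(log_f0)
    # Extend the first and last voiced values to the sequence edges.
    for i in range(voiced[0]):
        out[i] = log_f0[voiced[0]]
    for i in range(voiced[-1] + 1, len(out)):
        out[i] = log_f0[voiced[-1]]
    # Interpolate linearly inside each gap between consecutive voiced frames.
    for a, b in zip(voiced, voiced[1:]):
        for i in range(a + 1, b):
            t = (i - a) / (b - a)
            out[i] = (1.0 - t) * log_f0[a] + t * log_f0[b]
    return out


def normalize_by_note(log_f0, frame_notes):
    """Subtract the log frequency of each frame's score note from its log-F0.

    `frame_notes` holds one MIDI note number per frame; the result is the
    deviation of the sung pitch from the notated pitch, in log-Hz.
    """
    if len(log_f0) != len(frame_notes):
        raise ValueError("log_f0 and frame_notes must have the same length")
    return [lf - math.log(midi_to_hz(n)) for lf, n in zip(log_f0, frame_notes)]
```

Under these assumptions, a model would be trained on the normalized deviation rather than on absolute pitch, so notes that are rare in the database share statistics with common ones; at synthesis time the log note frequency would be added back.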

Pages: 2478-2482

Page count: 5

