The Use of Air-Pressure Sensor in Electrolaryngeal Speech Enhancement Based on Statistical Voice Conversion

Authors
Nakamura, Keigo [1]
Toda, Tomoki [1]
Saruwatari, Hiroshi [1]
Shikano, Kiyohiro [1]
Affiliations
[1] Nara Institute of Science and Technology, Graduate School of Information Science, Nara, Japan
Keywords
Electrolarynx; Air-pressure sensor; Laryngectomee; Voice conversion; Speaking-aid;
DOI
Not available
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Classification Codes
0808 ; 0809 ;
Abstract
In our previous work, we proposed a speaking-aid system that converts electrolaryngeal speech (EL speech) into normal speech using a statistical voice conversion technique. The main weakness of that system is the difficulty of estimating natural fundamental frequency (F0) contours from EL speech, which contains only the built-in F0 contours of the electrolarynx. This paper proposes another speaking-aid system, equipped with an air-pressure sensor, that lets laryngectomees control the F0 contours of EL speech with their breath. The experimental results demonstrate that 1) the use of the air-pressure sensor improves the correlation coefficient of F0 contours between the converted and the target speech from 0.58 to 0.78, and 2) the synthetic speech converted by the proposed system sounds more natural than, and is preferred over, that of our conventional speaking-aid system.
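The F0 correlation coefficient quoted in the abstract can be illustrated with a minimal sketch. It assumes the metric is a Pearson correlation over frame-aligned contours, with unvoiced frames marked as 0 and skipped; the record does not state the authors' exact procedure. The function name `f0_correlation` and the sample values are hypothetical and are not taken from the paper.

```python
import math


def f0_correlation(converted, target):
    """Pearson correlation between two frame-aligned F0 contours (in Hz).

    Assumption: unvoiced frames are marked with 0 and are excluded, so only
    frames voiced in both contours contribute to the score.
    """
    if len(converted) != len(target):
        raise ValueError("contours must be frame-aligned and equal in length")
    pairs = [(c, t) for c, t in zip(converted, target) if c > 0 and t > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two frames voiced in both contours")
    n = len(pairs)
    mean_c = sum(c for c, _ in pairs) / n
    mean_t = sum(t for _, t in pairs) / n
    cov = sum((c - mean_c) * (t - mean_t) for c, t in pairs)
    var_c = sum((c - mean_c) ** 2 for c, _ in pairs)
    var_t = sum((t - mean_t) ** 2 for _, t in pairs)
    if var_c == 0 or var_t == 0:
        raise ValueError("correlation is undefined for a constant contour")
    return cov / math.sqrt(var_c * var_t)


# Made-up illustrative contours, not data from the paper.
target = [0.0, 120.0, 130.0, 140.0, 135.0, 0.0, 110.0]
converted = [0.0, 118.0, 126.0, 141.0, 130.0, 0.0, 112.0]
r = f0_correlation(converted, target)
print(f"F0 correlation: {r:.2f}")
```

Under this reading, a value closer to 1 means the converted contour rises and falls with the target contour. A perfectly flat contour has no defined correlation, which is why the sketch raises an error in that case.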
Pages: 1628-1631
Number of pages: 4