Speech Synthesis Using Ambiguous Inputs From Wearable Keyboards

Cited: 0
Authors
Iwasaki, Matsuri [1 ]
Hara, Sunao [1 ]
Abe, Masanobu [1 ]
Affiliation
[1] Okayama Univ, Okayama, Japan
Keywords
DOI
10.1109/APSIPAASC58517.2023.10317228
CLC classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper proposes a new application of text-to-speech (TTS) in speech communication. The goal is to enable people with dysarthria, articulation disorders, or other difficulties in speaking to communicate anywhere and anytime, using speech to express their thoughts and feelings. Achieving this goal requires an input method, so we propose a new text-entry method based on three concepts. First, for ease of carrying, we use a wearable keyboard that inputs the digits 0 to 9 according to the movements of the ten fingers. Second, to require no training, users input sentences by touch typing on the wearable keyboard; this yields a sequence of digits corresponding to each sentence. Third, a neural machine translation (NMT) model is applied to estimate text from the digit sequence. The NMT model was trained on two datasets: a Japanese-English parallel corpus of 2.8 million sentence pairs extracted from TV and movie subtitles, and a Japanese text dataset of 32 million sentences extracted from a question-and-answer platform. Using the model, phonemes and accent symbols were estimated from digit sequences; the symbol-level accuracy was 91.48%, and 43.45% of all sentences were estimated completely without errors. To subjectively evaluate the feasibility of the NMT model, a two-person word-association game was conducted: one player gave hints using speech synthesized from the symbols estimated by the NMT model, while the other guessed the answers. As a result, 67.95% of the quizzes were answered correctly, and the experimental results show that the proposed method has the potential to let people with dysarthria communicate through TTS using a wearable keyboard.
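The ambiguous digit encoding described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes standard QWERTY touch-typing finger assignments, and the numbering of fingers as digits 0-9 (left pinky = 0 through right pinky = 9) is a hypothetical choice made for this example.

```python
# Sketch: encode touch-typed text as a digit sequence, one digit per finger.
# Each key is pressed by a fixed finger in touch typing, so every character
# collapses to that finger's digit -- which is why the input is ambiguous.
FINGER_KEYS = {
    0: "qaz",     # left pinky
    1: "wsx",     # left ring
    2: "edc",     # left middle
    3: "rfvtgb",  # left index
    4: " ",       # thumb (space)
    6: "yhnujm",  # right index
    7: "ik,",     # right middle
    8: "ol.",     # right ring
    9: "p;/",     # right pinky
}

# Invert to a key -> digit lookup table.
KEY_TO_DIGIT = {key: digit for digit, keys in FINGER_KEYS.items() for key in keys}

def encode(text: str) -> str:
    """Return the digit sequence produced by touch typing `text`."""
    return "".join(str(KEY_TO_DIGIT[ch]) for ch in text.lower() if ch in KEY_TO_DIGIT)

print(encode("hello"))  # -> "62888"
```

Because many words map to the same digit string (here "hello" becomes "62888", as would any word typed with the same fingers), recovering the intended sentence is a sequence-to-sequence disambiguation problem, which is what the paper's NMT model solves.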
Pages: 1172-1178
Page count: 7