Speech Synthesis Using Ambiguous Inputs From Wearable Keyboards

Cited by: 0
Authors
Iwasaki, Matsuri [1]
Hara, Sunao [1]
Abe, Masanobu [1]
Affiliations
[1] Okayama Univ, Okayama, Japan
DOI
10.1109/APSIPAASC58517.2023.10317228
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a new application of text-to-speech (TTS) in speech communication. The goal is to enable people with dysarthria or articulation disorders, and others who have difficulty speaking, to communicate anywhere and anytime using speech to express their thoughts and feelings. Achieving this goal requires an input method, so we propose a new text-entry method based on three concepts. First, for portability, we use a wearable keyboard that inputs the decimal digits 0 to 9 according to the movements of the 10 fingers. Second, so that no training is needed, users enter sentences on the wearable keyboard as they would when touch typing, which yields a sequence of numbers corresponding to each sentence. Third, a neural machine translation (NMT) method is applied to estimate text from the sequence of numbers. The NMT model was trained using two datasets: a Japanese-English parallel corpus containing 2.8 million sentence pairs extracted from TV and movie subtitles, and a Japanese text dataset containing 32 million sentences extracted from a question-and-answer platform. Using the model, phonemes and accent symbols were estimated from sequences of numbers. The symbol-level accuracy was 91.48%, and 43.45% of all sentences were estimated with no errors. To subjectively evaluate the feasibility of the NMT model, a two-person word association game was conducted: one player gave hints using speech synthesized from the symbols estimated by NMT, while the other guessed the answers. As a result, 67.95% of all the quizzes were answered correctly. The experimental results show that the proposed method has the potential to let people with dysarthria communicate through TTS using a wearable keyboard.
Pages: 1172-1178
Page count: 7