Speech Synthesis Using Ambiguous Inputs From Wearable Keyboards

Cited: 0
Authors:
Iwasaki, Matsuri [1]
Hara, Sunao [1]
Abe, Masanobu [1]
Affiliations:
[1] Okayama Univ, Okayama, Japan
DOI:
10.1109/APSIPAASC58517.2023.10317228
CLC Classification:
TP18 [Artificial intelligence theory];
Discipline Codes:
081104 ; 0812 ; 0835 ; 1405 ;
Abstract:
This paper proposes a new application of text-to-speech (TTS) in speech communication; the goal is to enable persons with dysarthria, articulation disorders, or other difficulty in speaking to communicate anywhere and anytime, using speech to express their thoughts and feelings. Achieving this goal requires a suitable input method, so we propose a new text-entry method based on three concepts. First, for ease of carrying, we use a wearable keyboard that inputs the digits 0 to 9 according to the movements of the ten fingers. Second, to require no training, users input sentences by touch typing on the wearable keyboard; this yields a sequence of numbers corresponding to each sentence. Third, a neural machine translation (NMT) model is applied to estimate text from the sequence of numbers. The NMT model was trained on two datasets: a Japanese-English parallel corpus containing 2.8 million sentence pairs extracted from TV and movie subtitles, and a Japanese text dataset containing 32 million sentences extracted from a question-and-answer platform. Using the model, phonemes and accent symbols were estimated from a sequence of numbers; the symbol-level accuracy was 91.48%, and 43.45% of all sentences were estimated completely, with no errors. To subjectively evaluate the feasibility of the NMT model, a two-person word-association game was conducted: one player gave hints using speech synthesized from the symbols estimated by the NMT model, while the other guessed the answers. As a result, 67.95% of all quizzes were answered correctly, and the experimental results show that the proposed method has the potential to let persons with dysarthria communicate via TTS using a wearable keyboard.
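The digit-encoding step described above can be sketched as follows. This is a hypothetical illustration, not the paper's actual Japanese-input mapping: it assumes a standard QWERTY touch-typing finger assignment and collapses each keystroke to the digit of the finger that would type it, producing the ambiguous number sequence the NMT model must decode.

```python
# Hypothetical sketch of the ambiguous-input idea: a wearable keyboard
# reports only WHICH of the ten fingers moved, so every key a finger
# types in standard touch typing collapses to the same digit.
FINGER_OF_KEY = {
    # left hand: pinky=0, ring=1, middle=2, index=3 (index covers two columns)
    **{k: 0 for k in "qaz"}, **{k: 1 for k in "wsx"},
    **{k: 2 for k in "edc"}, **{k: 3 for k in "rfvtgb"},
    # right hand: index=4, middle=5, ring=6, pinky=7; thumb=8 (space); 9 unused here
    **{k: 4 for k in "yhnujm"}, **{k: 5 for k in "ik"},
    **{k: 6 for k in "ol"}, **{k: 7 for k in "p"},
    " ": 8,
}

def to_digits(text: str) -> str:
    """Collapse touch-typed text into its ambiguous digit sequence."""
    return "".join(str(FINGER_OF_KEY[ch]) for ch in text.lower())

print(to_digits("hello world"))  # → "42666816362"
```

Because many characters share a digit (here, "h", "u", and "m" all map to 4), the reverse mapping is one-to-many, which is why the paper treats decoding as a sequence-to-sequence translation problem rather than a table lookup.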
Pages: 1172-1178
Page count: 7