RECOApy: Data recording, pre-processing and phonetic transcription for end-to-end speech-based applications

Authors
Stan, Adriana [1]
Affiliations
[1] Tech Univ Cluj Napoca, Commun Dept, Cluj Napoca, Romania
Source
INTERSPEECH 2020
Keywords
speech recording tool; multilingual; phonetic transcription; grapheme-to-phoneme; evolution strategy; sequence-to-sequence; convolutional networks; transformer networks;
DOI
10.21437/Interspeech.2020-1184
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
Deep learning enables the development of efficient end-to-end speech processing applications while bypassing the need for expert linguistic and signal processing features. Yet, recent studies show that good-quality speech resources and phonetic transcription of the training data can enhance the results of these applications. In this paper, the RECOApy tool is introduced. RECOApy streamlines the steps of data recording and pre-processing required in end-to-end speech-based applications. The tool implements an easy-to-use interface for prompted speech recording, spectrogram and waveform analysis, utterance-level normalisation and silence trimming, as well as grapheme-to-phoneme conversion of the prompts in eight languages: Czech, English, French, German, Italian, Polish, Romanian and Spanish. The grapheme-to-phoneme (G2P) converters are deep neural network (DNN) based architectures trained on lexicons extracted from the Wiktionary online collaborative resource. Because the degree of orthographic transparency and the number of phonetic entries vary across the languages, the DNN hyperparameters are optimised with an evolution strategy. The phoneme and word error rates of the resulting G2P converters are presented and discussed. The tool, the processed phonetic lexicons and trained G2P models are made freely available.
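The abstract reports phoneme and word error rates for the G2P converters but this record does not say how they were scored. As a rough illustration only, the sketch below uses the conventional definitions: phoneme error rate as the total Levenshtein distance between predicted and reference phoneme sequences divided by the number of reference phonemes, and word error rate as the fraction of lexicon entries whose predicted transcription is not an exact match. The function names `edit_distance` and `g2p_error_rates` and the toy transcriptions are hypothetical and are not taken from RECOApy.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def g2p_error_rates(references: List[Sequence[str]],
                    hypotheses: List[Sequence[str]]) -> Tuple[float, float]:
    """Return (phoneme error rate, word error rate) as fractions.

    Assumed definitions (not confirmed by the source record):
    PER = summed edit distance / total reference phonemes,
    WER = entries with any phoneme error / number of entries.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have equal length")
    total_phonemes = sum(len(r) for r in references)
    if total_phonemes == 0:
        raise ValueError("references contain no phonemes")
    phoneme_errors = sum(edit_distance(r, h)
                         for r, h in zip(references, hypotheses))
    word_errors = sum(1 for r, h in zip(references, hypotheses)
                      if list(r) != list(h))
    return phoneme_errors / total_phonemes, word_errors / len(references)


# Toy data, invented for illustration: two words, the second mispredicted.
refs = [["k", "æ", "t"], ["d", "ɒ", "g"]]
hyps = [["k", "æ", "t"], ["d", "o", "g", "z"]]
per, wer = g2p_error_rates(refs, hyps)
print(f"PER={per:.3f} WER={wer:.3f}")
```

In the toy data the second word has one substitution and one insertion against six reference phonemes, so a single wrong word contributes two phoneme errors; this is why the two rates are usually reported together.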
Pages: 586-590
Page count: 5