RECOApy: Data recording, pre-processing and phonetic transcription for end-to-end speech-based applications

Authors
Stan, Adriana [1]
Affiliations
[1] Tech Univ Cluj Napoca, Commun Dept, Cluj Napoca, Romania
Source
Keywords
speech recording tool; multilingual; phonetic transcription; grapheme-to-phoneme; evolution strategy; sequence-to-sequence; convolutional networks; transformer networks;
DOI
10.21437/Interspeech.2020-1184
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification codes
100104; 100213;
Abstract
Deep learning enables the development of efficient end-to-end speech processing applications while bypassing the need for expert linguistic and signal processing features. Yet, recent studies show that good quality speech resources and phonetic transcription of the training data can enhance the results of these applications. In this paper, the RECOApy tool is introduced. RECOApy streamlines the steps of data recording and pre-processing required in end-to-end speech-based applications. The tool implements an easy-to-use interface for prompted speech recording, spectrogram and waveform analysis, utterance-level normalisation and silence trimming, as well as grapheme-to-phoneme conversion of the prompts in eight languages: Czech, English, French, German, Italian, Polish, Romanian and Spanish. The grapheme-to-phoneme (G2P) converters are deep neural network (DNN) based architectures trained on lexicons extracted from the Wiktionary online collaborative resource. Because the languages differ in their degree of orthographic transparency and in the number of available phonetic entries, the DNNs' hyperparameters are optimised with an evolution strategy. The phoneme and word error rates of the resulting G2P converters are presented and discussed. The tool, the processed phonetic lexicons and trained G2P models are made freely available.
Pages: 586 - 590
Page count: 5
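The abstract reports phoneme and word error rates for the G2P converters without defining them. The sketch below shows one common way such metrics are computed, and is not taken from the paper or from the RECOApy code: the phoneme error rate is assumed to be the total Levenshtein distance between predicted and reference phoneme sequences divided by the total number of reference phonemes, and the word error rate is assumed to be the fraction of lexicon entries whose predicted transcription is not an exact match. The function names `edit_distance` and `g2p_error_rates` are hypothetical.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def g2p_error_rates(
    pairs: List[Tuple[Sequence[str], Sequence[str]]]
) -> Tuple[float, float]:
    """Return (phoneme error rate, word error rate) over (reference, predicted) pairs.

    Assumed definitions (not confirmed by the source):
      PER = total edit distance / total reference phonemes
      WER = words with at least one phoneme error / number of words
    """
    if not pairs:
        raise ValueError("at least one (reference, predicted) pair is required")
    total_edits = 0
    total_ref = 0
    wrong_words = 0
    for ref, hyp in pairs:
        dist = edit_distance(ref, hyp)
        total_edits += dist
        total_ref += len(ref)
        if dist > 0:
            wrong_words += 1
    if total_ref == 0:
        raise ValueError("reference transcriptions must not all be empty")
    return total_edits / total_ref, wrong_words / len(pairs)
```

As a usage example, two entries with three reference phonemes each, one of which has a single substituted phoneme, give a phoneme error rate of 1/6 and a word error rate of 1/2.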