RECOApy: Data recording, pre-processing and phonetic transcription for end-to-end speech-based applications

Authors
Stan, Adriana [1]
Affiliation
[1] Tech Univ Cluj Napoca, Commun Dept, Cluj Napoca, Romania
Source
INTERSPEECH 2020
Keywords
speech recording tool; multilingual; phonetic transcription; grapheme-to-phoneme; evolution strategy; sequence-to-sequence; convolutional networks; transformer networks
DOI
10.21437/Interspeech.2020-1184
Chinese Library Classification (CLC) numbers
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification codes
100104; 100213
Abstract
Deep learning enables the development of efficient end-to-end speech processing applications while bypassing the need for expert linguistic and signal processing features. Yet recent studies show that good-quality speech resources and phonetic transcriptions of the training data can enhance the results of these applications. In this paper, the RECOApy tool is introduced. RECOApy streamlines the data recording and pre-processing steps required by end-to-end speech-based applications. The tool implements an easy-to-use interface for prompted speech recording, spectrogram and waveform analysis, utterance-level normalisation and silence trimming, as well as grapheme-to-phoneme conversion of the prompts in eight languages: Czech, English, French, German, Italian, Polish, Romanian and Spanish. The grapheme-to-phoneme (G2P) converters are deep neural network (DNN) based architectures trained on lexicons extracted from the Wiktionary online collaborative resource. Because the languages differ both in orthographic transparency and in the number of available phonetic entries, the DNN hyperparameters are optimised with an evolution strategy. The phoneme and word error rates of the resulting G2P converters are presented and discussed. The tool, the processed phonetic lexicons and the trained G2P models are made freely available.
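To illustrate what utterance-level normalisation and silence trimming involve, here is a minimal NumPy sketch. It assumes peak normalisation to a fixed dBFS target and energy-based trimming of leading and trailing frames. The function names, the -1 dBFS target, the 10 ms frame length and the -40 dB threshold are hypothetical choices for illustration, not RECOApy's actual implementation or defaults.

```python
import numpy as np


def normalise_peak(x: np.ndarray, target_dbfs: float = -1.0) -> np.ndarray:
    """Scale an utterance so its absolute peak sits at target_dbfs.

    Assumption: peak normalisation; RECOApy may use a different scheme.
    """
    peak = np.max(np.abs(x)) if x.size else 0.0
    if peak == 0:
        return x.copy()  # silent or empty signal: nothing to scale
    target = 10 ** (target_dbfs / 20)  # dBFS -> linear amplitude
    return x * (target / peak)


def trim_silence(x: np.ndarray, sr: int, frame_ms: float = 10.0,
                 threshold_db: float = -40.0) -> np.ndarray:
    """Drop leading/trailing frames whose RMS is more than threshold_db
    below the loudest frame (hypothetical energy-based criterion)."""
    frame = max(1, int(sr * frame_ms / 1000))
    n_frames = int(np.ceil(len(x) / frame))
    if n_frames == 0:
        return x[:0]
    # Zero-pad the tail so the signal splits into whole frames.
    padded = np.pad(x, (0, n_frames * frame - len(x)))
    rms = np.sqrt(np.mean(padded.reshape(n_frames, frame) ** 2, axis=1))
    if rms.max() == 0:
        return x[:0]  # the whole utterance is digital silence
    rel_db = 20 * np.log10(np.maximum(rms, 1e-12) / rms.max())
    voiced = np.flatnonzero(rel_db > threshold_db)
    if voiced.size == 0:
        return x[:0]
    start = voiced[0] * frame
    end = min(len(x), (voiced[-1] + 1) * frame)
    return x[start:end]
```

For example, a 1 s, 440 Hz tone padded with 0.5 s of silence on each side at 16 kHz is trimmed back to its 16000 tone samples, after which `normalise_peak` rescales it so its peak is at -1 dBFS. Trimming before normalising keeps the peak measurement restricted to the retained speech region.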
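The phoneme and word error rates used to evaluate G2P converters are conventionally derived from the Levenshtein distance between predicted and reference phoneme sequences. Under this convention, PER is the total number of edits divided by the total number of reference phonemes, and WER is the share of words whose predicted pronunciation contains at least one error. The sketch below assumes that convention. The function names and the sample phoneme strings are hypothetical, and the paper's exact scoring procedure may differ.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance; substitutions, insertions and deletions cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion
                          curr[j - 1] + 1,         # insertion
                          prev[j - 1] + (r != h))  # substitution or match
        prev = curr
    return prev[-1]


def g2p_error_rates(refs: List[Sequence[str]],
                    hyps: List[Sequence[str]]) -> Tuple[float, float]:
    """Return (PER, WER) in percent over parallel lists of phoneme sequences."""
    if len(refs) != len(hyps) or not refs:
        raise ValueError("refs and hyps must be non-empty and of equal length")
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    n_phones = sum(len(r) for r in refs)
    # A word counts as wrong if its pronunciation differs in any position.
    wrong_words = sum(list(r) != list(h) for r, h in zip(refs, hyps))
    return 100.0 * edits / n_phones, 100.0 * wrong_words / len(refs)
```

With references `[["K","AE","T"], ["D","AO","G"]]` and predictions `[["K","AE","T"], ["D","AA","G","Z"]]`, the second word needs one substitution and one insertion. That gives a PER of 2/6 (about 33.3%) and a WER of 50%. This shows why WER is typically much higher than PER for G2P: a single phoneme error is enough to mark the whole word as wrong.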
Pages: 586-590
Page count: 5