RECOApy: Data recording, pre-processing and phonetic transcription for end-to-end speech-based applications

Cited by: 0

Authors:

Stan, Adriana [1]

Affiliations:

[1] Tech Univ Cluj Napoca, Commun Dept, Cluj Napoca, Romania

Source:

INTERSPEECH 2020 | 2020

Keywords:

speech recording tool; multilingual; phonetic transcription; grapheme-to-phoneme; evolution strategy; sequence-to-sequence; convolutional networks; transformer networks;

DOI:

10.21437/Interspeech.2020-1184

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology];

Discipline classification code:

100104 ; 100213 ;

Abstract:

Deep learning enables the development of efficient end-to-end speech processing applications while bypassing the need for expert linguistic and signal processing features. Yet recent studies show that good-quality speech resources and phonetic transcription of the training data can enhance the results of these applications. This paper introduces the RECOApy tool, which streamlines the data recording and pre-processing steps required in end-to-end speech-based applications. The tool implements an easy-to-use interface for prompted speech recording, spectrogram and waveform analysis, utterance-level normalisation and silence trimming, as well as grapheme-to-phoneme conversion of the prompts in eight languages: Czech, English, French, German, Italian, Polish, Romanian and Spanish. The grapheme-to-phoneme (G2P) converters are deep neural network (DNN) based architectures trained on lexicons extracted from the Wiktionary online collaborative resource. Because the languages differ in their degree of orthographic transparency and in the number of available phonetic entries, the DNNs' hyperparameters are optimised with an evolution strategy. The phoneme and word error rates of the resulting G2P converters are presented and discussed. The tool, the processed phonetic lexicons and the trained G2P models are made freely available.
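
The abstract reports phoneme and word error rates for the G2P converters without defining them. The following is a minimal sketch of how such rates are commonly computed, assuming the usual G2P convention: the phoneme error rate is the Levenshtein distance between predicted and reference phoneme sequences divided by the total number of reference phonemes, and the word error rate is the share of words with at least one phoneme error. The function names and the toy lexicon entries are hypothetical and are not taken from RECOApy.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two symbol sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rates(refs: List[Sequence[str]],
                hyps: List[Sequence[str]]) -> Tuple[float, float]:
    """Return (PER, WER) in percent for parallel lists of transcriptions.

    Assumed definitions (not confirmed by the source):
    PER = total phoneme edits / total reference phonemes,
    WER = words with any phoneme error / number of words.
    """
    total_phones = sum(len(r) for r in refs)
    phone_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    word_errors = sum(1 for r, h in zip(refs, hyps) if list(r) != list(h))
    per = 100.0 * phone_errors / total_phones
    wer = 100.0 * word_errors / len(refs)
    return per, wer


# Toy example with made-up transcriptions: one substitution in the second word.
refs = [["k", "a", "t"], ["d", "o", "g"]]
hyps = [["k", "a", "t"], ["d", "o", "k"]]
per, wer = error_rates(refs, hyps)
```

In this toy case one of six reference phonemes is wrong and one of two words contains an error, which illustrates why the word error rate is typically much higher than the phoneme error rate for the same model.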

Pages: 586-590

Number of pages: 5

34 items in total

[21] Enabling End-to-End Data-Driven Sensor-Based Scientific and Engineering Applications
Jiang, Nanyan
Parashar, Manish
COMPUTATIONAL SCIENCE - ICCS 2009, 2009, 5545 : 449 - +
[22] Fast offline transformer-based end-to-end automatic speech recognition for real-world applications
Oh, Yoo Rhee
Park, Kiyoung
Park, Jeon Gue
ETRI JOURNAL, 2022, 44 (03) : 476 - 490
[23] Benefits of merging paired-end reads before pre-processing environmental metagenomics data
Immaculate, Midhuna
Maran, Joseph
Davis, Dicky John G.
MARINE GENOMICS, 2022, 61
[24] Self-Supervised Pre-Trained Speech Representation Based End-to-End Mispronunciation Detection and Diagnosis of Mandarin
Shen, Yunfei
Liu, Qingqing
Fan, Zhixing
Liu, Jiajun
Wumaier, Aishan
IEEE ACCESS, 2022, 10 : 106451 - 106462
[25] A Case for Understanding End-to-End Performance of Topic Detection and Tracking Based Big Data Applications in the Cloud
Wang, Meisong
Ranjan, Rajiv
Jayaraman, Prem Prakash
Strazdins, Peter
Burnap, Pete
Rana, Omer
Georgakopulos, Dimitrios
INTERNET OF THINGS: IOT INFRASTRUCTURES, PT I, 2016, 169 : 315 - 325
[26] E3TTS: End-to-End Text-Based Speech Editing TTS System and Its Applications
Liang, Zheng
Ma, Ziyang
Du, Chenpeng
Yu, Kai
Chen, Xie
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 4810 - 4821
[27] Phoneme-to-Grapheme Conversion Based Large-Scale Pre-Training for End-to-End Automatic Speech Recognition
Masumura, Ryo
Makishima, Naoki
Ihori, Mana
Takashima, Akihiko
Tanaka, Tomohiro
Orihashi, Shota
INTERSPEECH 2020, 2020, : 2822 - 2826
[28] SIMPLEFLAT: A SIMPLE WHOLE-NETWORK PRE-TRAINING APPROACH FOR RNN TRANSDUCER-BASED END-TO-END SPEECH RECOGNITION
Moriya, Takafumi
Ashihara, Takanori
Tanaka, Tomohiro
Ochiai, Tsubasa
Sato, Hiroshi
Ando, Atsushi
Ijima, Yusuke
Masumura, Ryo
Shinohara, Yusuke
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5664 - 5668
[29] Blue Danube: A Large-Scale, End-to-End Synchronous, Distributed Data Stream Processing Architecture for Time-Sensitive Applications
Michael, Panayiotis A.
Tsanakas, Panayiotis D.
Parker, D. S.
2022 IEEE/ACM 26TH INTERNATIONAL SYMPOSIUM ON DISTRIBUTED SIMULATION AND REAL TIME APPLICATIONS (DS-RT), 2022,
[30] Towards End-to-End QoS and Cost-Aware Resource Scaling in Cloud-Based IoT Data Processing Pipelines
Samant, Sunil Singh
Chhetri, Mohan Baruwal
Quoc Bao Vo
Kowalczyk, Ryszard
Nepal, Surya
2018 IEEE INTERNATIONAL CONFERENCE ON SERVICES COMPUTING (IEEE SCC 2018), 2018, : 287 - 290
