RECOApy: Data recording, pre-processing and phonetic transcription for end-to-end speech-based applications

Cited by: 0

Author:

Stan, Adriana [1]

Affiliation:

[1] Tech Univ Cluj Napoca, Commun Dept, Cluj Napoca, Romania

Source:

INTERSPEECH 2020 | 2020

Keywords:

speech recording tool; multilingual; phonetic transcription; grapheme-to-phoneme; evolution strategy; sequence-to-sequence; convolutional networks; transformer networks;

DOI:

10.21437/Interspeech.2020-1184

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology];

Subject classification codes:

100104 ; 100213 ;

Abstract:

Deep learning enables the development of efficient end-to-end speech processing applications while bypassing the need for expert linguistic and signal processing features. Yet, recent studies show that good quality speech resources and phonetic transcription of the training data can enhance the results of these applications. In this paper, the RECOApy tool is introduced. RECOApy streamlines the steps of data recording and pre-processing required in end-to-end speech-based applications. The tool implements an easy-to-use interface for prompted speech recording, spectrogram and waveform analysis, utterance-level normalisation and silence trimming, as well as grapheme-to-phoneme conversion of the prompts in eight languages: Czech, English, French, German, Italian, Polish, Romanian and Spanish. The grapheme-to-phoneme (G2P) converters are deep neural network (DNN) based architectures trained on lexicons extracted from the Wiktionary online collaborative resource. Given the different degrees of orthographic transparency, as well as the varying number of phonetic entries across the languages, the DNNs' hyperparameters are optimised with an evolution strategy. The phoneme and word error rates of the resulting G2P converters are presented and discussed. The tool, the processed phonetic lexicons and trained G2P models are made freely available.
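
The abstract reports phoneme and word error rates for the G2P converters but does not define them. The sketch below uses the definitions that are common in G2P evaluation, which is an assumption and not necessarily the paper's exact procedure: phoneme error rate as the total Levenshtein distance divided by the total number of reference phonemes, and word error rate as the share of words whose predicted transcription differs from the reference in any way. The function names `edit_distance` and `g2p_error_rates` and the toy transcriptions are hypothetical and are not part of RECOApy.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def g2p_error_rates(refs: List[Sequence[str]],
                    hyps: List[Sequence[str]]) -> Tuple[float, float]:
    """Return (phoneme error rate, word error rate) over a lexicon.

    Assumed definitions (not taken from the paper):
      PER = sum of edit distances / sum of reference lengths
      WER = words with at least one phoneme error / number of words
    """
    total_phones = sum(len(r) for r in refs)
    phone_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    word_errors = sum(1 for r, h in zip(refs, hyps) if list(r) != list(h))
    return phone_errors / total_phones, word_errors / len(refs)


# Toy example: two words, one substituted phoneme in the second.
refs = [["k", "a", "t"], ["d", "o", "g"]]
hyps = [["k", "a", "t"], ["d", "o", "k"]]
per, wer = g2p_error_rates(refs, hyps)
```

In the toy example one of six reference phonemes is wrong and one of two words contains an error, so `per` is 1/6 and `wer` is 0.5.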


Pages: 586-590

Page count: 5
