Transformer-Based Turkish Automatic Speech Recognition

Authors
Tasar, Davut Emre [1]
Koruyan, Kutan [2]
Cilgin, Cihan [3]
Affiliations
[1] Dokuz Eylul Univ, Grad Sch Social Sci, Dept Management Informat Syst, Izmir, Turkiye
[2] Dokuz Eylul Univ, Fac Econ & Adm Sci, Dept Management Informat Syst, Izmir, Turkiye
[3] Bolu Abant Izzet Baysal Univ, Gerede Fac Appl Sci, Dept Management Informat Syst, Bolu, Turkiye
Source
ACTA INFOLOGICA | 2024 / Vol. 8 / No. 01
Keywords
Wav2vec2; automatic speech recognition; speech-to-text transcription; natural language processing; transformer architecture;
DOI
10.26650/acin.1338604
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Businesses increasingly use Automatic Speech Recognition (ASR) technology to improve efficiency and productivity across many business functions. With online meetings more prevalent in remote working and learning environments since the COVID-19 pandemic, speech recognition systems are used more often, which underlines their significance. While languages such as English, Spanish, and French have large amounts of labeled data, very little labeled data exists for Turkish, and this directly reduces the accuracy of ASR systems. This study therefore exploits unlabeled audio data by learning general data representations through self-supervised, end-to-end modeling. To convert speech recordings to text, it employs a transformer-based machine learning model whose performance is improved through transfer learning. The model adopted is the Wav2Vec 2.0 architecture, which masks parts of the audio input and is trained to solve a task defined over the masked portions. The XLSR-Wav2Vec 2.0 model, pretrained on speech data in 53 languages, was fine-tuned with the Mozilla Common Voice Turkish data set. In the empirical results, the model reached a word error rate of 0.23 on the test set of the same data set.
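The word error rate reported in the abstract is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `word_error_rate`, the whitespace tokenization, and the Turkish example sentences are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly,
    with no text normalization (casing, punctuation) applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one of four reference words is dropped.
    print(word_error_rate("bugün hava çok güzel", "bugün hava güzel"))
```

Under this definition, a word error rate of 0.23 corresponds to roughly 23 word-level errors per 100 reference words.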
Pages: 1 / 10
Page count: 10