Transformer-Based Turkish Automatic Speech Recognition

Authors
Tasar, Davut Emre [1]
Koruyan, Kutan [2]
Cilgin, Cihan [3]
Affiliations
[1] Dokuz Eylul Univ, Grad Sch Social Sci, Dept Management Informat Syst, Izmir, Turkiye
[2] Dokuz Eylul Univ, Fac Econ & Adm Sci, Dept Management Informat Syst, Izmir, Turkiye
[3] Bolu Abant Izzet Baysal Univ, Gerede Fac Appl Sci, Dept Management Informat Syst, Bolu, Turkiye
Source
ACTA INFOLOGICA | 2024 / Vol. 8 / No. 1
Keywords
Wav2vec2; automatic speech recognition; speech-to-text transcription; natural language processing; transformer architecture
DOI
10.26650/acin.1338604
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Businesses increasingly use Automatic Speech Recognition (ASR) technology to improve efficiency and productivity across many business functions. The growth of online meetings in remote working and learning environments after the COVID-19 pandemic has further increased the use of speech recognition systems and underscored their importance. While languages such as English, Spanish, and French have large amounts of labeled data, very little labeled data exists for Turkish, which directly reduces the accuracy of ASR systems for that language. This study therefore utilizes unlabeled audio data by learning general data representations through self-supervised, end-to-end modeling. To convert speech recordings to text, it employs a transformer-based machine learning model whose performance is improved through transfer learning. The model adopted is the Wav2Vec 2.0 architecture, which masks parts of the audio input and learns by solving a contrastive task over the masked parts. The XLSR-Wav2Vec 2.0 model, pretrained on speech data in 53 languages, was fine-tuned with the Mozilla Common Voice Turkish data set. In the empirical results, the model reached a word error rate of 0.23 on the test set of the same data set.
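For readers unfamiliar with the metric, the word error rate (WER) quoted in the abstract is conventionally defined as the number of word-level substitutions, deletions, and insertions needed to turn the system output into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' evaluation code: the function name `word_error_rate` and the example sentences are hypothetical, and the sketch assumes whitespace tokenization with no text normalization, which the record does not describe.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words (word-level Levenshtein).
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref_words)


# Illustrative (made-up) Turkish sentences: one of four reference words is missing.
example_wer = word_error_rate("bugün hava çok güzel", "bugün hava güzel")
print(example_wer)
```

Under this definition, a WER of 0.23 corresponds to roughly 23 word-level errors per 100 reference words.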
Pages: 1 / 10
Page count: 10