Transformer-Based Turkish Automatic Speech Recognition

Cited by: 0

Authors:

Tasar, Davut Emre [1]

Koruyan, Kutan [2]

Cilgin, Cihan [3]

Affiliations:

[1] Dokuz Eylul Univ, Grad Sch Social Sci, Dept Management Informat Syst, Izmir, Turkiye

[2] Dokuz Eylul Univ, Fac Econ & Adm Sci, Dept Management Informat Syst, Izmir, Turkiye

[3] Bolu Abant Izzet Baysal Univ, Gerede Fac Appl Sci, Dept Management Informat Syst, Bolu, Turkiye

Source:

ACTA INFOLOGICA | 2024 / Vol. 8 / Issue 01

Keywords:

Wav2vec2; automatic speech recognition; speech-to-text transcription; natural language processing; transformer architecture

DOI:

10.26650/acin.1338604

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Businesses increasingly use Automatic Speech Recognition (ASR) technology to raise efficiency and productivity across many business functions. Since the COVID-19 pandemic, the growing prevalence of online meetings in remote working and learning environments has led to more frequent use of speech recognition systems, underscoring their significance. While languages such as English, Spanish, and French have large amounts of labeled data, very little labeled data exists for Turkish, which directly reduces the accuracy of ASR systems for that language. This study therefore makes use of unlabeled audio data by learning general data representations through self-supervised, end-to-end modeling. To convert speech recordings to text, it employs a transformer-based machine learning model whose performance is improved through transfer learning. The model adopted is the Wav2Vec 2.0 architecture, which masks spans of the audio input and is trained to solve a contrastive task over the masked spans. The XLSR-Wav2Vec 2.0 model, pretrained on speech data in 53 languages, was fine-tuned on the Mozilla Common Voice Turkish data set. In the empirical results, the model reached a word error rate of 0.23 on the test set of the same data set.
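
For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The record does not state which implementation the authors used, so the function name `word_error_rate` and the sample sentences are illustrative assumptions, not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (hypothetical name); not the authors' code.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up example sentences: one of four reference words is dropped.
print(word_error_rate("bugün hava çok güzel", "bugün hava güzel"))
```

On a full test set, the score is usually obtained by summing the edit operations over all utterances and dividing by the total number of reference words, rather than by averaging per-utterance scores; a value of 0.23 then corresponds to roughly 23 word errors per 100 reference words.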
