Integrated End-to-End Automatic Speech Recognition for Languages for Agglutinative Languages

Times cited: 0
Authors
Bekarystankyzy, Akbayan [1, 2]
Mamyrbayev, Orken [3]
Anarbekova, Tolganay [2]
Affiliations
[1] Satbayev Univ, Alma Ata, Kazakhstan
[2] Narxoz Univ, Alma Ata, Kazakhstan
[3] Inst Informat & Computat Technol, Alma Ata, Kazakhstan
Keywords
Language model; data corpus; scarcity of resources; system learning
DOI
10.1145/3663568
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The relevance of the problem of automatic speech recognition lies in the lack of research for low-resource languages, stemming from limited training data and the necessity for new technologies to enhance efficiency and performance. The purpose of this work was to study the main aspects of integrated end-to-end speech recognition and the use of modern technologies in the natural language processing of agglutinative languages, including Kazakh. In this article, the study of language models was carried out using comparative, graphic, statistical, and analytical-synthetic methods, which were used in combination. This article addresses automatic speech recognition (ASR) in agglutinative languages, particularly Kazakh, through a unified neural network model that integrates both acoustic and language modeling. Employing advanced techniques like connectionist temporal classification and attention mechanisms, the study focuses on effective speech-to-text transcription for languages with complex morphologies. Transfer learning from high-resource languages helps mitigate data scarcity in languages such as Kazakh, Kyrgyz, Uzbek, Turkish, and Azerbaijani. The research assesses model performance, underscores ASR challenges, and proposes advancements for these languages. It includes a comparative analysis of phonetic and word-formation features in agglutinative Turkic languages, using statistical data. The findings aid further research in linguistics and technology for enhancing speech recognition and synthesis, contributing to voice identification and automation processes.
Pages: 17
Related papers
50 in total
  • [1] End-to-End Speech Recognition in Agglutinative Languages
    Mamyrbayev, Orken
    Alimhan, Keylan
    Zhumazhanov, Bagashar
    Turdalykyzy, Tolganay
    Gusmanova, Farida
    [J]. INTELLIGENT INFORMATION AND DATABASE SYSTEMS (ACIIDS 2020), PT II, 2020, 12034 : 391 - 401
  • [2] End-to-end Speech Recognition for Languages with Ideographic Characters
    Ito, Hitoshi
    Hagiwara, Aiko
    Ichiki, Manon
    Mishima, Takeshi
    Sato, Shoei
    Kobayashi, Akio
    [J]. 2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017, : 1269 - 1273
  • [3] END-TO-END MULTILINGUAL AUTOMATIC SPEECH RECOGNITION FOR LESS-RESOURCED LANGUAGES: THE CASE OF FOUR ETHIOPIAN LANGUAGES
    Abate, Solomon Teferra
    Tachbelie, Martha Yifiru
    Schultz, Tanja
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7013 - 7017
  • [4] Efficiency of End-to-End Speech Recognition for Languages with Scarce Resources
    Rudzionis, Vytautas
    Malukas, Ugnius
    Lopata, Audrius
    [J]. INFORMATION AND SOFTWARE TECHNOLOGIES, ICIST 2022, 2022, 1665 : 259 - 264
  • [5] END-TO-END SPEECH RECOGNITION AND KEYWORD SEARCH ON LOW-RESOURCE LANGUAGES
    Rosenberg, Andrew
    Audhkhasi, Kartik
    Sethy, Abhinav
    Ramabhadran, Bhuvana
    Picheny, Michael
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5280 - 5284
  • [6] Subword Speech Recognition for Agglutinative Languages
    Valizada, Alakbar
    [J]. 2021 IEEE 15TH INTERNATIONAL CONFERENCE ON APPLICATION OF INFORMATION AND COMMUNICATION TECHNOLOGIES (AICT2021), 2021,
  • [7] An Overview of End-to-End Automatic Speech Recognition
    Wang, Dong
    Wang, Xiaodong
    Lv, Shaohe
    [J]. SYMMETRY-BASEL, 2019, 11 (08):
  • [8] INCREMENTAL LEARNING FOR END-TO-END AUTOMATIC SPEECH RECOGNITION
    Fu, Li
    Li, Xiaoxiao
    Zi, Libo
    Zhang, Zhengchen
    Wu, Youzheng
    He, Xiaodong
    Zhou, Bowen
    [J]. 2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 320 - 327
  • [9] Recent Advances in End-to-End Automatic Speech Recognition
    Li, Jinyu
    [J]. APSIPA TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING, 2022, 11 (01)
  • [10] Inverted Alignments for End-to-End Automatic Speech Recognition
    Doetsch, Patrick
    Hannemann, Mirko
    Schluter, Ralf
    Ney, Hermann
    [J]. IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2017, 11 (08) : 1265 - 1273