Automatic Speech Recognition with Machine Learning: Techniques and Evaluation of Current Tools

Authors
Fayan R. [1]
Montajabi Z. [2]
Gonsalves R. [1]
Affiliations
[1] Avid, United States
[2] Avid Technology, United States
Source
SMPTE Motion Imaging Journal | 2024 / Vol. 133 / No. 02
Keywords
ARTIFICIAL INTELLIGENCE; AUTOMATIC SPEECH RECOGNITION; MACHINE LEARNING
DOI
10.5594/JMI.2024/IPYX8877
Abstract
This research offers an in-depth review of current Automatic Speech Recognition (ASR) methods and their significant impact on media production, focusing on the transformer model's self-attention mechanism for capturing relationships within sequential data. It compares the accuracy and performance of leading ASR models, including Meta's Massively Multilingual Speech (MMS), OpenAI's Whisper, and Google's Universal Speech Model, along with the speech services of Microsoft Azure, Amazon Web Services, and Google Cloud Platform. The study examines key ASR capabilities, including voice activity detection, language identification, and multilingual support, and evaluates their accuracy. Challenges such as limited training data for certain languages and the complexity of linguistic nuances are highlighted. Additionally, the paper discusses ASR's role in media production, from creating time-based captions to transforming editing techniques. By analyzing the ASR pipeline from audio preprocessing to post-processing, the research bridges academic and practical perspectives, enabling media producers to use advanced ASR technologies effectively. © 2002 Society of Motion Picture and Television Engineers, Inc.
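The abstract singles out the transformer's self-attention mechanism as the way these models relate positions in a sequence. The sketch below is a minimal, single-head scaled dot-product self-attention in plain Python, offered only as an illustration under stated assumptions: the toy 2-dimensional "frame" features and identity projection matrices are hypothetical, and the helper names (`softmax`, `matmul`, `self_attention`) are illustrative, not taken from the paper or from any of the ASR systems it evaluates. Production ASR encoders use learned, multi-head projections over much larger feature vectors.

```python
import math


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    # Multiply an n x k matrix by a k x m matrix (lists of lists)
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence x (n x d)."""
    q = matmul(x, w_q)  # queries
    k = matmul(x, w_k)  # keys
    v = matmul(x, w_v)  # values
    d_k = len(w_k[0])
    out, weights = [], []
    for i in range(len(x)):
        # Score frame i against every frame j, scaled by sqrt(d_k)
        scores = [
            sum(q[i][c] * k[j][c] for c in range(d_k)) / math.sqrt(d_k)
            for j in range(len(x))
        ]
        w = softmax(scores)
        weights.append(w)
        # Output for frame i is the attention-weighted sum of the value vectors
        out.append([sum(w[j] * v[j][c] for j in range(len(x))) for c in range(len(v[0]))])
    return out, weights


# Hypothetical toy input: three "frames" with 2-dimensional features
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out, weights = self_attention(x, identity, identity, identity)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` sums to 1 and shows how strongly one frame attends to every other frame. Because every frame is scored against the whole sequence at once, the model can relate distant positions directly rather than passing information step by step as a recurrent network would.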
Pages: 48-57
Page count: 9