Automatic Speech Recognition with Machine Learning: Techniques and Evaluation of Current Tools

Cited by: 0
Authors
Fayan R. [1]
Montajabi Z. [2]
Gonsalves R. [1]
Affiliations
[1] Avid, United States
[2] Avid Technology, United States
Source
SMPTE Motion Imaging Journal | 2024, Vol. 133, No. 02
Keywords
ARTIFICIAL INTELLIGENCE; AUTOMATIC SPEECH RECOGNITION; MACHINE LEARNING
DOI
10.5594/JMI.2024/IPYX8877
Abstract
This research offers an in-depth review of current Automatic Speech Recognition (ASR) methods and their significant impact on media production, with a focus on the transformer model's self-attention mechanism for understanding sequential relationships. It compares the accuracy and performance of top ASR models such as Meta's Massively Multilingual Speech, OpenAI's Whisper, and Google's Universal Speech Model, along with services from Microsoft Azure, Amazon Web Services, and Google Cloud Platform. The study examines key ASR aspects, including voice activity detection, language identification, and multilanguage support, and evaluates their accuracy metrics. Challenges such as limited data for certain languages and complexities in linguistic nuances are highlighted. Additionally, the paper discusses ASR's role in media production, from creating time-based captions to transforming editing techniques. By analyzing the ASR process from audio preprocessing to post-processing, the research bridges academic and practical perspectives, enabling media producers to utilize advanced ASR technologies effectively. © 2002 Society of Motion Picture and Television Engineers, Inc.
Pages: 48-57
Page count: 9