Automatic Speech Recognition with Machine Learning: Techniques and Evaluation of Current Tools

Authors
Fayan R. [1]
Montajabi Z. [2]
Gonsalves R. [1]
Affiliations
[1] Avid, United States
[2] Avid Technology, United States
Source
SMPTE Motion Imaging Journal | 2024 / Vol. 133 / No. 02
Keywords
ARTIFICIAL INTELLIGENCE; AUTOMATIC SPEECH RECOGNITION; MACHINE LEARNING
DOI
10.5594/JMI.2024/IPYX8877
Abstract
This research offers an in-depth review of current Automatic Speech Recognition (ASR) methods and their significant impact on media production, with a focus on the transformer model's self-attention mechanism for capturing sequential relationships. It compares the accuracy and performance of leading ASR models, such as Meta's Massively Multilingual Speech (MMS), OpenAI's Whisper, and Google's Universal Speech Model, along with services from Microsoft Azure, Amazon Web Services, and Google Cloud Platform. The study examines key ASR aspects, including voice activity detection, language identification, and multilingual support, and evaluates their accuracy metrics. Challenges such as limited training data for certain languages and the complexity of linguistic nuances are highlighted. Additionally, the paper discusses ASR's role in media production, from creating time-based captions to transforming editing techniques. By analyzing the ASR process from audio preprocessing to post-processing, the research bridges academic and practical perspectives, enabling media producers to use advanced ASR technologies effectively. © 2002 Society of Motion Picture and Television Engineers, Inc.
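The abstract does not name the accuracy metric used to compare the models. Word error rate (WER) is the metric conventionally used to score ASR transcripts: the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The following is a minimal Python sketch of WER, assuming plain lowercase-and-whitespace tokenization; it is not the authors' evaluation code, and the function name `word_error_rate` is hypothetical. Real evaluations usually apply a more thorough text normalizer (punctuation, numbers, contractions) before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N via word-level Levenshtein distance.

    Assumption: tokens are produced by lowercasing and splitting on
    whitespace only; no punctuation or number normalization is done.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (one row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr

    # Total edits divided by reference length; can exceed 1.0
    # when the hypothesis contains many insertions.
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(round(word_error_rate("the cat sat on the mat", "the cat sat on mat"), 4))  # -> 0.1667
```

Because WER is normalized by the reference length rather than the hypothesis length, a transcript padded with spurious words can score above 1.0, which is why it is reported alongside, not instead of, other quality measures.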
Pages: 48-57
Number of pages: 9