Automatic Speech Recognition with Machine Learning: Techniques and Evaluation of Current Tools

Authors
Fayan R. [1]
Montajabi Z. [2]
Gonsalves R. [1]
Affiliations
[1] Avid, United States
[2] Avid Technology, United States
Source
SMPTE Motion Imaging Journal | 2024 / Vol. 133 / No. 02
Keywords
ARTIFICIAL INTELLIGENCE; AUTOMATIC SPEECH RECOGNITION; MACHINE LEARNING
DOI
10.5594/JMI.2024/IPYX8877
Abstract
This research offers an in-depth review of current Automatic Speech Recognition (ASR) methods and their impact on media production, with a focus on the transformer model's self-attention mechanism for modeling sequential relationships. It compares the accuracy and performance of leading ASR models, including Meta's Massively Multilingual Speech, OpenAI's Whisper, and Google's Universal Speech Model, along with services from Microsoft Azure, Amazon Web Services, and Google Cloud Platform. The study examines key ASR aspects, including voice activity detection, language identification, and multilanguage support, and evaluates their accuracy metrics. It highlights challenges such as limited training data for certain languages and the complexity of linguistic nuance. The paper also discusses ASR's role in media production, from creating time-based captions to transforming editing techniques. By analyzing the ASR process from audio preprocessing to post-processing, the research bridges academic and practical perspectives, enabling media producers to use advanced ASR technologies effectively. © 2002 Society of Motion Picture and Television Engineers, Inc.
Pages: 48-57
Page count: 9