Automatic Speech Recognition with Machine Learning: Techniques and Evaluation of Current Tools

Cited by: 0

Authors:

Fayan R. [1]

Montajabi Z. [2]

Gonsalves R. [1]

Affiliations:

[1] Avid, United States

[2] Avid Technology, United States

Source:

SMPTE Motion Imaging Journal | 2024 / Vol. 133 / No. 2

Keywords:

ARTIFICIAL INTELLIGENCE; AUTOMATIC SPEECH RECOGNITION; MACHINE LEARNING

DOI:

10.5594/JMI.2024/IPYX8877

Abstract:

This research offers an in-depth review of current Automatic Speech Recognition (ASR) methods and their significant impact on media production, with a focus on the transformer model's self-attention mechanism for understanding sequential relationships. It compares the accuracy and performance of top ASR models like Meta's Massively Multilingual Speech, OpenAI's Whisper, and Google's Universal Speech Model along with services from Microsoft Azure, Amazon Web Services, and Google Cloud Platform. The study examines key ASR aspects, including voice activity detection, language identification, and multilanguage support, and evaluates their accuracy metrics. Challenges such as limited data for certain languages and complexities in linguistic nuances are highlighted. Additionally, the paper discusses ASR's role in media production, from creating time-based captions to transforming editing techniques. By analyzing the ASR process from audio preprocessing to post-processing, the research bridges academic and practical perspectives, enabling media producers to utilize advanced ASR technologies effectively. © 2002 Society of Motion Picture and Television Engineers, Inc.

Pages: 48-57

Page count: 9
