English-Filipino Speech Topic Tagger Using Automatic Speech Recognition Modeling and Topic Modeling

Cited by: 0

Authors:

Tumpalan, John Karl B. [1]

Recario, Reginald Neil C. [1]

Affiliations:

[1] Univ Philippines, Los Banos 4031, Philippines

Source:

ADVANCES IN INFORMATION AND COMMUNICATION, FICC, VOL 2 | 2023 / Vol. 652

Keywords:

Automatic Speech Recognition; Topic modeling; Speech tagging; XLSR-Wav2Vec2; Speech recognition for English-Filipino; Latent Dirichlet Allocation; Transfer learning; Audio and speech processing;

DOI:

10.1007/978-3-031-28073-3_31

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

We present an English-Filipino Speech Topic Tagger that transcribes English-Filipino speech audio into text and produces relevant keywords from that audio. The tagger works in two stages: it first transcribes speech to text using an English XLSR-Wav2Vec2 Automatic Speech Recognition (ASR) model fine-tuned on Filipino, then extracts context from the transcription using Latent Dirichlet Allocation (LDA), a generative statistical model for topic modeling. The trained English-Filipino ASR model achieves a 26.8% Word Error Rate on the validation set. The Speech Topic Tagger was evaluated through an observation-based approach using different YouTube videos as input; its accuracy in producing relevant top-weighted words ranged from a low of 14.28% to a high of 46.15%.
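
The Word Error Rate reported in the abstract is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the ASR output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `word_error_rate` and the sample sentences are hypothetical and do not come from the paper's data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words (Levenshtein, word level).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical code-switched example (illustrative only).
reference = "kumusta ka today my friend"
hypothesis = "kumusta ka to day friend"
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")
```

Because insertions count as errors, this ratio can exceed 100% when the hypothesis is much longer than the reference.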


Pages: 427-445

Number of pages: 19
