English-Filipino Speech Topic Tagger Using Automatic Speech Recognition Modeling and Topic Modeling

Cited by: 0
Authors
Tumpalan, John Karl B. [1]
Recario, Reginald Neil C. [1]
Affiliation
[1] Univ Philippines, Los Banos 4031, Philippines
Keywords
Automatic Speech Recognition; Topic modeling; Speech tagging; XLSR-Wav2Vec2; Speech recognition for English-Filipino; Latent Dirichlet Allocation; Transfer learning; Audio and speech processing
DOI
10.1007/978-3-031-28073-3_31
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We present an English-Filipino Speech Topic Tagger that transcribes English-Filipino speech audio into text and extracts relevant keywords from it. The tagger works in two stages. First, an English XLSR-Wav2Vec2 Automatic Speech Recognition (ASR) model fine-tuned for Filipino transcribes the speech to text. Second, Latent Dirichlet Allocation (LDA), a generative statistical model for topic modeling, extracts the topics of the transcription. The trained English-Filipino ASR model achieves a 26.8% Word Error Rate (WER) on the validation set. The Speech Topic Tagger was evaluated with an observation-based approach on several YouTube videos; its accuracy in producing relevant top-weighted words ranged from 14.28% (lowest) to 46.15% (highest).
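The 26.8% figure above is a word error rate. WER is conventionally defined as (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions in a minimum-edit alignment between the ASR output and the reference transcript, and N is the number of words in the reference. The following is a minimal Python sketch of that definition, not the authors' evaluation code; it assumes only lowercasing and whitespace tokenization (the paper's text normalization is not stated here), and the sentence pair is a hypothetical code-switched example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length.

    Assumption: text is lowercased and split on whitespace only; real
    evaluations often apply extra normalization (e.g. punctuation removal).
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = minimum edits turning the first i reference words into the
    # first j hypothesis words (word-level Levenshtein distance).
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
    return dp[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # Hypothetical English-Filipino (code-switched) reference and ASR output.
    reference = "ang meeting natin ay bukas ng umaga"
    hypothesis = "ang meeting nating ay bukas umaga"
    print(f"WER = {word_error_rate(reference, hypothesis):.2%}")
```

Running it prints `WER = 28.57%`: one substitution (natin → nating) and one deletion (ng) over seven reference words. Because insertions are counted against the reference length, WER can exceed 100%, so it is an error rate rather than a bounded accuracy score.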
Pages: 427-445
Page count: 19