Real-Time Integration of Dynamic Context Information for Improving Automatic Speech Recognition

Authors
Oualil, Youssef [1]
Schulder, Marc [1]
Helmke, Hartmut [2]
Schmidt, Anna [1]
Klakow, Dietrich [1]
Affiliations
[1] Saarland University, Spoken Language Systems Group (LSV), Saarbrücken, Germany
[2] German Aerospace Center (DLR), Institute of Flight Guidance, Braunschweig, Germany
Keywords
speech recognition; situational context; Levenshtein distance
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Prior situational or contextual knowledge about a given task can significantly improve Automatic Speech Recognition (ASR) performance. Such knowledge is typically integrated by adapting the acoustic or language models, if data is available, or by knowledge-based rescoring. The main adaptation techniques, however, are either domain-specific, which makes them unsuitable for other tasks, or static and offline, and therefore unable to handle dynamic knowledge. To address this problem, we propose a real-time system that dynamically integrates situational context into ASR. Context is integrated either post-recognition or pre-recognition. In the post-recognition case, we propose a weighted Levenshtein distance between the ASR hypotheses and the context information, with weights based on the ASR confidence scores, to extract the most likely sequence of spoken words. In the pre-recognition case, the search space is adjusted to the new situational knowledge by adapting the finite state machine that models the spoken language. Experiments on 3 hours of Air Traffic Control (ATC) data reduced the Command Error Rate (CmdER), an evaluation metric used in the ATC domain, by a factor of 4 compared to using no contextual knowledge.
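The abstract does not specify how the confidence scores enter the distance. As a hedged illustration only, the Python sketch below assumes one plausible weighting, not necessarily the authors': substituting or deleting a recognized word costs that word's ASR confidence, so low-confidence words are cheap to correct, while inserting a context word costs a fixed, hypothetical `ins_cost`. The function names `weighted_levenshtein` and `best_context_match`, the cost scheme, and the example callsigns are all made up for illustration. The sketch then selects the context-compatible word sequence closest to the ASR hypothesis.

```python
from typing import List, Sequence, Tuple


def weighted_levenshtein(
    hyp: Sequence[Tuple[str, float]],
    ref: Sequence[str],
    ins_cost: float = 1.0,
) -> float:
    """Edit distance in which editing a hypothesis word costs its confidence.

    Hypothetical weighting (an assumption, not taken from the paper):
    substituting or deleting hypothesis word i costs its confidence c_i,
    and inserting a context word costs the constant `ins_cost`.
    """
    n, m = len(hyp), len(ref)
    # d[i][j]: cheapest alignment of the first i hypothesis words
    # with the first j context words
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + hyp[i - 1][1]  # delete hypothesis words
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + ins_cost  # insert context words
    for i in range(1, n + 1):
        word, conf = hyp[i - 1]
        for j in range(1, m + 1):
            sub = 0.0 if word == ref[j - 1] else conf
            d[i][j] = min(
                d[i - 1][j - 1] + sub,   # match (free) or substitution
                d[i - 1][j] + conf,      # delete the hypothesis word
                d[i][j - 1] + ins_cost,  # insert the context word
            )
    return d[n][m]


def best_context_match(
    hyp: Sequence[Tuple[str, float]],
    candidates: List[List[str]],
    ins_cost: float = 1.0,
) -> List[str]:
    """Return the context candidate with the smallest weighted distance."""
    return min(candidates, key=lambda cand: weighted_levenshtein(hyp, cand, ins_cost))


if __name__ == "__main__":
    # Made-up ASR output as (word, confidence) pairs
    hyp = [("lufthansa", 0.9), ("one", 0.8), ("two", 0.3), ("descend", 0.95)]
    # Made-up word sequences allowed by the current situational context
    candidates = [
        ["lufthansa", "one", "three", "descend"],
        ["speedbird", "one", "two", "climb"],
    ]
    print(best_context_match(hyp, candidates))
    # prints ['lufthansa', 'one', 'three', 'descend']
```

In this made-up example, the low-confidence "two" is cheap to replace (cost 0.3), whereas matching the second candidate requires changing two high-confidence words (cost 0.9 + 0.95 = 1.85), so the first candidate is selected.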
Pages: 2107-2111
Page count: 5