Front-end for Far-field Speech Recognition based on Frequency Domain Linear Prediction

被引:0
|
作者
Ganapathy, Sriram [1 ,2 ]
Thomas, Samuel [1 ,2 ]
Hermansky, Hynek [1 ,2 ]
机构
[1] IDIAP Res Inst, Martigny, Switzerland
[2] Ecole Polytech Fed Lausanne, Lausanne, Switzerland
关键词
Frequency Domain Linear Prediction; Front-end for Far-field Speech; Reverberant Speech; Speech Recognition;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Automatic Speech Recognition (ASR) systems usually fail when they encounter speech from far-field microphone in reverberant environments. This is due to the application of short-term feature extraction techniques which do not compensate for the artifacts introduced by long room impulse responses. In this paper, we propose a front-end, based on Frequency Domain Linear Prediction (FDLP), that tries to remove reverberation artifacts present in far-field speech. Long temporal segments of far-field speech are analyzed in narrow frequency sub-bands to extract FDLP envelopes and residual signals. Filtering the residual signals with gain normalized inverse FDLP filters result in a set of sub-band signals which are synthesized to reconstruct the signal back. ASR experiments on far-field speech data processed by the proposed front-end show significant improvements (relative reduction of 30% in word error rate) compared to other robust feature extraction techniques.
引用
收藏
页码:984 / +
页数:2
相关论文
共 50 条
  • [21] A Front-End Technique for Automatic Noisy Speech Recognition
    Naing, Hay Mar Soe
    Hidayat, Risanuri
    Hartanto, Rudy
    Miyanaga, Yoshikazu
    [J]. PROCEEDINGS OF 2020 23RD CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (ORIENTAL-COCOSDA 2020), 2020, : 49 - 54
  • [22] Automatic Speech Recognition with a Cochlear Implant Front-End
    Nogueira, Waldo
    Harczos, Tamas
    Edler, Bernd
    Ostermann, Joern
    Buechner, Andreas
    [J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1993 - +
  • [23] Recognition of Reverberant Speech Using Frequency Domain Linear Prediction
    Thomas, Samuel
    Ganapathy, Sriram
    Hermansky, Hynek
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2008, 15 : 681 - 684
  • [24] A Front-End Speech Enhancement System for Robust Automotive Speech Recognition
    Wang, Haikun
    Ye, Zhongfu
    Chen, Jingdong
    [J]. 2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018, : 1 - 5
  • [25] Investigation of Speech Separation as a Front-End for Noise Robust Speech Recognition
    Narayanan, Arun
    Wang, DeLiang
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (04) : 826 - 835
  • [26] Dereverberation of autoregressive envelopes for far-field speech recognition
    Purushothaman, Anurenjan
    Sreeram, Anirudh
    Kumar, Rohit
    Ganapathy, Sriram
    [J]. COMPUTER SPEECH AND LANGUAGE, 2022, 72
  • [27] Investigation into a Mel subspace based front-end processing for robust speech recognition
    Selouani, SA
    O'Shaughnessy, D
    [J]. Proceedings of the Fourth IEEE International Symposium on Signal Processing and Information Technology, 2004, : 187 - 190
  • [28] Feature enhancement for a bitstream-based front-end in wireless speech recognition
    Kim, HK
    Cox, RV
    [J]. 2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURALNETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING, 2001, : 241 - 244
  • [29] Performance evaluation of front-end algorithms for robust speech recognition
    Cheng, O
    Abdulla, W
    Salcic, Z
    [J]. ISSPA 2005: THE 8TH INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND ITS APPLICATIONS, VOLS 1 AND 2, PROCEEDINGS, 2005, : 711 - 714
  • [30] Thin client front-end processor for distributed speech recognition
    Chow, KF
    Liew, SC
    Lua, KT
    [J]. 2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL II, PROCEEDINGS: SPEECH II; INDUSTRY TECHNOLOGY TRACKS; DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS; NEURAL NETWORKS FOR SIGNAL PROCESSING, 2003, : 29 - 32