Speech Enhancement Using Source Information for Phoneme Recognition of Speech with Background Music

Cited by: 7
Authors
Khonglah, Banriskhem K. [1 ]
Dey, Abhishek [2 ]
Prasanna, S. R. Mahadeva [3 ]
Affiliations
[1] Indian Inst Technol Guwahati, Dept Elect & Elect Engn, Gauhati 781039, India
[2] Gauhati Univ Inst Sci & Technol, Dept Elect & Commun Engn, Gauhati 781014, India
[3] Indian Inst Technol Dharwad, Dept Elect Engn, Dharwad 580011, Karnataka, India
Keywords
Source information; Single frequency filter; Zero frequency filter; Temporal enhancement; Spectral enhancement; Epoch extraction; Noise; Suppression
DOI
10.1007/s00034-018-0873-x
CLC classification
TM (electrical engineering); TN (electronic and communication technology)
Subject classification codes
0808; 0809
Abstract
This work explores the significance of source information for speech enhancement, with the aim of improving phoneme recognition of speech containing background music segments. The standard procedure for speech enhancement in noisy conditions involves sequential processing with temporal, spectral and perceptual methods. This work follows the same sequential processing, with the additional modification of studying the effect of the source, particularly in the temporal and perceptual enhancement stages, for enhancing speech with background music segments. The source information is studied in terms of epoch locations and epoch strengths, obtained by passing the sum of the mean and standard deviation of the component envelopes, computed across frequencies using the single frequency filter (SFF), through a zero frequency filter (ZFF). This method of obtaining epoch locations and epoch strengths is referred to as SFF-ZFF in this work. The enhanced segments are passed through phoneme recognizers built using Gaussian mixture model-hidden Markov model (GMM-HMM), subspace Gaussian mixture model-hidden Markov model (SGMM-HMM) and deep neural network-hidden Markov model (DNN-HMM) systems, where the models are trained on clean speech. The enhanced audio files yield a better phone error rate than the degraded audio files, indicating that enhancement based on source information is beneficial for speech regions with background music.
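The SFF-ZFF procedure described in the abstract lends itself to a compact signal-processing sketch. The Python snippet below is a minimal illustration of the idea, not the authors' implementation: it computes single-frequency-filter envelopes, combines them as mean plus standard deviation across frequencies, and passes the result through a zero-frequency filter to pick epoch locations and strengths. The function names, frequency grid, pole radius and trend-removal window are assumptions chosen for readability, not the paper's settings.

import numpy as np
from scipy.signal import lfilter


def sff_envelopes(x, fs, freqs, r=0.99):
    """Envelope of x at each frequency in `freqs` via single-frequency filtering.

    Each frequency is demodulated to DC and smoothed with a single pole at z = r,
    which gives the same envelope magnitude as the usual SFF formulation
    (shift to fs/2, pole at z = -r).
    """
    n = np.arange(len(x))
    env = np.empty((len(freqs), len(x)))
    for k, f in enumerate(freqs):
        shifted = x * np.exp(-2j * np.pi * f * n / fs)   # demodulate f -> 0 Hz
        y = lfilter([1.0], [1.0, -r], shifted)           # single-pole smoothing filter
        env[k] = np.abs(y)                               # component envelope at f
    return env


def zff_epochs(v, fs, trend_win_ms=10.0, passes=3):
    """Zero-frequency filtering of v: returns epoch locations and strengths."""
    d = np.diff(v, prepend=v[0])                         # remove DC offset
    # Two cascaded 0-Hz resonators: y[n] = 2*y[n-1] - y[n-2] + x[n]
    y = lfilter([1.0], [1.0, -2.0, 1.0], d)
    y = lfilter([1.0], [1.0, -2.0, 1.0], y)
    # Repeated local-mean subtraction removes the polynomial trend (assumed window).
    w = int(fs * trend_win_ms / 1000) | 1                # odd window length in samples
    kernel = np.ones(w) / w
    for _ in range(passes):
        y = y - np.convolve(y, kernel, mode="same")
    # Epochs: negative-to-positive zero crossings; strength: local slope there.
    zc = np.where((y[:-1] < 0) & (y[1:] >= 0))[0]
    strength = y[zc + 1] - y[zc]
    return zc, strength


if __name__ == "__main__":
    fs = 8000
    t = np.arange(0, 1.0, 1.0 / fs)
    # Toy amplitude-modulated tone standing in for a voiced speech segment.
    x = np.sin(2 * np.pi * 120 * t) * (1 + 0.3 * np.sin(2 * np.pi * 3 * t))
    freqs = np.arange(100, fs / 2, 100)                  # assumed frequency grid
    env = sff_envelopes(x, fs, freqs)
    v = env.mean(axis=0) + env.std(axis=0)               # mean + std across frequencies
    epochs, strength = zff_epochs(v, fs)
    print(f"{len(epochs)} epoch candidates found")

In this sketch the combined mean-plus-standard-deviation envelope plays the role of the source-emphasised signal, and the zero-frequency filter output is read off at its positive-going zero crossings; how the resulting epoch locations and strengths feed the temporal and perceptual enhancement stages follows the paper itself.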
Pages: 643-663
Number of pages: 21
Related papers
50 records in total
  • [1] Speech Enhancement Using Source Information for Phoneme Recognition of Speech with Background Music
    Banriskhem K. Khonglah
    Abhishek Dey
    S. R. Mahadeva Prasanna
    [J]. Circuits, Systems, and Signal Processing, 2019, 38 : 643 - 663
  • [2] Speech enhancement using excitation source information
    Yegnanarayana, B
    Prasanna, SRM
    Rao, KS
    [J]. 2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 541 - 544
  • [3] Combined Speech Decoders Output For Phoneme Recognition Enhancement
    Abida, Kacem
    Karray, Fakhri
    Abida, Wafa
    [J]. 2009 3RD INTERNATIONAL CONFERENCE ON SIGNALS, CIRCUITS AND SYSTEMS (SCS 2009), 2009, : 146 - +
  • [4] Phoneme recognition using speech image (spectrogram)
    Ahmadi, M
    Bailey, NJ
    Hoyle, BS
    [J]. ICSP '96 - 1996 3RD INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, PROCEEDINGS, VOLS I AND II, 1996, : 675 - 677
  • [5] The effects of background music on speech recognition accuracy
    Raj, B
    Parikh, VN
    Stern, RM
    [J]. 1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 851 - 854
  • [6] ROBUST AUTOMATIC RECOGNITION OF SPEECH WITH BACKGROUND MUSIC
    Malek, Jiri
    Zdansky, Jindrich
    Cerva, Petr
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5210 - 5214
  • [7] PHONEME GROUPING FOR SPEECH RECOGNITION
    REDDY, DR
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1967, 41 (05): : 1295 - &
  • [8] Speech Recognition Using Blind Source Separation and Dereverberation Method for Mixed Sound of Speech and Music
    Wang, Longbiao
    Odani, Kyohei
    Kai, Atsuhiko
    Li, Weifeng
    [J]. 2013 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2013,
  • [9] Speech Emotion Recognition Using Spectrogram & Phoneme Embedding
    Yenigalla, Promod
    Kumar, Abhay
    Tripathi, Suraj
    Singh, Chirag
    Kar, Sibsambhu
    Vepa, Jithendra
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3688 - 3692
  • [10] Robust distributed speech recognition using speech enhancement
    Flynn, Ronan
    Jones, Edward
    [J]. IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2008, 54 (03) : 1267 - 1273