Single-Channel Speech-Music Separation for Robust ASR With Mixture Models

Cited by: 22
Authors
Demir, Cemil [1, 2]
Saraclar, Murat [2]
Cemgil, Ali Taylan [2]
Affiliations
[1] TUBITAK BILGEM, Speech & Language Technol Lab, Kocaeli, Turkey
[2] Bogazici Univ, Dept Elect Engn, TR-34342 Istanbul, Turkey
Keywords
Gamma Markov chain; non-negative matrix factorization (NMF); single-channel; speech recognition; speech-music separation; NONNEGATIVE MATRIX FACTORIZATION; AUDIO
DOI
10.1109/TASL.2012.2231072
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this study, we describe a mixture-model-based single-channel speech-music separation method. Given a catalog of background music material, we propose a generative model for the superposed speech and music spectrograms. The background music signal is assumed to be generated by a jingle in the catalog. The background music component is modeled by a scaled conditional mixture model representing the jingle. The speech signal is modeled by a probabilistic model similar to the probabilistic interpretation of the Non-negative Matrix Factorization (NMF) model. The parameters of the speech model are estimated in a semi-supervised manner from the mixed signal. The approach is tested with Poisson and complex Gaussian observation models, which correspond to the Kullback-Leibler (KL) and Itakura-Saito (IS) divergence measures, respectively. Our experiments show that the proposed mixture model outperforms a standard NMF method in both speech-music separation and automatic speech recognition (ASR) tasks. These results are further improved using Markovian prior structures for temporal continuity between the jingle frames. Our test results with real data show that our method increases the speech recognition performance.
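For orientation, the kind of standard semi-supervised NMF baseline the abstract compares against can be sketched as below. This is not the authors' mixture model and is not taken from the paper: it is a generic KL-divergence NMF with multiplicative updates, in which a music basis is held fixed while a speech basis and all activations are estimated from the mixture. The function name `semi_supervised_nmf_kl`, its parameters, and the masking step used for reconstruction are assumptions made for illustration.

```python
import numpy as np


def semi_supervised_nmf_kl(V, W_music, n_speech=8, n_iter=200, eps=1e-12, seed=0):
    """Generic semi-supervised KL-NMF sketch (hypothetical helper, not the paper's method).

    V        : (F, T) non-negative magnitude spectrogram of the speech+music mixture.
    W_music  : (F, Km) non-negative music basis, assumed pre-trained and kept fixed.
    Returns (speech_hat, music_hat), each of shape (F, T).
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    Km = W_music.shape[1]

    # Speech basis and all activations are unknown and start from random values.
    W_speech = rng.random((F, n_speech)) + eps
    H = rng.random((Km + n_speech, T)) + eps
    ones = np.ones_like(V)

    for _ in range(n_iter):
        W = np.hstack([W_music, W_speech])

        # Multiplicative KL update for the activations of both sources.
        R = V / (W @ H + eps)
        H *= (W.T @ R) / (W.T @ ones + eps)

        # Multiplicative KL update for the speech basis only; the music basis stays fixed.
        R = V / (W @ H + eps)
        H_speech = H[Km:]
        W_speech *= (R @ H_speech.T) / (ones @ H_speech.T + eps)

    # Split the mixture with soft masks built from each source's model spectrogram.
    V_hat = np.hstack([W_music, W_speech]) @ H + eps
    music_hat = V * (W_music @ H[:Km]) / V_hat
    speech_hat = V * (W_speech @ H[Km:]) / V_hat
    return speech_hat, music_hat
```

Under the Poisson observation model mentioned in the abstract, maximizing the likelihood corresponds to minimizing the KL divergence that these updates target. An IS-divergence variant would use different update rules and is not shown here.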
Pages: 725-736
Page count: 12
Related papers
50 entries in total
  • [1] Effect of speech priors in single-channel speech-music separation for ASR
    Demir, Cemil
    Cemgil, A. Taylan
    Saraclar, Murat
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1234 - 1237
  • [2] Catalog-Based Single-Channel Speech-Music Separation
    Demir, Cemil
    Cemgil, A. Taylan
    Saraclar, Murat
    [J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2786 - +
  • [3] CATALOG-BASED SINGLE-CHANNEL SPEECH-MUSIC SEPARATION FOR AUTOMATIC SPEECH RECOGNITION
    Demir, Cemil
    Cemgil, A. Taylan
    Saraclar, Murat
    [J]. 19TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO-2011), 2011, : 2133 - 2137
  • [4] Semi-supervised Single-Channel Speech-Music Separation for Automatic Speech Recognition
    Demir, Cemil
    Cemgil, A. Taylan
    Saraclar, Murat
    [J]. 12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 688 - +
  • [5] ANALYSIS OF EFFECT OF SINGLE-CHANNEL SPEECH-MUSIC SEPARATION USING NMF TO AUTOMATIC SPEECH RECOGNITION
    Demir, Cemil
    Cemgil, A. Taylan
    Saraclar, Murat
    [J]. 2014 22ND SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2014, : 1818 - 1821
  • [6] CATALOG-BASED SINGLE-CHANNEL SPEECH-MUSIC SEPARATION WITH THE ITAKURA-SAITO DIVERGENCE
    Demir, Cemil
    Cemgil, A. Taylan
    Saraclar, Murat
    [J]. 2012 PROCEEDINGS OF THE 20TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2012, : 2812 - 2816
  • [7] Optimum Mixture Estimator for single-channel Speech Separation
    Mowlaee, Pejman
    Sayadiyan, Abolghassem
    Sheikhan, Mansour
    [J]. 2008 INTERNATIONAL SYMPOSIUM ON TELECOMMUNICATIONS, VOLS 1 AND 2, 2008, : 543 - +
  • [8] Speech separation from background of music based on single-channel recording
    Jin, Xue-Cheng
    Wang, Zeng-Fu
    [J]. 18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 4, PROCEEDINGS, 2006, : 278 - +
  • [9] SINGLE-CHANNEL SPEECH SEPARATION BASED ON ROBUST SPARSE BAYESIAN LEARNING
    Wang, Zhe
    Bi, Guoan
    Li, Xiumei
    [J]. 2017 13TH IEEE INTERNATIONAL CONFERENCE ON CONTROL & AUTOMATION (ICCA), 2017, : 113 - 117
  • [10] A VQ-based Single-Channel Audio Separation for Music/Speech Mixtures
    Asgari, Meysam
    Fallah, Mahdi
    Mehrizi, Elahe Abouie
    Mostafavi, Ali
    [J]. UKSIM 2009: ELEVENTH INTERNATIONAL CONFERENCE ON COMPUTER MODELLING AND SIMULATION, 2009, : 223 - +