Ensemble System of Deep Neural Networks for Single-Channel Audio Separation

Cited by: 2
Authors
Al-Kaltakchi, Musab T. S. [1 ]
Mohammad, Ahmad Saeed [2 ]
Woo, Wai Lok [3 ]
Affiliations
[1] Mustansiriyah Univ, Coll Engn, Dept Elect Engn, Baghdad, Iraq
[2] Mustansiriyah Univ, Coll Engn, Dept Comp Engn, Baghdad, Iraq
[3] Northumbria Univ, Dept Comp & Informat Sci, Newcastle Upon Tyne NE1 8ST, England
Keywords
single-channel audio separation; deep neural networks; ideal binary mask; feature fusion; extreme learning machine; nonnegative matrix factorization; speech separation; algorithm
DOI
10.3390/info14070352
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Speech separation is a well-known problem, especially when only one sound mixture is available. Estimating the Ideal Binary Mask (IBM) is one solution to this problem. Recent research has focused on the supervised classification approach, for which extracting discriminative features from the sources is critical. Speech separation has been accomplished using a variety of feature extraction models; the majority of them, however, concentrate on a single feature, and the complementary nature of different features has not been thoroughly investigated. In this paper, we propose a deep neural network (DNN) ensemble architecture to fully exploit the complementary nature of diverse features obtained from raw acoustic features. We examined the penultimate discriminative representations instead of employing the features acquired from the output layer. The learned representations were also fused to produce a new feature vector, which was then classified using the Extreme Learning Machine (ELM). In addition, a genetic algorithm (GA) was employed to optimize the parameters globally. The experimental results showed that our proposed system fully exploited the various features and produced a high-quality IBM under different conditions.
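As a rough illustration of the pipeline described in the abstract, the following Python sketch (not taken from the paper; all function names, dimensions, and data are hypothetical placeholders) fuses the penultimate-layer representations of several DNNs into a single feature vector and classifies each time-frequency unit into an IBM label with an Extreme Learning Machine; the GA-based global parameter optimization is omitted.

    # Illustrative sketch (not the authors' code): fuse penultimate DNN
    # representations and classify time-frequency units into an Ideal
    # Binary Mask (IBM) with an Extreme Learning Machine (ELM).
    import numpy as np

    rng = np.random.default_rng(0)

    def elm_fit(X, T, n_hidden=512, reg=1e-3):
        """Single-hidden-layer ELM: random input weights, closed-form
        (ridge-regularised) output weights."""
        d = X.shape[1]
        W = rng.standard_normal((d, n_hidden))   # random, never trained
        b = rng.standard_normal(n_hidden)
        H = np.tanh(X @ W + b)                   # hidden-layer activations
        # Output weights: beta = (H'H + reg*I)^-1 H'T
        beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ T)
        return W, b, beta

    def elm_predict(X, W, b, beta):
        return np.tanh(X @ W + b) @ beta

    # Assumed inputs (placeholders for the paper's pipeline): each row is one
    # time-frequency unit; each block is the penultimate-layer representation
    # of a DNN trained on a different raw acoustic feature.
    n_units = 2000
    reps = [rng.standard_normal((n_units, 64)) for _ in range(3)]   # e.g. 3 DNNs
    ibm_labels = (rng.random((n_units, 1)) > 0.5).astype(float)     # toy IBM targets

    # Feature fusion: concatenate the learned representations.
    X_fused = np.concatenate(reps, axis=1)

    # ELM classification of the fused feature vector into IBM labels.
    W, b, beta = elm_fit(X_fused, ibm_labels)
    ibm_estimate = (elm_predict(X_fused, W, b, beta) > 0.5).astype(int)
    print("Estimated IBM shape:", ibm_estimate.shape)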
Pages: 24