Sparse DNN-based speaker segmentation using side information

Cited by: 1
Authors
Ma, Yong [1, 2]
Bao, Chang-Chun [1]
Affiliations
[1] Beijing Univ Technol, Speech & Audio Signal Proc Lab, Sch Elect Informat & Control Engn, Beijing 100124, Peoples R China
[2] Jiangsu Normal Univ, Sch Phys & Elect Engn, Xuzhou, Peoples R China
Keywords
Bayesian networks - Information use - Image segmentation - Speech;
DOI
10.1049/el.2015.0298
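Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract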
Sparse deep neural networks (SDNNs) are proposed for speaker segmentation. First, the SDNNs are trained using side information, namely the class label of the input. Then, the SDNNs extract speaker-specific features from the super-vector feature of the speech signal. Finally, K-means clustering assigns a label to each speech frame, and these labels are used to segment the different speakers in a continuous speech stream. An evaluation on a multi-speaker speech stream corpus generated from the TIMIT database shows that the proposed speaker segmentation algorithm outperforms both the Bayesian information criterion method and the deep auto-encoder network method.
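The final step described in the abstract, clustering per-frame features with K-means and reading speaker boundaries off the resulting labels, can be illustrated with a minimal sketch. This is not the authors' implementation: the SDNN training and feature extraction are not reproduced, and synthetic Gaussian vectors stand in for the speaker-specific features. The function names `kmeans` and `change_points`, the feature dimension, the farthest-point initialisation, and the assumption that the number of speakers is known (`k=2`) are all choices made for this illustration only.

```python
import numpy as np


def kmeans(features, k, n_iter=50):
    """Plain Lloyd's K-means; returns one cluster label per frame."""
    # Deterministic farthest-point initialisation (an assumption of this
    # sketch): start at frame 0, then repeatedly add the frame that is
    # farthest from every centroid chosen so far.
    centroids = [features[0]]
    for _ in range(1, k):
        dist_to_nearest = np.min(
            [np.linalg.norm(features - c, axis=1) for c in centroids], axis=0
        )
        centroids.append(features[np.argmax(dist_to_nearest)])
    centroids = np.array(centroids, dtype=float)

    for _ in range(n_iter):
        # Assign every frame to its nearest centroid.
        dists = np.linalg.norm(
            features[:, None, :] - centroids[None, :, :], axis=2
        )
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster is empty.
        new_centroids = np.array([
            features[labels == j].mean(axis=0)
            if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels


def change_points(labels):
    """Indices of frames whose label differs from the previous frame."""
    return (np.flatnonzero(np.diff(labels)) + 1).tolist()


# Synthetic stand-in for speaker-specific features (hypothetical data):
# 40 frames of speaker A, 60 frames of speaker B, 30 frames of speaker A.
rng = np.random.default_rng(0)
stream = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(40, 8)),
    rng.normal(loc=5.0, scale=0.1, size=(60, 8)),
    rng.normal(loc=0.0, scale=0.1, size=(30, 8)),
])

labels = kmeans(stream, k=2)
boundaries = change_points(labels)
print(boundaries)  # frame indices where the speaker label changes
```

On real speech the per-frame labels are noisy, so a practical system would likely smooth them (for example with a median filter over neighbouring frames) before extracting boundaries; the abstract does not say whether the paper does this.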
Pages: 651-653
Number of pages: 2