Sparse DNN-based speaker segmentation using side information

被引:1
|
作者
Ma, Yong [1 ,2 ]
Bao, Chang-Chun [1 ]
机构
[1] Beijing Univ Technol, Speech & Audio Signal Proc Lab, Sch Elect Informat & Control Engn, Beijing 100124, Peoples R China
[2] Jiangsu Normal Univ, Sch Phys & Elect Engn, Xuzhou, Peoples R China
关键词
D O I
10.1049/el.2015.0298
中图分类号
TM [电工技术]; TN [电子技术、通信技术];
学科分类号
0808 ; 0809 ;
摘要
Sparse deep neural networks (SDNNs) for speaker segmentation are proposed. First, the SDNNs are trained using the side information that is the class label of the input. Then, speaker-specific features are extracted from the super-vector feature of the speech signal by the SDNNs. Lastly, the label of each speech frame is obtained by K-means clustering, which is used to segment different speakers of a continuous speech stream. The performance evaluation using the multi-speaker speech stream corpus generated from the TIMIT database shows that the proposed speaker segmentation algorithm outperforms the Bayesian information criterion method and the deep auto-encoder networks method.
引用
收藏
页码:651 / 653
页数:2
相关论文
共 50 条
  • [1] DNN-based speaker clustering for speaker diarisation
    Milner, Rosanna
    Hain, Thomas
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2185 - 2189
  • [2] DNN-Based Speech Synthesis Using Speaker Codes
    Hojo, Nobukatsu
    Ijima, Yusuke
    Mizuno, Hideyuki
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2018, E101D (02): : 462 - 472
  • [3] An Investigation of DNN-Based Speech Synthesis Using Speaker Codes
    Hojo, Nobukatsu
    Ijima, Yusuke
    Mizuno, Hideyuki
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2278 - 2282
  • [4] A DNN-based emotional speech synthesis by speaker adaptation
    Yang, Hongwu
    Zhang, Weizhao
    Zhi, Pengpeng
    [J]. 2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 633 - 637
  • [5] SPEAKER AND LANGUAGE FACTORIZATION IN DNN-BASED TTS SYNTHESIS
    Fan, Yuchen
    Qian, Yao
    Soong, Frank K.
    He, Lei
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5540 - 5544
  • [6] UNSUPERVISED SPEAKER ADAPTATION FOR DNN-BASED TTS SYNTHESIS
    Fan, Yuchen
    Qian, Yao
    Soong, Frank K.
    He, Lei
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5135 - 5139
  • [7] On the Issue of Calibration in DNN-based Speaker Recognition Systems
    McLaren, Mitchell
    Castan, Diego
    Ferrer, Luciana
    Lawson, Aaron
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1825 - 1829
  • [8] DNN-based Models for Speaker Age and Gender Classification
    Qawaqneh, Zakariya
    Abu Mallouh, Arafat
    Barkana, Buket D.
    [J]. PROCEEDINGS OF THE 10TH INTERNATIONAL JOINT CONFERENCE ON BIOMEDICAL ENGINEERING SYSTEMS AND TECHNOLOGIES, VOL 4: BIOSIGNALS, 2017, : 106 - 111
  • [9] A study of speaker adaptation for DNN-based speech synthesis
    Wu, Zhizheng
    Swietojanski, Pawel
    Veaux, Christophe
    Renals, Steve
    King, Simon
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 879 - 883
  • [10] Unsupervised Speaker Adaptation for DNN-based Speech Synthesis using Input Codes
    Takaki, Shinji
    Nishimura, Yoshikazu
    Yamagishi, Junichi
    [J]. 2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 649 - 658