DNN-based Models for Speaker Age and Gender Classification

Times cited: 3
Authors
Qawaqneh, Zakariya [1]
Abu Mallouh, Arafat [1]
Barkana, Buket D. [2]
Affiliations
[1] Univ Bridgeport, Comp Sci & Engn Dept, Bridgeport, CT 06604 USA
[2] Univ Bridgeport, Elect Engn Dept, Bridgeport, CT 06604 USA
Keywords
Deep Neural Network; SDC; MFCCs; Speaker Age and Gender Classification; Recognition
DOI
10.5220/0006096401060111
Chinese Library Classification (CLC) number
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
Automatic speaker age and gender classification is an active research field due to the continuous and rapid development of applications related to human life and health. In this paper, we propose a new method for speaker age and gender classification that uses deep neural networks (DNNs) as both feature extractor and classifier. The proposed method creates a model for each speaker. For each test speech utterance, the test model is compared with the speaker class models by measuring their similarity. Two feature sets are used: Mel-frequency cepstral coefficients (MFCCs) and shifted delta cepstral (SDC) coefficients. The proposed model achieved better classification results with the SDC feature set than with MFCCs. The experimental results show that the proposed SDC speaker model + SDC class model outperformed all the other systems, achieving an overall classification accuracy of 57.21%.
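As an illustration of the two ingredients the abstract names (SDC features and similarity-based comparison of a test model against class models), the following is a minimal Python sketch, not the authors' implementation. It computes SDC features from a sequence of cepstral frames using the commonly cited N-d-P-k parameterization; the defaults 7-1-3-7 are an assumption, since the record does not state the paper's parameters. The paper builds its speaker and class models with DNNs; here a "model" is a hypothetical plain vector (for example, the mean of an utterance's feature frames), and cosine similarity is an assumed stand-in for whatever similarity measure the authors use. The function names `shifted_delta_cepstra`, `mean_vector`, `cosine_similarity`, and `classify` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def shifted_delta_cepstra(frames: Sequence[Sequence[float]],
                          n: int = 7, d: int = 1, p: int = 3,
                          k: int = 7) -> List[Vector]:
    """SDC features with the N-d-P-k parameterization (defaults are an assumption).

    For frame t, block i (i = 0..k-1) is c(t + i*p + d) - c(t + i*p - d),
    restricted to the first n cepstral coefficients; the k blocks are
    concatenated into one vector of length n * k. Frames whose blocks would
    reach outside the utterance are skipped (no padding).
    """
    num_frames = len(frames)
    features: List[Vector] = []
    # First valid t needs t - d >= 0; last valid t needs t + (k-1)*p + d <= num_frames - 1.
    for t in range(d, num_frames - (k - 1) * p - d):
        sdc: Vector = []
        for i in range(k):
            ahead = frames[t + i * p + d]
            behind = frames[t + i * p - d]
            sdc.extend(ahead[j] - behind[j] for j in range(n))
        features.append(sdc)
    return features


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Average equal-length vectors (hypothetical stand-in for a DNN-based model)."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(test_model: Sequence[float], class_models: Dict[str, Vector]) -> str:
    """Return the label of the class model most similar to the test model."""
    return max(class_models,
               key=lambda label: cosine_similarity(test_model, class_models[label]))


if __name__ == "__main__":
    # Synthetic linear-ramp "cepstra": 30 frames x 7 coefficients, purely illustrative.
    frames = [[float(t)] * 7 for t in range(30)]
    sdc = shifted_delta_cepstra(frames)
    print(len(sdc), len(sdc[0]))  # 10 49
```

For an utterance of T frames, this sketch yields T - (k - 1)P - 2d SDC vectors of dimension Nk (49 for 7-1-3-7). Frames near the utterance edges are dropped rather than padded, which is only one of several possible boundary conventions.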
Pages: 106-111
Number of pages: 6
Related papers
(first 10 of 50 listed)
  • [1] DNN-based speaker clustering for speaker diarisation
    Milner, Rosanna
    Hain, Thomas
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2185 - 2189
  • [2] A DNN-based emotional speech synthesis by speaker adaptation
    Yang, Hongwu
    Zhang, Weizhao
    Zhi, Pengpeng
    [J]. 2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 633 - 637
  • [3] SPEAKER AND LANGUAGE FACTORIZATION IN DNN-BASED TTS SYNTHESIS
    Fan, Yuchen
    Qian, Yao
    Soong, Frank K.
    He, Lei
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5540 - 5544
  • [4] UNSUPERVISED SPEAKER ADAPTATION FOR DNN-BASED TTS SYNTHESIS
    Fan, Yuchen
    Qian, Yao
    Soong, Frank K.
    He, Lei
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5135 - 5139
  • [5] On the Issue of Calibration in DNN-based Speaker Recognition Systems
    McLaren, Mitchell
    Castan, Diego
    Ferrer, Luciana
    Lawson, Aaron
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1825 - 1829
  • [6] DNN-Based Speech Synthesis Using Speaker Codes
    Hojo, Nobukatsu
    Ijima, Yusuke
    Mizuno, Hideyuki
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2018, E101D (02) : 462 - 472
  • [7] A study of speaker adaptation for DNN-based speech synthesis
    Wu, Zhizheng
    Swietojanski, Pawel
    Veaux, Christophe
    Renals, Steve
    King, Simon
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 879 - 883
  • [8] MULTI-SPEAKER MODELING AND SPEAKER ADAPTATION FOR DNN-BASED TTS SYNTHESIS
    Fan, Yuchen
    Qian, Yao
    Soong, Frank K.
    He, Lei
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4475 - 4479
  • [9] An Investigation of DNN-Based Speech Synthesis Using Speaker Codes
    Hojo, Nobukatsu
    Ijima, Yusuke
    Mizuno, Hideyuki
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2278 - 2282
  • [10] Sparse DNN-based speaker segmentation using side information
    Ma, Yong
    Bao, Chang-Chun
    [J]. ELECTRONICS LETTERS, 2015, 51 (08) : 651 - 653