DNN-based Models for Speaker Age and Gender Classification

Cited by: 3

Authors:

Qawaqneh, Zakariya [1]

Abu Mallouh, Arafat [1]

Barkana, Buket D. [2]

Affiliations:

[1] Univ Bridgeport, Comp Sci & Engn Dept, Bridgeport, CT 06604 USA

[2] Univ Bridgeport, Elect Engn Dept, Bridgeport, CT 06604 USA

Source:

PROCEEDINGS OF THE 10TH INTERNATIONAL JOINT CONFERENCE ON BIOMEDICAL ENGINEERING SYSTEMS AND TECHNOLOGIES, VOL 4: BIOSIGNALS | 2017

Keywords:

Deep Neural Network; SDC; MFCCs; Speaker Age and Gender Classification; Recognition

DOI:

10.5220/0006096401060111

Chinese Library Classification (CLC) number:

R318 [Biomedical Engineering]

Discipline classification code:

0831

Abstract:

Automatic speaker age and gender classification is an active research field due to the continuous and rapid development of applications related to human life and health. In this paper, we propose a new method for speaker age and gender classification that uses deep neural networks (DNNs) as both feature extractor and classifier. The proposed method creates a model for each speaker. For each test speech utterance, the similarity between the test model and each speaker class model is computed. Two feature sets are used: Mel-frequency cepstral coefficients (MFCCs) and shifted delta cepstral (SDC) coefficients. With the SDC feature set, the proposed model achieved better classification results than with MFCCs. The experimental results showed that the proposed SDC speaker model + SDC class model outperformed all the other systems, achieving 57.21% overall classification accuracy.
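
The abstract names shifted delta cepstral (SDC) coefficients but does not give their configuration. As a rough illustration only, the sketch below computes SDC features under the commonly used N-d-P-k formulation, in which each frame's feature vector stacks k delta blocks, block i being c(t + iP + d) - c(t + iP - d). The function name `shifted_delta_cepstra`, the defaults d=1, P=3, k=7, and the edge padding at utterance boundaries are assumptions made for this example, not settings taken from the paper.

```python
import numpy as np


def shifted_delta_cepstra(cepstra, d=1, p=3, k=7):
    """Stack k shifted delta blocks per frame (hypothetical helper).

    cepstra: array of shape (num_frames, num_coeffs), e.g. MFCC frames.
    Returns an array of shape (num_frames, num_coeffs * k).
    The defaults d=1, p=3, k=7 are assumed, not taken from the paper.
    """
    cepstra = np.asarray(cepstra, dtype=float)
    num_frames = cepstra.shape[0]

    # Repeat the first/last frame so every index t + i*p +/- d is valid.
    pad_before = d
    pad_after = (k - 1) * p + d
    padded = np.pad(cepstra, ((pad_before, pad_after), (0, 0)), mode="edge")

    blocks = []
    for i in range(k):
        center = pad_before + i * p  # padded index of frame t + i*p at t = 0
        ahead = padded[center + d:center + d + num_frames]
        behind = padded[center - d:center - d + num_frames]
        blocks.append(ahead - behind)  # delta block i for all frames at once

    return np.hstack(blocks)


# Usage: 30 frames of 2 coefficients that grow by 1 per frame.
frames = np.arange(30, dtype=float)[:, None] * np.ones((1, 2))
features = shifted_delta_cepstra(frames)
print(features.shape)
```

In a pipeline like the one the abstract describes, such stacked vectors would stand in for the raw MFCC frames as network input; how the authors actually wire this up is not stated in this record.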

Pages: 106-111

Page count: 6

50 entries in total

[1] DNN-based speaker clustering for speaker diarisation
Milner, Rosanna
Hain, Thomas
[J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016: 2185-2189
[2] A DNN-based emotional speech synthesis by speaker adaptation
Yang, Hongwu
Zhang, Weizhao
Zhi, Pengpeng
[J]. 2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018: 633-637
[3] SPEAKER AND LANGUAGE FACTORIZATION IN DNN-BASED TTS SYNTHESIS
Fan, Yuchen
Qian, Yao
Soong, Frank K.
He, Lei
[J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016: 5540-5544
[4] UNSUPERVISED SPEAKER ADAPTATION FOR DNN-BASED TTS SYNTHESIS
Fan, Yuchen
Qian, Yao
Soong, Frank K.
He, Lei
[J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016: 5135-5139
[5] On the Issue of Calibration in DNN-based Speaker Recognition Systems
McLaren, Mitchell
Castan, Diego
Ferrer, Luciana
Lawson, Aaron
[J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016: 1825-1829
[6] DNN-Based Speech Synthesis Using Speaker Codes
Hojo, Nobukatsu
Ijima, Yusuke
Mizuno, Hideyuki
[J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2018, E101D (02): 462-472
[7] A study of speaker adaptation for DNN-based speech synthesis
Wu, Zhizheng
Swietojanski, Pawel
Veaux, Christophe
Renals, Steve
King, Simon
[J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015: 879-883
[8] MULTI-SPEAKER MODELING AND SPEAKER ADAPTATION FOR DNN-BASED TTS SYNTHESIS
Fan, Yuchen
Qian, Yao
Soong, Frank K.
He, Lei
[J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015: 4475-4479
[9] An Investigation of DNN-Based Speech Synthesis Using Speaker Codes
Hojo, Nobukatsu
Ijima, Yusuke
Mizuno, Hideyuki
[J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016: 2278-2282
[10] Sparse DNN-based speaker segmentation using side information
Ma, Yong
Bao, Chang-Chun
[J]. ELECTRONICS LETTERS, 2015, 51 (08): 651-653