Capture interspeaker information with a neural network for speaker identification

Cited by: 11
Authors
Wang, L [1 ]
Chen, K
Chi, HS
Affiliations
[1] Univ Cambridge, Dept Engn, Speech Vis & Robotics Grp, Cambridge CB2 1PZ, England
[2] Univ Birmingham, Sch Comp Sci, Birmingham B15 2TT, W Midlands, England
[3] Peking Univ, Natl Lab Machine Percept, Beijing 100871, Peoples R China
[4] Peking Univ, Ctr Informat Sci, Beijing 100871, Peoples R China
Source
IEEE TRANSACTIONS ON NEURAL NETWORKS | 2002, Vol. 13, Issue 2
Funding
National Natural Science Foundation of China;
Keywords
interspeaker information; KING speech corpus; model-based method; neural networks; query-based learning algorithm; rival penalized encoding scheme; speaker identification;
DOI
10.1109/72.991429
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The model-based approach is widely used for speaker identification: a statistical model characterizes a specific speaker's voice, but no interspeaker information is involved in its parameter estimation. It is observed that interspeaker information is very helpful in discriminating between different speakers. In this paper, we propose a novel method that uses interspeaker information to improve the performance of a model-based speaker identification system. A neural network is employed to capture the interspeaker information from the output space of the statistical models. To utilize interspeaker information sufficiently, a rival penalized encoding rule is proposed to design supervised learning pairs. Moreover, for better generalization, a query-based learning algorithm is presented to actively select input data of interest during training of the neural network. Comparative results on the KING speech corpus show that our method leads to a considerable improvement for a model-based speaker identification system.
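The rival penalized encoding idea can be illustrated with a minimal sketch. The details below are assumptions for illustration, not the paper's exact rule: the helper `rival_penalized_targets` is hypothetical, and it assumes the target vector assigns 1 to the true speaker, penalizes the best-scoring competing speaker (the "rival" under the statistical models), and leaves the rest neutral.

```python
import numpy as np

def rival_penalized_targets(model_scores, true_idx,
                            rival_value=-1.0, neutral=0.0):
    """Build a supervised target vector from per-speaker model scores.

    Hypothetical sketch of a rival-penalized encoding: the true speaker
    gets target 1.0, the strongest wrong speaker is penalized, and all
    other speakers stay at a neutral value.
    """
    scores = np.asarray(model_scores, dtype=float)
    targets = np.full(scores.shape, neutral)
    targets[true_idx] = 1.0
    # The rival is the best-scoring speaker other than the true one.
    masked = scores.copy()
    masked[true_idx] = -np.inf
    targets[int(np.argmax(masked))] = rival_value
    return targets

# Example: four speaker models score an utterance from speaker 2;
# speaker 1 is the closest rival and gets the penalty.
print(rival_penalized_targets([0.10, 0.85, 0.90, 0.40], true_idx=2))
```

Such target vectors would then serve as the supervised learning pairs for the neural network, letting it learn from the confusable (rival) speakers rather than only from one-hot labels.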
Pages: 436-445
Number of pages: 10
Related Papers
50 records in total
  • [31] Binary Neural Network for Speaker Verification
    Zhu, Tinglong
    Qin, Xiaoyi
    Li, Ming
    INTERSPEECH 2021, 2021, : 86 - 90
  • [32] Neural Network Speaker Descriptor in Speaker Diarization of Telephone Speech
    Zajic, Zbynek
    Zelinka, Jan
    Mueller, Ludek
    SPEECH AND COMPUTER, SPECOM 2017, 2017, 10458 : 555 - 563
  • [33] DEEP NEURAL NETWORK TRAINED WITH SPEAKER REPRESENTATION FOR SPEAKER NORMALIZATION
    Tang, Yun
    Mohan, Aanchan
    Rose, Richard C.
    Ma, Chengyuan
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [34] Text-independent speaker identification using a hybrid neural network and conformity approach
    Ouzounov, A
    1997 IEEE INTERNATIONAL CONFERENCE ON NEURAL NETWORKS, VOLS 1-4, 1997, : 2098 - 2102
  • [35] Speaker Identification Using Linear Predictive Cepstral Coefficients And General Regression Neural Network
    Li, Penghua
    Hu, Fangchao
    Li, Yinguo
    Xu, Yang
    2014 33RD CHINESE CONTROL CONFERENCE (CCC), 2014, : 4952 - 4956
  • [36] A Novel Convolutional Neural Network Model for Automatic Speaker Identification From Speech Signals
    Pandian, J. Arun
    Thirunavukarasu, Ramkumar
    Kotei, Evans
    IEEE ACCESS, 2024, 12 : 51381 - 51394
  • [37] Text-Independent Speaker Identification Through Feature Fusion and Deep Neural Network
    Jahangir, Rashid
    Teh, Ying Wah
    Memon, Nisar Ahmed
    Mujtaba, Ghulam
    Zareei, Mahdi
    Ishtiaq, Uzair
    Akhtar, Muhammad Zaheer
    Ali, Ihsan
    IEEE ACCESS, 2020, 8 : 32187 - 32202
  • [38] LEARNING SPEAKER REPRESENTATION FOR NEURAL NETWORK BASED MULTICHANNEL SPEAKER EXTRACTION
    Zmolikova, Katerina
    Delcroix, Marc
    Kinoshita, Keisuke
    Higuchi, Takuya
    Ogawa, Atsunori
    Nakatani, Tomohiro
    2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, : 8 - 15
  • [39] An Information Theoretic Approach to Neural Network Based System Identification
    Chernyshov, Kirill R.
    SIBCON-2009: INTERNATIONAL SIBERIAN CONFERENCE ON CONTROL AND COMMUNICATIONS, 2009, : 100 - 107
  • [40] DEEP NEURAL NETWORKS FOR COCHANNEL SPEAKER IDENTIFICATION
    Zhao, Xiaojia
    Wang, Yuxuan
    Wang, DeLiang
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4824 - 4828