Capture interspeaker information with a neural network for speaker identification

Cited by: 11
Authors
Wang, L [1]
Chen, K
Chi, HS
Affiliations
[1] Univ Cambridge, Dept Engn, Speech Vis & Robotics Grp, Cambridge CB2 1PZ, England
[2] Univ Birmingham, Sch Comp Sci, Birmingham B15 2TT, W Midlands, England
[3] Peking Univ, Natl Lab Machine Percept, Beijing 100871, Peoples R China
[4] Peking Univ, Ctr Informat Sci, Beijing 100871, Peoples R China
Source
IEEE TRANSACTIONS ON NEURAL NETWORKS | 2002, Vol. 13, No. 2
Funding
National Natural Science Foundation of China;
Keywords
interspeaker information; KING speech corpus; model-based method; neural networks; query-based learning algorithm; rival penalized encoding scheme; speaker identification;
DOI
10.1109/72.991429
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
The model-based approach is one of the methods widely used for speaker identification, in which a statistical model characterizes a specific speaker's voice but no interspeaker information is involved in its parameter estimation. It has been observed that interspeaker information is very helpful in discriminating between different speakers. In this paper, we propose a novel method that uses interspeaker information to improve the performance of a model-based speaker identification system. A neural network is employed to capture the interspeaker information from the output space of those statistical models. To make full use of the interspeaker information, a rival penalized encoding rule is proposed for designing supervised learning pairs. Moreover, for better generalization, a query-based learning algorithm is presented to actively select the input data of interest during training of the neural network. Comparative results on the KING speech corpus show that our method leads to a considerable improvement over a model-based speaker identification system.
Pages: 436-445
Page count: 10
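A minimal sketch of the pipeline the abstract describes, under assumptions of its own: Gaussian mixture models stand in for the per-speaker statistical models, the score vector is a segment-level average log-likelihood, the rival penalized targets use an assumed value of -0.2 for the best-scoring wrong speaker, and scikit-learn's MLPRegressor plays the role of the neural network. It is not the authors' implementation, and the paper's query-based active selection of training data is omitted.

# Sketch of the idea in the abstract (assumed details, see note above):
# per-speaker models -> score vector -> neural net with rival-penalized targets.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_speakers, dim = 4, 12

# Stage 1: one statistical model per enrolled speaker (the model-based part).
train_feats = [rng.normal(loc=i, size=(200, dim)) for i in range(n_speakers)]  # toy MFCC-like features
gmms = [GaussianMixture(n_components=2, random_state=0).fit(x) for x in train_feats]

def score_vector(feats):
    """Average per-speaker log-likelihoods: the network's input (the models' output space)."""
    return np.array([g.score(feats) for g in gmms])

# Stage 2: rival penalized encoding of training targets (assumed values).
def rival_penalized_target(scores, true_id, rival_value=-0.2):
    """Target 1 for the true speaker, a small negative value for the
    best-scoring wrong speaker (the rival), 0 elsewhere."""
    t = np.zeros_like(scores)
    t[true_id] = 1.0
    rival = np.argmax(np.where(np.arange(len(scores)) == true_id, -np.inf, scores))
    t[rival] = rival_value
    return t

X, Y = [], []
for spk, feats in enumerate(train_feats):
    for seg in np.array_split(feats, 10):          # segment-level score vectors
        s = score_vector(seg)
        X.append(s)
        Y.append(rival_penalized_target(s, spk))

# Stage 3: neural network on the score vectors (captures interspeaker information).
net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
net.fit(np.array(X), np.array(Y))

test_seg = rng.normal(loc=2, size=(50, dim))        # toy test segment from speaker 2
print("identified speaker:", int(np.argmax(net.predict(score_vector(test_seg)[None, :]))))

The point of the sketch is that the network's input is the vector of all enrolled speakers' model scores, so confusions between similar speakers are visible during training, whereas each independently estimated per-speaker model never sees the other speakers at all.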