Speaker Recognition by Machines and Humans

Cited by: 409
Authors
Hansen, John H. L. [1]
Hasan, Taufiq [2]
Affiliations
[1] Georgia Inst Technol, Elect Engn, Atlanta, GA 30332 USA
[2] Univ Texas Dallas, Erik Jonsson Sch Engn & Comp Sci, Dept Elect Engn, Richardson, TX 75083 USA
Funding
National Science Foundation (USA)
Keywords
JOINT FACTOR-ANALYSIS; HUMAN VOICE; MAXIMUM-LIKELIHOOD; VERIFICATION; SPEECH; IDENTIFICATION; COMPENSATION; MODELS; PARAMETERS; STRESS;
DOI
10.1109/MSP.2015.2462851
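Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification codes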
0808 ; 0809 ;
Abstract
Identifying a person by his or her voice is an important human trait most take for granted in natural human-to-human interaction/communication. Speaking to someone over the telephone usually begins by identifying who is speaking and, at least in cases of familiar speakers, a subjective verification by the listener that the identity is correct and the conversation can proceed. Automatic speaker-recognition systems have emerged as an important means of verifying identity in many e-commerce applications as well as in general business interactions, forensics, and law enforcement. Human experts trained in forensic speaker recognition can perform this task even better by examining a set of acoustic, prosodic, and linguistic characteristics of speech in a general approach referred to as structured listening. Techniques in forensic speaker recognition have been developed for many years by forensic speech scientists and linguists to help reduce any potential bias or preconceived understanding as to the validity of an unknown audio sample and a reference template from a potential suspect. Experienced researchers in signal processing and machine learning continue to develop automatic algorithms to effectively perform speaker recognition, with ever-improving performance, to the point where automatic systems start to perform on par with human listeners. In this article, we review the literature on speaker recognition by machines and humans, with an emphasis on prominent speaker-modeling techniques that have emerged in the last decade for automatic systems. We discuss different aspects of automatic systems, including voice-activity detection (VAD), features, speaker models, standard evaluation data sets, and performance metrics. Human speaker recognition is discussed in two parts: the first part involves forensic speaker-recognition methods, and the second illustrates how a naive listener performs this task from a neuroscience perspective. We conclude this review with a comparative study of human versus machine speaker recognition and attempt to point out strengths and weaknesses of each.
Pages: 74 - 99
Page count: 26
Related papers
50 in total
  • [1] SPEAKER RECOGNITION BY HUMANS
    CLARKE, FR
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1965, 37 (06) : 1211 - &
  • [2] Emotional Speaker Identification by Humans and Machines
    Yang, Yingchun
    Chen, Li
    Wang, Wenyi
    [J]. BIOMETRIC RECOGNITION: CCBR 2011, 2011, 7098 : 167 - 173
  • [3] Speech recognition by machines and humans
    Lippmann, RP
    [J]. SPEECH COMMUNICATION, 1997, 22 (01) : 1 - 15
  • [4] Phonetic speaker recognition with support vector machines
    Campbell, WM
    Campbell, JP
    Reynolds, DA
    Jones, DA
    Leek, TR
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 16, 2004, 16 : 1377 - 1384
  • [5] Support vector machines for speaker and language recognition
    Campbell, WM
    Campbell, JP
    Reynolds, DA
    Singer, E
    Torres-Carrasquillo, PA
    [J]. COMPUTER SPEECH AND LANGUAGE, 2006, 20 (2-3) : 210 - 229
  • [6] Speaker discrimination in humans and machines: Effects of speaking style variability
    Afshan, Amber
    Kreiman, Jody
    Alwan, Abeer
    [J]. INTERSPEECH 2020, 2020 : 3136 - 3140
  • [7] TARGET AND NON-TARGET SPEAKER DISCRIMINATION BY HUMANS AND MACHINES
    Park, Soo Jin
    Afshan, Amber
    Kreiman, Jody
    Yeung, Gary
    Alwan, Abeer
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019 : 6326 - 6330
  • [8] Restricted Boltzmann machines for vector representation of speech in speaker recognition
    Ghahabi, Omid
    Hernando, Javier
    [J]. COMPUTER SPEECH AND LANGUAGE, 2018, 47 : 16 - 29
  • [9] Speaker recognition using continuous density support vector machines
    Xin, D
    Wu, ZH
    [J]. ELECTRONICS LETTERS, 2001, 37 (17) : 1099 - 1101
  • [10] Support Vector Machines with the Priorities Method for Speaker Independent Phoneme Recognition
    Cutajar, M.
    Gatt, E.
    Grech, I
    Casha, O.
    Micallef, J.
    [J]. 2011 IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY (ISSPIT), 2011 : 409 - 414