Speaker identification features extraction methods: A systematic review

Cited by: 100
Authors
Tirumala, Sreenivas Sremath [1 ,3 ]
Shahamiri, Seyed Reza [1 ]
Garhwal, Abhimanyu Singh [1 ,3 ]
Wang, Ruili [2 ,4 ]
Affiliations
[1] Manukau Inst Technol, Fac Business & Informat Technol, Auckland, New Zealand
[2] Massey Univ, INMS, Comp Sci & Informat Technol, Auckland, New Zealand
[3] MIT Manukau, Cnr Manukau Stn Rd Davies Ave,Private Bag 94006, Manukau 2241, New Zealand
[4] Massey Univ, Room 3-10,IIMS Bldg,Albany Campus, Auckland, New Zealand
Keywords
Feature extraction; Kitchenham systematic review; MFCC; Speaker identification; Speaker recognition; ARTIFICIAL NEURAL-NETWORKS; SPEECH RECOGNITION; MFCC; VERIFICATION; ROBUSTNESS; HISTOGRAM;
DOI
10.1016/j.eswa.2017.08.015
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speaker Identification (SI) is the process of identifying the speaker from a given utterance by comparing the voice biometrics of the utterance with those utterance models stored beforehand. SI technologies have taken a new direction due to the advances in artificial intelligence and have been used widely in various domains. Feature extraction is one of the most important aspects of SI, which significantly influences the SI process and performance. This systematic review is conducted to identify, compare, and analyze various feature extraction approaches, methods, and algorithms of SI to provide a reference on feature extraction approaches for SI applications and future studies. The review was conducted according to the Kitchenham systematic review methodology and guidelines, and provides an in-depth analysis of proposals and implementations of SI feature extraction methods discussed in the literature between 2011 and 2016. Three research questions were determined and an initial set of 535 publications was identified to answer the questions. After applying exclusion criteria, 160 related publications were shortlisted and reviewed in this paper; these papers were considered to answer the research questions. Results indicate that pure Mel-Frequency Cepstral Coefficients (MFCCs) based feature extraction approaches have been used more than any other approach. Furthermore, other MFCC variations, such as MFCC fusion and cleansing approaches, have proven to be very popular as well. This study identified that the current SI research trend is to develop a robust universal SI framework to address the important problems of SI such as adaptability, complexity, multi-lingual recognition, and noise robustness. The results presented in this research are based on past publications, citations, and number of implementations, with citations being most relevant. This paper also presents the general process of SI. © 2017 Elsevier Ltd. All rights reserved.
Pages: 250 - 271
Page count: 22
Related papers
50 in total
  • [1] Feature Extraction Methods for Speaker Recognition: A Review
    Chaudhary, Gopal
    Srivastava, Smriti
    Bhardwaj, Saurabh
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2017, 31 (12)
  • [2] A network model of speaker identification with new feature extraction methods and BLSTM
    Wang, Xingmei
    Xue, Fuzhao
    Wang, Wei
    Liu, Anhua
    NEUROCOMPUTING, 2020, 403 (403) : 167 - 181
  • [3] A Systematic Review of Features Identification and Extraction for Behavioural Biometric Authentication in Touchscreen Mobile Devices
    Abdulhak, Sami Abduljalil
    Alariki, Ala Abdulhakim
    2018 20TH INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION TECHNOLOGY (ICACT), 2018, : 68 - 73
  • [4] Perceptual Features in Speaker Identification
    Segarceanu, Svetlana
    Zaharia, Tiberius
    Radoi, Constantin
    PROCEEDINGS OF THE 2010 8TH INTERNATIONAL CONFERENCE ON COMMUNICATIONS (COMM), 2010, : 95 - 98
  • [5] Extraction of Glottal Features for Speaker Recognition
    Ostrogonac, Stevan
    Secujski, Milan
    Knezevic, Dragan
    Suzic, Sinisa
    IEEE 9TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL CYBERNETICS (ICCC 2013), 2013, : 369 - 373
  • [6] Automatic extraction of geometric lip features with application to multi-modal speaker identification
    Arsic, Ivana
    Vilagut, Roger
    Thiran, Jean-Philippe
    2006 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO - ICME 2006, VOLS 1-5, PROCEEDINGS, 2006, : 161 - +
  • [7] Fusion features for robust speaker identification
    Ben Fredj, Ines
    Zouhir, Youssef
    Ouni, Kais
    INTERNATIONAL JOURNAL OF SIGNAL AND IMAGING SYSTEMS ENGINEERING, 2018, 11 (02) : 65 - 72
  • [8] Fine structure features for speaker identification
    Jankowski, CR
    Quatieri, TF
    Reynolds, DA
    1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 689 - 692
  • [9] Robust Q Features for Speaker Identification
    Deshpande, Mangesh S.
    Holambe, Raghunath S.
    2009 INTERNATIONAL CONFERENCE ON ADVANCES IN RECENT TECHNOLOGIES IN COMMUNICATION AND COMPUTING (ARTCOM 2009), 2009, : 209 - 213
  • [10] SELECTION OF ACOUSTIC FEATURES FOR SPEAKER IDENTIFICATION
    SAMBUR, MR
    IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1975, AS23 (02) : 176 - 182