Voice gender recognition under unconstrained environments using self-attention

Cited by: 17
Authors
Nasef, Mohammed M. [1]
Sauber, Amr M. [1]
Nabil, Mohammed M. [1]
Affiliations
[1] Menoufia Univ, Fac Sci, Math & Comp Sci Dept, Menoufia 32511, Egypt
Keywords
Voice gender recognition; Self-attention; MFCC; Logistic regression; Inception
DOI
10.1016/j.apacoust.2020.107823
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Voice gender recognition is a non-trivial task that has been studied extensively in the literature. The task becomes more challenging when the voice is surrounded by noise and recorded in unconstrained environments. This paper presents two self-attention-based models that deliver an end-to-end voice gender recognition system under unconstrained environments. The first model consists of a stack of six self-attention layers and a dense layer. The second model extends the first by placing a set of convolution layers and six inception-residual blocks before the self-attention layers. Both models use Mel-frequency cepstral coefficients (MFCC) as the representation of the audio data and logistic regression for classification. The experiments were conducted under unconstrained conditions, including background noise and speakers of different languages, accents, ages and emotional states. The results show that the proposed models achieved accuracies of 95.11% and 96.23%, respectively. These models achieved superior performance on all criteria and are believed to be state of the art for voice gender recognition under unconstrained environments. © 2020 Elsevier Ltd. All rights reserved.
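The shape of the first model described in the abstract (six self-attention layers followed by a dense layer with a logistic output) can be illustrated with a minimal sketch. This is not the authors' implementation: the single attention head, the residual connection, the mean pooling over time, the 13-coefficient MFCC width and the random untrained weights are all assumptions made for illustration, and the input frames stand in for MFCC features that a real system would extract from audio.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention plus a residual connection.

    The residual connection is an assumption; the abstract does not specify it.
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / scale for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]
    attended = matmul(weights, v)
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(attended, x)]


def random_matrix(rng, rows, cols, scale=0.1):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def build_model(n_mfcc=13, n_layers=6, seed=0):
    """Hypothetical untrained model: n_layers attention layers and one dense unit."""
    rng = random.Random(seed)
    layers = [
        tuple(random_matrix(rng, n_mfcc, n_mfcc) for _ in range(3))
        for _ in range(n_layers)
    ]
    dense_w = [rng.gauss(0.0, 0.1) for _ in range(n_mfcc)]
    return {"layers": layers, "dense_w": dense_w, "dense_b": 0.0}


def predict_proba(model, mfcc_frames):
    """Return a probability in (0, 1) for one utterance of MFCC frames."""
    x = mfcc_frames
    for wq, wk, wv in model["layers"]:
        x = self_attention(x, wq, wk, wv)
    # Mean-pool over time (an assumption) to get one vector per utterance.
    pooled = [sum(col) / len(x) for col in zip(*x)]
    # Dense unit followed by a sigmoid, i.e. a logistic-regression output.
    logit = sum(w * p for w, p in zip(model["dense_w"], pooled)) + model["dense_b"]
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    rng = random.Random(1)
    # 20 frames of 13 placeholder coefficients standing in for real MFCCs.
    frames = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(20)]
    print(predict_proba(build_model(), frames))
```

Because the weights are untrained, the printed probability carries no meaning; the sketch only shows how frame-level features flow through stacked attention layers into a single logistic output.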
Pages: 11