Voice gender recognition under unconstrained environments using self-attention

Cited by: 17
Authors
Nasef, Mohammed M. [1 ]
Sauber, Amr M. [1 ]
Nabil, Mohammed M. [1 ]
Affiliations
[1] Menoufia Univ, Fac Sci, Math & Comp Sci Dept, Menoufia 32511, Egypt
Keywords
Voice gender recognition; Self-attention; MFCC; Logistic regression; Inception
DOI
10.1016/j.apacoust.2020.107823
CLC number
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
Voice Gender Recognition is a non-trivial task that has been studied extensively in the literature; however, when the voice is surrounded by noise in unconstrained environments, the task becomes considerably more challenging. This paper presents two Self-Attention-based models that deliver an end-to-end voice gender recognition system under unconstrained environments. The first model consists of a stack of six self-attention layers and a dense layer. The second model adds a set of convolution layers and six inception-residual blocks before the self-attention layers of the first model. Both models use Mel-frequency cepstral coefficients (MFCC) as the representation of the audio data and Logistic Regression for classification. The experiments were conducted under unconstrained conditions such as background noise and varying languages, accents, ages, and emotional states of the speakers. The results demonstrate that the proposed models achieve accuracies of 95.11% and 96.23%, respectively. These models achieved superior performance on all criteria and are believed to be state-of-the-art for Voice Gender Recognition under unconstrained environments. (C) 2020 Elsevier Ltd. All rights reserved.
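The abstract describes a pipeline of MFCC frames passed through stacked self-attention layers before classification. As a minimal illustrative sketch (not the authors' implementation), the core scaled dot-product self-attention step over a sequence of MFCC frames can be written in NumPy. The frame count, MFCC dimension, and random weight matrices below are arbitrary placeholders:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a frame sequence.
    X: (T, d) matrix of T frames with d features each.
    Wq, Wk, Wv: (d, d) learned projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (T, T) frame-to-frame weights
    return softmax(scores, axis=-1) @ V       # (T, d) context-mixed frames

rng = np.random.default_rng(0)
T, d = 50, 13  # e.g. 50 frames of 13 MFCCs (illustrative shapes only)
X = rng.standard_normal((T, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # prints (50, 13)
```

In the paper's setup this operation would be stacked six times (with the second model preceding it by convolution and inception-residual blocks) before the dense layer and Logistic Regression classifier.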
Pages: 11
Related papers
50 records total
  • [21] Multi-Stride Self-Attention for Speech Recognition
    Han, Kyu J.
    Huang, Jing
    Tang, Yun
    He, Xiaodong
    Zhou, Bowen
    INTERSPEECH 2019, 2019, : 2788 - 2792
  • [22] Self-Attention Guided Deep Features for Action Recognition
    Xiao, Renyi
    Hou, Yonghong
    Guo, Zihui
    Li, Chuankun
    Wang, Pichao
    Li, Wanqing
    2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019, : 1060 - 1065
  • [23] Context Matters: Self-Attention for Sign Language Recognition
    Slimane, Fares Ben
    Bouguessa, Mohamed
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 7884 - 7891
  • [24] ESAformer: Enhanced Self-Attention for Automatic Speech Recognition
    Li, Junhua
    Duan, Zhikui
    Li, Shiren
    Yu, Xinmei
    Yang, Guangguang
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 471 - 475
  • [25] A lightweight transformer with linear self-attention for defect recognition
    Zhai, Yuwen
    Li, Xinyu
    Gao, Liang
    Gao, Yiping
    ELECTRONICS LETTERS, 2024, 60 (17)
  • [26] Finger Vein Recognition Based on ResNet With Self-Attention
    Zhang, Zhibo
    Chen, Guanghua
    Zhang, Weifeng
    Wang, Huiyang
    IEEE ACCESS, 2024, 12 : 1943 - 1951
  • [27] Multimodal cooperative self-attention network for action recognition
    Zhong, Zhuokun
    Hou, Zhenjie
    Liang, Jiuzhen
    Lin, En
    Shi, Haiyong
    IET IMAGE PROCESSING, 2023, 17 (06) : 1775 - 1783
  • [28] UniFormer: Unifying Convolution and Self-Attention for Visual Recognition
    Li, Kunchang
    Wang, Yali
    Zhang, Junhao
    Gao, Peng
    Song, Guanglu
    Liu, Yu
    Li, Hongsheng
    Qiao, Yu
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (10) : 12581 - 12600
  • [29] Split-Attention CNN and Self-Attention With RoPE and GCN for Voice Activity Detection
    Tan, Yingwei
    Ding, Xuefeng
    IEEE ACCESS, 2024, 12 : 156673 - 156682
  • [30] Decision Robustness of Voice Activity Segmentation in unconstrained mobile Speaker Recognition Environments
    Nautsch, Andreas
    Bamberger, Reiner
    Busch, Christoph
    PROCEEDINGS OF THE 15TH INTERNATIONAL CONFERENCE OF THE BIOMETRICS SPECIAL INTEREST GROUP (BIOSIG 2016), 2016, P-260