On the Use of Cross-module Attention Statistics Pooling for Speaker Verification

Cited by: 0
Authors
Alam, Jahangir [1]
Fathan, Abderrahim [1]
Affiliations
[1] CRIM, Montreal, PQ, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Speaker verification; neural speaker embeddings; hybrid neural network; cross-module attention;
DOI
10.1109/IWBF57495.2023.10157564
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In deep learning-based speaker verification frameworks, the extraction of a speaker embedding vector plays a key role. In this contribution, we propose a hybrid neural network that employs a cross-module attention pooling mechanism to extract speaker-discriminant utterance-level embeddings. In particular, the proposed system places a 2D Convolutional Neural Network (CNN)-based feature extraction module in cascade with a frame-level network composed of two parallel branches: a fully Time Delay Neural Network (TDNN) branch and a hybrid TDNN-Long Short-Term Memory (TDNN-LSTM) branch. The proposed system also employs cross-module attention statistics pooling to aggregate speaker information over the utterance-level context by capturing the complementarity between the two parallel modules. We conduct a set of experiments on the VoxCeleb corpus to evaluate the proposed system, and the proposed hybrid network provides better results than conventional approaches trained on the same dataset.
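The abstract does not give the pooling equations, so the following is only a minimal sketch of what cross-module attentive statistics pooling could look like, not the authors' implementation. It assumes two parallel branches that emit the same number of frames, and it assumes that the attention weights used to pool one branch are computed from the other branch's frame-level features. The function names and the single query vector per branch (`query_a`, `query_b`) are hypothetical stand-ins for whatever learned attention layers the paper uses.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of per-frame scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_stats(frames, weights):
    """Attention-weighted mean and standard deviation per feature dimension."""
    dim = len(frames[0])
    mean = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    second = [sum(w * f[d] ** 2 for w, f in zip(weights, frames)) for d in range(dim)]
    # Clamp the variance so rounding error cannot produce a negative value.
    std = [math.sqrt(max(s - m * m, 1e-12)) for s, m in zip(second, mean)]
    return mean + std


def cross_module_pooling(frames_a, frames_b, query_a, query_b):
    """Pool each branch with weights derived from the *other* branch (assumed design)."""
    if len(frames_a) != len(frames_b):
        raise ValueError("both branches must emit the same number of frames")
    # Frame scores for branch A are computed from branch B's features, and vice versa.
    weights_for_a = softmax([dot(query_b, f) for f in frames_b])
    weights_for_b = softmax([dot(query_a, f) for f in frames_a])
    # Utterance-level vector: [mean_a, std_a, mean_b, std_b].
    return weighted_stats(frames_a, weights_for_a) + weighted_stats(frames_b, weights_for_b)


# Toy input: 2 frames, branch A has 2 features per frame, branch B has 1.
# All-zero queries give uniform weights, i.e. plain mean/std pooling.
embedding = cross_module_pooling(
    [[1.0, 2.0], [3.0, 4.0]], [[0.0], [2.0]], [0.0, 0.0], [0.0]
)
```

In this sketch the pooled vector has twice the summed feature dimension of the two branches. A real system would learn the attention parameters jointly with the rest of the network, and would typically pass the pooled vector through further layers to obtain the speaker embedding.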
Pages: 6
Related papers
50 in total
  • [1] On the Use of Cross- and Self-Module Attentive Statistics Pooling Techniques for Text-Independent Speaker Verification
    Alam, Jahangir
    2023 IEEE INTERNATIONAL JOINT CONFERENCE ON BIOMETRICS, IJCB, 2023,
  • [2] CROSS ATTENTIVE POOLING FOR SPEAKER VERIFICATION
    Kye, Seong Min
    Kwon, Yoohwan
    Chung, Joon Son
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 294 - 300
  • [3] Typed cross-module compilation
    Zhong, S
    ACM SIGPLAN NOTICES, 1999, 34 (01) : 141 - 152
  • [4] Typed cross-module compilation
    Yale Univ, New Haven, CT, United States
    Proc ACM SIGPLAN Int Conf Funct Program ICFP, (141-152):
  • [5] Enroll-Aware Attentive Statistics Pooling for Target Speaker Verification
    Zhang, Leying
    Chen, Zhengyang
    Qian, Yanmin
    INTERSPEECH 2022, 2022, : 311 - 315
  • [6] Scalable cross-module optimization
    Ayers, A
    de Jong, S
    Peyton, J
    Schooler, R
    ACM SIGPLAN NOTICES, 1998, 33 (05) : 301 - 312
  • [7] Exploring a Unified Attention-Based Pooling Framework for Speaker Verification
    Liu, Yi
    He, Liang
    Liu, Weiwei
    Liu, Jia
    2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018, : 200 - 204
  • [8] Scalable high performance cross-module inlining
    Chakrabarti, DR
    Lozano, LA
    Li, XLD
    Hundt, R
    Liu, SM
    13TH INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURE AND COMPILATION TECHNIQUES, PROCEEDINGS, 2004, : 165 - 176
  • [9] SYZYGY - A framework for scalable cross-module IPO
    Moon, SD
    Li, XD
    Hundt, R
    Chakrabarti, DR
    Lozano, LA
    Srinivasan, U
    Liu, SM
    CGO 2004: INTERNATIONAL SYMPOSIUM ON CODE GENERATION AND OPTIMIZATION, 2004, : 65 - 74
  • [10] Optimized cross-module attention network and medium-scale dataset for effective fire detection
    Khan, Zulfiqar Ahmad
    Ullah, Fath U. Min
    Yar, Hikmat
    Ullah, Waseem
    Khan, Noman
    Kim, Min Je
    Baik, Sung Wook
    PATTERN RECOGNITION, 2025, 161