Temporal-Channel Modeling in Multi-head Self-Attention for Synthetic Speech Detection

Cited by: 0
Authors
Truong, Duc-Tuan [1 ]
Tao, Ruijie [2 ]
Nguyen, Tuan [3 ]
Luong, Hieu-Thi [1 ]
Lee, Kong Aik [4 ]
Chng, Eng Siong [1 ]
Affiliations
[1] Nanyang Technol Univ, Singapore, Singapore
[2] Natl Univ Singapore, Singapore, Singapore
[3] A*STAR, Inst Infocomm Res (I2R), Singapore, Singapore
[4] Hong Kong Polytech Univ, Hong Kong, Peoples R China
Source
INTERSPEECH 2024
Funding
National Research Foundation, Singapore
Keywords
synthetic speech detection; attention learning; ASVspoof challenges
DOI
10.21437/Interspeech.2024-659
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Recent synthetic speech detectors built on the Transformer model outperform their convolutional neural network counterparts. This improvement may stem from the powerful modeling ability of multi-head self-attention (MHSA) in the Transformer, which learns the temporal relationships among input tokens. However, artifacts of synthetic speech can be located in specific regions of both frequency channels and temporal segments, whereas MHSA neglects this temporal-channel dependency of the input sequence. In this work, we propose a Temporal-Channel Modeling (TCM) module to enhance MHSA's capability to capture temporal-channel dependencies. Experimental results on ASVspoof 2021 show that, with only 0.03M additional parameters, the TCM module outperforms the state-of-the-art system by 9.25% in EER. A further ablation study reveals that utilizing both temporal and channel information yields the greatest improvement in detecting synthetic speech.
Pages: 537-541 (5 pages)