Temporal-Channel Modeling in Multi-head Self-Attention for Synthetic Speech Detection

Cited: 0
Authors
Truong, Duc-Tuan [1 ]
Tao, Ruijie [2 ]
Nguyen, Tuan [3 ]
Luong, Hieu-Thi [1 ]
Lee, Kong Aik [4 ]
Chng, Eng Siong [1 ]
Affiliations
[1] Nanyang Technological University, Singapore
[2] National University of Singapore, Singapore
[3] Institute for Infocomm Research (I2R), A*STAR, Singapore
[4] The Hong Kong Polytechnic University, Hong Kong, China
Source
INTERSPEECH 2024
Funding
National Research Foundation, Singapore
Keywords
synthetic speech detection; attention learning; ASVspoof challenges;
DOI
10.21437/Interspeech.2024-659
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Recent synthetic speech detectors built on the Transformer model outperform their convolutional neural network counterparts. This improvement may stem from the powerful modeling ability of multi-head self-attention (MHSA) in the Transformer, which learns the temporal relationships among input tokens. However, artifacts of synthetic speech can be located in specific regions of both frequency channels and temporal segments, whereas MHSA neglects this temporal-channel dependency of the input sequence. In this work, we propose a Temporal-Channel Modeling (TCM) module to enhance MHSA's capability to capture temporal-channel dependencies. Experimental results on ASVspoof 2021 show that, with only 0.03M additional parameters, the TCM module outperforms the state-of-the-art system by 9.25% in EER. A further ablation study reveals that utilizing both temporal and channel information yields the largest improvement for detecting synthetic speech.
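Note: the record above does not describe the internal design of the TCM module. As a rough, hypothetical illustration of the idea in the abstract (letting self-attention see channel structure in addition to temporal structure), the following minimal PyTorch sketch attends along the time axis and along the channel axis and fuses the two views with a residual connection. The module name, shapes, and fusion scheme are assumptions for illustration only, not the paper's implementation.

import torch
import torch.nn as nn


class TemporalChannelAttention(nn.Module):
    """Hypothetical temporal-channel attention block (not the paper's TCM module)."""

    def __init__(self, embed_dim: int, seq_len: int, num_heads: int = 4):
        super().__init__()
        # Temporal view: standard MHSA over time steps (tokens).
        self.temporal_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        # Channel view: treat each feature channel as a "token" of length seq_len,
        # so the attention map relates channels to channels.
        self.channel_attn = nn.MultiheadAttention(seq_len, 1, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels)
        t_out, _ = self.temporal_attn(x, x, x)      # temporal dependencies
        xc = x.transpose(1, 2)                      # (batch, channels, time)
        c_out, _ = self.channel_attn(xc, xc, xc)    # channel dependencies
        c_out = c_out.transpose(1, 2)               # back to (batch, time, channels)
        return self.norm(x + t_out + c_out)         # residual fusion of both views


if __name__ == "__main__":
    x = torch.randn(2, 100, 64)                     # (batch, frames, feature dims) -- assumed shapes
    block = TemporalChannelAttention(embed_dim=64, seq_len=100)
    print(block(x).shape)                           # torch.Size([2, 100, 64])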
Pages: 537-541
Number of pages: 5
Related Papers
50 records in total
  • [1] Masked multi-head self-attention for causal speech enhancement
    Nicolson, Aaron
    Paliwal, Kuldip K.
    SPEECH COMMUNICATION, 2020, 125: 80-96
  • [2] SPEECH ENHANCEMENT USING SELF-ADAPTATION AND MULTI-HEAD SELF-ATTENTION
    Koizumi, Yuma
    Yatabe, Kohei
    Delcroix, Marc
    Masuyama, Yoshiki
    Takeuchi, Daiki
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020: 181-185
  • [3] Multi-head enhanced self-attention network for novelty detection
    Zhang, Yingying
    Gong, Yuxin
    Zhu, Haogang
    Bai, Xiao
    Tang, Wenzhong
    PATTERN RECOGNITION, 2020, 107
  • [4] Epilepsy detection based on multi-head self-attention mechanism
    Ru, Yandong
    An, Gaoyang
    Wei, Zheng
    Chen, Hongming
    PLOS ONE, 2024, 19 (06)
  • [5] Speech enhancement method based on the multi-head self-attention mechanism
    Chang X.
    Zhang Y.
    Yang L.
    Kou J.
    Wang X.
    Xu D.
    Xi'an Dianzi Keji Daxue Xuebao/Journal of Xidian University, 2020, 47 (01): 104-110
  • [6] Learning Contextual Representation with Convolution Bank and Multi-head Self-attention for Speech Emphasis Detection
    Liu, Liangqi
    Wu, Zhiyong
    Li, Runnan
    Jia, Jia
    Meng, Helen
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019: 922-926
  • [7] Local Multi-Head Channel Self-Attention for Facial Expression Recognition
    Pecoraro, Roberto
    Basile, Valerio
    Bono, Viviana
    INFORMATION, 2022, 13 (09)
  • [8] Adaptive Pruning for Multi-Head Self-Attention
    Messaoud, Walid
    Trabelsi, Rim
    Cabani, Adnane
    Abdelkefi, Fatma
    ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, ICAISC 2023, PT II, 2023, 14126: 48-57
  • [9] Detection of malicious URLs using Temporal Convolutional Network and Multi-Head Self-Attention mechanism
    Nguyet Quang Do
    Selamat, Ali
    Krejcar, Ondrej
    Fujita, Hamido
    APPLIED SOFT COMPUTING, 2025, 169
  • [10] Lane Detection Method Based on Improved Multi-Head Self-Attention
    Ge, Zekun
    Tao, Fazhan
    Fu, Zhumu
    Song, Shuzhong
    Computer Engineering and Applications, 60 (02): 264-271