SSLCT: A Convolutional Transformer for Synthetic Speech Localization

Cited by: 0
Authors
Bhagtani, Kratika [1]
Yadav, Amit Kumar Singh [1]
Bestagini, Paolo [2]
Delp, Edward J. [1]
Affiliations
[1] Purdue Univ, Video & Image Proc Lab VIPER, Sch Elect & Comp Engn, W Lafayette, IN 47907 USA
[2] Politecn Milan, Dipartimento Elettron Informaz & Bioingn, Milan, Italy
Keywords
Synthetic speech localization; speech forensics; deepfake speech; PartialSpoof; transformer
DOI
10.1109/MIPR62202.2024.00028
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep learning methods can now generate high-quality synthetic speech that is perceptually indistinguishable from real speech. Because synthetic speech can be used for nefarious purposes, speech forensics methods have been developed to detect fully synthetic speech. Speech editing tools can also create partially synthetic speech, in which only part of the speech signal is synthetic. Detecting these short synthetic segments requires specialized methods that determine the temporal location of the synthetic speech within the signal. In this paper, we propose the Synthetic Speech Localization Convolutional Transformer (SSLCT), a neural network and transformer method for synthetic speech localization. SSLCT can temporally localize synthetic speech segments as short as 20 milliseconds. We demonstrate that SSLCT achieves an Equal Error Rate (EER) below 10%, which is an improvement over several existing methods.
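The abstract reports performance as an Equal Error Rate over short segments. As a rough illustration of what that metric measures, the sketch below approximates EER with a simple threshold sweep over per-segment scores. It is not the paper's evaluation protocol: the function name `equal_error_rate`, the convention that a higher score means "more likely synthetic", and the toy scores and labels are all assumptions made for illustration.

```python
def equal_error_rate(scores, labels):
    """Approximate EER by sweeping a decision threshold (illustrative only).

    scores: one score per segment; higher is assumed to mean "synthetic".
    labels: 1 for a synthetic segment, 0 for a real one.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one synthetic and one real segment")

    best_gap, eer = None, None
    # Try every observed score as a threshold, plus one above the maximum.
    for t in sorted(set(scores)) + [max(scores) + 1.0]:
        # Real segments wrongly flagged as synthetic (score >= t).
        false_pos = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        # Synthetic segments that were missed (score < t).
        false_neg = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t)
        fpr = false_pos / negatives
        fnr = false_neg / positives
        gap = abs(fpr - fnr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (fpr + fnr) / 2
    return eer


# Toy data (invented): eight segments, e.g. eight consecutive 20 ms frames.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(equal_error_rate(toy_scores, toy_labels))
```

With a finite set of scores, the two error rates may never be exactly equal, so this sketch reports their average at the threshold where they are closest; evaluation toolkits typically interpolate instead.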
Pages: 134-140
Page count: 7
Related papers
50 in total
  • [21] Audio Transformer for Synthetic Speech Detection via Benford's Law Distribution Analysis
    Ashoka, Anitha Bhat Talagini
    Cuccovillo, Luca
    Aichroth, Patrick
    PROCEEDINGS OF THE 3RD ACM INTERNATIONAL WORKSHOP ON MULTIMEDIA AI AGAINST DISINFORMATION, MAD 2024, 2024, : 23 - 29
  • [22] A Visible and Synthetic Aperture Radar Image Fusion Algorithm Based on a Transformer and a Convolutional Neural Network
    Hu, Liushun
    Su, Shaojing
    Zuo, Zhen
    Wei, Junyu
    Huang, Siyang
    Zhao, Zongqing
    Tong, Xiaozhong
    Yuan, Shudong
    ELECTRONICS, 2024, 13 (12)
  • [23] Intelligent Localization of Transformer Internal Degradations Combining Deep Convolutional Neural Networks and Image Segmentation
    Duan, Jiajun
    He, Yigang
    Du, Bolun
    Ghandour, Ruaa M. Rashad
    Wu, Wenjie
    Zhang, Hui
    IEEE ACCESS, 2019, 7 : 62705 - 62720
  • [24] An Adaptive Method Based on Multiscale Dilated Convolutional Network for Binaural Speech Source Localization
    Wu, Lulu
    Liu, Hong
    Yang, Bing
    Ding, Runwei
    COMPLEXITY, 2020, 2020
  • [25] Universal Speech Transformer
    Zhao, Yingzhu
    Ni, Chongjia
    Leung, Cheung-Chi
    Joty, Shafiq
    Chng, Eng Siong
    Ma, Bin
    INTERSPEECH 2020, 2020, : 5021 - 5025
  • [26] Automatic Text-Independent Artifact Detection, Localization, and Classification in Synthetic Speech
    Pribil, Jiri
    Pribilova, Anna
    Matousek, Jindrich
    RADIOENGINEERING, 2017, 26 (04) : 1151 - 1160
  • [27] A Lightweight Transformer with Convolutional Attention
    Zeng, Kungan
    Paik, Incheon
    2020 11TH INTERNATIONAL CONFERENCE ON AWARENESS SCIENCE AND TECHNOLOGY (ICAST), 2020
  • [28] Spiking Convolutional Vision Transformer
    Talafha, Sameerah
    Rekabdar, Banafsheh
    Mousas, Christos
    Ekenna, Chinwe
    2023 IEEE 17TH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING, ICSC, 2023, : 225 - 226
  • [29] CCATMos: Convolutional Context-aware Transformer Network for Non-intrusive Speech Quality Assessment
    Liu, Yuchen
    Yang, Li-Chia
    Pawlicki, Alex
    Stamenovic, Marko
    INTERSPEECH 2022, 2022, : 3318 - 3322
  • [30] Applying a Convolutional Vision Transformer for Emotion Recognition in Children with Autism: Fusion of Facial Expressions and Speech Features
    Wang, Yonggu
    Pan, Kailin
    Shao, Yifan
    Ma, Jiarong
    Li, Xiaojuan
    APPLIED SCIENCES-BASEL, 2025, 15 (06)