SSLCT: A Convolutional Transformer for Synthetic Speech Localization

Authors
Bhagtani, Kratika [1]
Yadav, Amit Kumar Singh [1]
Bestagini, Paolo [2]
Delp, Edward J. [1]
Affiliations
[1] Purdue Univ, Video & Image Proc Lab VIPER, Sch Elect & Comp Engn, W Lafayette, IN 47907 USA
[2] Politecn Milan, Dipartimento Elettron Informaz & Bioingn, Milan, Italy
Keywords
Synthetic speech localization; speech forensics; deepfake speech; PartialSpoof; transformer;
DOI
10.1109/MIPR62202.2024.00028
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep learning methods can now generate high-quality synthetic speech that is perceptually indistinguishable from real speech. Because synthetic speech can be used for nefarious purposes, speech forensics methods have been developed to detect fully synthetic speech. Speech editing tools can also create partially synthetic speech, in which only part of the speech signal is synthetic. Detecting these short synthetic segments within a speech signal requires specialized methods that determine the temporal location of the synthetic speech. In this paper, we propose the Synthetic Speech Localization Convolutional Transformer (SSLCT), a neural network and transformer method for synthetic speech localization. SSLCT can temporally localize synthetic speech segments as short as 20 milliseconds. We demonstrate that SSLCT achieves an Equal Error Rate (EER) of less than 10%, which is an improvement over several existing methods.
Pages: 134-140
Number of pages: 7