SSLCT: A Convolutional Transformer for Synthetic Speech Localization

Cited by: 0

Authors:

Bhagtani, Kratika [1]

Yadav, Amit Kumar Singh [1]

Bestagini, Paolo [2]

Delp, Edward J. [1]

Affiliations:

[1] Purdue Univ, Video & Image Proc Lab VIPER, Sch Elect & Comp Engn, W Lafayette, IN 47907 USA

[2] Politecn Milan, Dipartimento Elettron Informaz & Bioingn, Milan, Italy

Source:

2024 IEEE 7TH INTERNATIONAL CONFERENCE ON MULTIMEDIA INFORMATION PROCESSING AND RETRIEVAL, MIPR 2024 | 2024

Keywords:

Synthetic speech localization; speech forensics; deepfake speech; PartialSpoof; transformer

DOI:

10.1109/MIPR62202.2024.00028

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Deep learning methods can now generate high-quality synthetic speech that is perceptually indistinguishable from real speech. Because synthetic speech can be used for nefarious purposes, speech forensics methods have been developed to detect fully synthetic speech. Speech editing tools can also create partially synthetic speech, in which only a part of the speech signal is synthetic. Detecting these short synthetic segments within a speech signal requires specialized methods that determine the temporal location of the synthetic speech. In this paper, we propose the Synthetic Speech Localization Convolutional Transformer (SSLCT), a neural network and transformer method for synthetic speech localization. SSLCT can temporally localize synthetic speech segments as small as 20 milliseconds. We demonstrate that SSLCT achieves less than 10% Equal Error Rate (EER), which is an improvement over several existing methods.
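The abstract reports results as an Equal Error Rate over short segments. As a rough illustration of what that metric measures, the sketch below approximates the EER from per-segment scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the convention that a higher score means "more likely synthetic", and the label encoding (1 = synthetic, 0 = real) are all assumptions made for this example.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER from per-segment scores.

    Assumptions (not from the paper): higher score = more likely synthetic,
    labels use 1 for synthetic segments and 0 for real segments.
    """
    synthetic = [s for s, y in zip(scores, labels) if y == 1]
    real = [s for s, y in zip(scores, labels) if y == 0]
    if not synthetic or not real:
        raise ValueError("need at least one segment of each class")

    best_gap, eer = float("inf"), None
    # Candidate thresholds: every observed score, plus +inf (flag nothing).
    for threshold in sorted(set(scores)) + [float("inf")]:
        # False acceptance: real segments flagged as synthetic.
        far = sum(s >= threshold for s in real) / len(real)
        # False rejection: synthetic segments that were missed.
        frr = sum(s < threshold for s in synthetic) / len(synthetic)
        gap = abs(far - frr)
        if gap < best_gap:
            # The EER is read off where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Hypothetical scores for four 20 ms segments, two real and two synthetic.
    print(equal_error_rate([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]))
```

With a finite set of scores the two error rates may never be exactly equal, so this sketch takes the threshold where they are closest and averages them; evaluation toolkits often interpolate along the error curve instead.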


Pages: 134-140

Page count: 7
