SSLCT: A Convolutional Transformer for Synthetic Speech Localization

Times cited: 0

Authors:

Bhagtani, Kratika [1]

Yadav, Amit Kumar Singh [1]

Bestagini, Paolo [2]

Delp, Edward J. [1]

Affiliations:

[1] Purdue University, Video and Image Processing Laboratory (VIPER), School of Electrical and Computer Engineering, West Lafayette, IN 47907, USA

[2] Politecnico di Milano, Dipartimento di Elettronica, Informazione e Bioingegneria, Milan, Italy

Source:

2024 IEEE 7th International Conference on Multimedia Information Processing and Retrieval (MIPR 2024) | 2024

Keywords:

Synthetic speech localization; speech forensics; deepfake speech; PartialSpoof; transformer;

DOI:

10.1109/MIPR62202.2024.00028

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Deep learning methods can now generate high-quality synthetic speech that is perceptually indistinguishable from real speech. Because synthetic speech can be used for nefarious purposes, speech forensics methods have been developed to detect fully synthetic speech. Speech editing tools can also create partially synthetic speech, in which only part of the speech signal is synthetic. Detecting these short synthetic segments within a speech signal requires specialized methods that determine the temporal location of the synthetic speech. In this paper, we propose the Synthetic Speech Localization Convolutional Transformer (SSLCT), a method for synthetic speech localization that combines a convolutional neural network with a transformer. SSLCT can temporally localize synthetic speech segments as short as 20 milliseconds. We demonstrate that SSLCT achieves an Equal Error Rate (EER) below 10%, an improvement over several existing methods.
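The abstract describes localization at 20 ms granularity and reports results as an Equal Error Rate (EER). The following is a minimal sketch of how such segment-level evaluation might look; it is not the authors' evaluation code. It assumes one score per 20 ms segment, that a higher score means "more likely synthetic", and that ground-truth labels use 1 for synthetic and 0 for real. The helper names `equal_error_rate` and `synthetic_intervals` and the constant `SEGMENT_MS` are hypothetical.

```python
"""Hypothetical sketch of segment-level synthetic speech localization metrics.

Assumptions (not taken from the SSLCT paper): one score per 20 ms segment,
higher score = more likely synthetic, labels 1 = synthetic, 0 = real.
"""
from typing import List, Optional, Sequence, Tuple

SEGMENT_MS = 20  # assumed duration of one scored segment, in milliseconds


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate the EER by sweeping a threshold over every observed score."""
    pos = [s for s, y in zip(scores, labels) if y == 1]  # synthetic segments
    neg = [s for s, y in zip(scores, labels) if y == 0]  # real segments
    if not pos or not neg:
        raise ValueError("need both synthetic and real segments")
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(scores)):
        # Segments with score >= t are flagged as synthetic.
        fpr = sum(s >= t for s in neg) / len(neg)  # real flagged as synthetic
        fnr = sum(s < t for s in pos) / len(pos)   # synthetic segments missed
        gap = abs(fpr - fnr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (fpr + fnr) / 2
    return eer


def synthetic_intervals(preds: Sequence[int]) -> List[Tuple[int, int]]:
    """Merge consecutive segments predicted as synthetic into (start_ms, end_ms)."""
    intervals: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, p in enumerate(list(preds) + [0]):  # trailing 0 closes a final run
        if p == 1 and start is None:
            start = i
        elif p == 0 and start is not None:
            intervals.append((start * SEGMENT_MS, i * SEGMENT_MS))
            start = None
    return intervals


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 1, 0, 0, 0]
    scores = [0.1, 0.6, 0.9, 0.8, 0.4, 0.3, 0.1, 0.05]
    print(equal_error_rate(scores, labels))
    print(synthetic_intervals(labels))
```

This version approximates the EER on the discrete set of observed thresholds; evaluation toolkits often interpolate the ROC curve instead, so their values can differ slightly on small inputs.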


Pages: 134-140

Page count: 7
