SSLCT: A Convolutional Transformer for Synthetic Speech Localization

Cited by: 0

Authors:

Bhagtani, Kratika [1]

Yadav, Amit Kumar Singh [1]

Bestagini, Paolo [2]

Delp, Edward J. [1]

Affiliations:

[1] Purdue Univ, Video & Image Proc Lab VIPER, Sch Elect & Comp Engn, W Lafayette, IN 47907 USA

[2] Politecn Milan, Dipartimento Elettron Informaz & Bioingn, Milan, Italy

Source:

2024 IEEE 7TH INTERNATIONAL CONFERENCE ON MULTIMEDIA INFORMATION PROCESSING AND RETRIEVAL, MIPR 2024 | 2024

Keywords:

Synthetic speech localization; speech forensics; deepfake speech; PartialSpoof; transformer

DOI:

10.1109/MIPR62202.2024.00028

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Deep learning methods can now generate high-quality synthetic speech that is perceptually indistinguishable from real speech. As synthetic speech can be used for nefarious purposes, speech forensics methods to detect fully synthetic speech have been developed. Speech editing tools can also create partially synthetic speech, in which only a part of the speech signal is synthetic. Detecting these short synthetic segments within a speech signal requires specialized methods that determine the temporal location of the synthetic speech. In this paper, we propose the Synthetic Speech Localization Convolutional Transformer (SSLCT), a neural network and transformer method for synthetic speech localization. SSLCT can temporally localize synthetic speech segments as small as 20 milliseconds. We demonstrate that SSLCT achieves less than 10% Equal Error Rate (EER), which is an improvement over several existing methods.
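
The abstract reports results as a segment-level Equal Error Rate at a 20 millisecond resolution. As a rough illustration of what that metric measures, the Python sketch below labels fixed-length segments from annotated synthetic time spans and then approximates the EER by sweeping a decision threshold over per-segment scores. It is not the authors' evaluation code. The function names `segment_labels` and `equal_error_rate`, the overlap-based labeling rule, and the convention that a higher score means "more likely synthetic" are all assumptions made for this example.

```python
def segment_labels(duration_ms, synthetic_spans_ms, segment_ms=20):
    """Label each fixed-length segment 1 (synthetic) or 0 (real).

    Assumption for this sketch: a segment counts as synthetic if it
    overlaps any annotated synthetic span, given as (start_ms, end_ms).
    """
    n_segments = -(-duration_ms // segment_ms)  # ceiling division
    labels = []
    for i in range(n_segments):
        start, end = i * segment_ms, (i + 1) * segment_ms
        overlaps = any(start < e and end > s for s, e in synthetic_spans_ms)
        labels.append(1 if overlaps else 0)
    return labels


def equal_error_rate(scores, labels):
    """Approximate the EER from per-segment scores and 0/1 labels.

    Assumption: a higher score means the segment is more likely synthetic.
    The EER is taken at the threshold where the false positive rate and
    the false negative rate are closest, as the mean of the two rates.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one segment of each class")
    # Candidate thresholds: every observed score, plus one above the maximum.
    thresholds = sorted(set(scores)) + [max(scores) + 1.0]
    best_gap, eer = float("inf"), None
    for t in thresholds:
        fpr = sum(s >= t for s in neg) / len(neg)  # real flagged as synthetic
        fnr = sum(s < t for s in pos) / len(pos)   # synthetic missed
        gap = abs(fpr - fnr)
        if gap < best_gap:
            best_gap, eer = gap, (fpr + fnr) / 2
    return eer
```

For instance, a 100 ms signal with one synthetic span from 40 ms to 80 ms yields five 20 ms segments, of which the third and fourth are labeled synthetic. Under this reading, an EER below 10% would mean that at the balanced threshold fewer than one in ten segments of each class is misclassified.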


Pages: 134-140

Page count: 7
