FTDKD: Frequency-Time Domain Knowledge Distillation for Low-Quality Compressed Audio Deepfake Detection

Times cited: 0
Authors
Wang, Bo [1]
Tang, Yeling [1]
Wei, Fei [2]
Ba, Zhongjie [3]
Ren, Kui [3]
Affiliations
[1] Dalian Univ Technol, Sch Informat & Commun Engn, Dalian 116081, Peoples R China
[2] Alibaba Grp, Hangzhou 311121, Zhejiang, Peoples R China
[3] Zhejiang Univ, Sch Cyber Sci & Technol, Hangzhou 310027, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Audio deepfake detection; low-quality compressed audio; knowledge distillation
DOI
10.1109/TASLP.2024.3492796
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
In recent years, audio deepfake detection has advanced significantly. However, most solutions have focused on high-quality audio, largely overlooking the low-quality compressed audio encountered in real-world scenarios. Low-quality compressed audio typically loses high-frequency details and time-domain information, which substantially degrades the performance of advanced deepfake detection systems on such data. In this paper, we introduce a deepfake detection model that employs knowledge distillation across the frequency and time domains. We train a teacher model on high-quality data and a student model on low-quality compressed data. We then apply frequency-domain and time-domain distillation so that the student model learns high-frequency information and time-domain details from the teacher model. Experiments on the ASVspoof 2019 LA and ASVspoof 2021 DF datasets demonstrate the effectiveness of our method. On the ASVspoof 2021 DF dataset, which consists of low-quality compressed audio, we achieve an Equal Error Rate (EER) of 2.82%. To the best of our knowledge, this is the best performance among all deepfake voice detection systems evaluated on the ASVspoof 2021 DF dataset. Our method also generalizes to high-quality data, achieving an EER of 0.30% on the ASVspoof 2019 LA dataset, close to state-of-the-art results.
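The results above are reported as Equal Error Rate (EER): the error rate at the decision threshold where the false acceptance rate (spoofed audio accepted as bona fide) equals the false rejection rate (bona fide audio rejected as spoofed). The following is a minimal, simplified sketch of how an EER can be computed from detector scores. It assumes that higher scores mean "more likely bona fide". `compute_eer` is a hypothetical helper for illustration, not the official ASVspoof scoring tool and not the authors' evaluation code. It approximates the crossing point by sweeping every observed score as a threshold.

```python
from typing import Sequence, Tuple


def compute_eer(bonafide_scores: Sequence[float],
                spoof_scores: Sequence[float]) -> Tuple[float, float]:
    """Return (eer, threshold) for a simple threshold sweep.

    Assumption: higher score = more likely bona fide (hypothetical helper).
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    # Every distinct observed score is a candidate decision threshold.
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None  # (|FRR - FAR|, EER estimate, threshold)
    for t in thresholds:
        # False rejection: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoof trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            # At the closest crossing, average FRR and FAR as the EER estimate.
            best = (gap, (frr + far) / 2, t)
    return best[1], best[2]


if __name__ == "__main__":
    eer, thr = compute_eer([0.6, 0.8], [0.4, 0.7])
    print(f"EER = {eer:.2%} at threshold {thr}")
```

With only a few thousand trials, the sweep is fast enough. For large evaluation sets, a sorted, vectorized DET-curve computation would be the usual choice.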
Pages: 4905-4918
Number of pages: 14