Audio Multi-View Spoofing Detection Framework Based on Audio-Text-Emotion Correlations

Cited by: 0
Authors
Wu, Junyan [1]
Yin, Qilin [1]
Sheng, Ziqi [1]
Lu, Wei [1]
Huang, Jiwu [2]
Li, Bin [3, 4]
Affiliations
[1] Sun Yat-sen Univ, Sch Comp Sci & Engn, Guangdong Prov Key Lab Informat Secur Technol, Minist Educ, Key Lab Informat Technol, Guangzhou 510006, Peoples R China
[2] Shenzhen MSU BIT Univ, Fac Engn, Guangdong Lab Machine Percept & Intelligent Comp, Shenzhen 518116, Peoples R China
[3] Shenzhen Univ, Guangdong Key Lab Intelligent Informat Proc, Shenzhen 518060, Peoples R China
[4] Shenzhen Univ, Shenzhen Key Lab Media Secur, Shenzhen 518060, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimedia forensics; audio spoofing detection; multi-view learning; graph attention mechanism; AUTOMATIC SPEAKER VERIFICATION; SPEECH;
DOI
10.1109/TIFS.2024.3431888
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
In recent years, audio spoofing detection has received widespread attention for protecting personal privacy and social security. Despite the significant progress achieved in audio single-view spoofing detection, challenges remain with regard to addressing unknown spoofing attacks in realistic scenarios. To solve these challenging problems, in this paper, we introduce a novel audio multi-view spoofing detection framework (AMSDF), whose goal is to capture both intra-view and inter-view cues by measuring correlations within audio multi-view features (i.e., audio-emotion-text) for audio spoofing detection. In general, different view features are inherently interconnected in the real patterns, while they may present unnatural correlations in the spoofing patterns. Therefore, more discriminative cues can be mined by utilizing their complex interactions, which is beneficial to the audio spoofing detection task. To this end, an intra-view graph attention mechanism (IGAM) is first utilized to aggregate each intra-view node within the same view. Subsequently, a heterogeneous graph fusion module (HGFM) is applied to measure correlations within inter-view nodes, which are enhanced with a master node for comprehensive analysis purposes. Finally, a group-based readout scheme (GRS) is designed to capture and preserve the most distinctive cues by leveraging the strengths of different feature sets, thereby effectively distinguishing subtle differences between real and spoofing audio. The experimental results show that our proposed framework can achieve better performance than that of the state-of-the-art methods, especially in realistic scenarios. The code and pre-trained models are available at https://github.com/ItzJuny/AMSDF.
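The three stages named in the abstract (intra-view aggregation, inter-view fusion with a master node, and a group-based readout) can be illustrated with a toy sketch. This is not the authors' implementation; see the linked repository for that. The sketch makes several assumptions that the abstract does not state: plain dot-product attention without learned weights stands in for the graph attention mechanism, every view shares the same feature dimension, the master node is initialized as the mean of all nodes, and the readout concatenates a per-view max and mean. All function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_aggregate(nodes):
    """Update each node as an attention-weighted average of all nodes.

    Assumption: scaled dot-product scores on a fully connected graph,
    with no learned parameters (a stand-in for graph attention).
    """
    dim = len(nodes[0])
    updated = []
    for query in nodes:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in nodes
        ]
        weights = softmax(scores)
        updated.append(
            [sum(w * key[d] for w, key in zip(weights, nodes)) for d in range(dim)]
        )
    return updated


def multi_view_embedding(views):
    """Toy three-stage pipeline over a dict of view name -> node vectors."""
    # Stage 1 (intra-view): aggregate nodes within each view separately.
    intra = {name: attention_aggregate(nodes) for name, nodes in views.items()}

    # Stage 2 (inter-view): attend across all views plus one master node.
    # Assumption: the master node starts as the mean of all nodes.
    all_nodes = [node for nodes in intra.values() for node in nodes]
    dim = len(all_nodes[0])
    master = [sum(n[d] for n in all_nodes) / len(all_nodes) for d in range(dim)]
    fused = attention_aggregate(all_nodes + [master])

    # Stage 3 (readout): per-view max and mean, then the fused master node.
    embedding = []
    start = 0
    for nodes in intra.values():
        group = fused[start:start + len(nodes)]
        start += len(nodes)
        embedding += [max(n[d] for n in group) for d in range(dim)]
        embedding += [sum(n[d] for n in group) / len(group) for d in range(dim)]
    embedding += fused[-1]
    return embedding


# Hypothetical toy features: three views with 2-dimensional nodes.
views = {
    "audio": [[1.0, 0.0], [0.0, 1.0]],
    "text": [[0.5, 0.5]],
    "emotion": [[0.2, 0.8], [0.9, 0.1], [0.4, 0.4]],
}
embedding = multi_view_embedding(views)
print(len(embedding))  # 14 = 3 views * (max + mean) * 2 dims + 2 master dims
```

In a real detector the fixed-length embedding would feed a classifier that separates real from spoofed audio; here it only shows how per-view and cross-view information end up in one vector.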
Pages: 7133-7146
Number of pages: 14
Related papers
50 in total
  • [41] QoE Assessment of Multi-View Video and Audio Simultaneous IP Transmission: The Effect of User Interfaces
    Francis, Francis Jeganatan Wilson
    Nunome, Toshiro
    2014 INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY CONVERGENCE (ICTC), 2014, : 466 - 471
  • [42] The Effect of Audiovisual Cross-Modality on QoE of Multi-View Video and Audio IP Transmission
    Nunome, Toshiro
    Sako, Kazunori
    2016 18TH ASIA-PACIFIC NETWORK OPERATIONS AND MANAGEMENT SYMPOSIUM (APNOMS), 2016,
  • [43] The Effect of Spatiotemporal Tradeoff of Picture Patterns on QoE in Multi-View Video and Audio IP Transmission
    Nunome, Toshiro
    Tsuya, Yusuke
    GENETIC AND EVOLUTIONARY COMPUTING, VOL II, 2016, 388 : 139 - 146
  • [44] Multi-view Learning for Emotion Detection in Code-switching Texts
    Lee, Sophia Yat Mei
    Wang, Zhongqing
    PROCEEDINGS OF 2015 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, 2015, : 90 - 93
  • [45] MULTI-VIEW AUDIO-ARTICULATORY FEATURES FOR PHONETIC RECOGNITION ON RTMRI-TIMIT DATABASE
    Douros, Ioannis K.
    Katsamanis, Athanasios
    Maragos, Petros
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5514 - 5518
  • [46] An audio-based risky flight detection framework for quadrotors
    Liu, Wansong
    Liu, Chang
    Sajedi, Seyedomid
    Su, Hao
    Liang, Xiao
    Zheng, Minghui
    IET CYBER-SYSTEMS AND ROBOTICS, 2024, 6 (01)
  • [47] Image-Text Multimodal Emotion Classification via Multi-View Attentional Network
    Yang, Xiaocui
    Feng, Shi
    Wang, Daling
    Zhang, Yifei
    IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 : 4014 - 4026
  • [48] Multi-View Video Based Tracking and Audio-Visual Identification of Persons in a Human-Computer-Interaction Scenario
    Meudt, Sascha
    Glodek, Michael
    Schels, Martin
    Schwenker, Friedhelm
    2013 IEEE INTERNATIONAL CONFERENCE ON CYBERNETICS (CYBCONF), 2013,
  • [49] A Deep Multi-View Framework for Anomaly Detection on Attributed Networks
    Peng, Zhen
    Luo, Minnan
    Li, Jundong
    Xue, Luguo
    Zheng, Qinghua
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (06) : 2539 - 2552
  • [50] A Multi-View Deep Learning Framework for EEG Seizure Detection
    Yuan, Ye
    Xun, Guangxu
    Jia, Kebin
    Zhang, Aidong
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2019, 23 (01) : 83 - 94