Discriminative features based on modified log magnitude spectrum for playback speech detection

Authors
Jichen Yang
Longting Xu
Bo Ren
Yunyun Ji
Affiliations
[1] Department of Electrical and Computer Engineering, National University of Singapore
[2] College of Information Science and Technology, Donghua University
[3] Microsoft Search Technology Center Asia
[4] Electronics and Information School, Nantong University
Keywords
Discriminative feature; Playback attack detection; Modified log magnitude spectrum; Constant-Q variance-based octave coefficients; Constant-Q mean-based octave coefficients
DOI
Not available
Abstract
To improve the performance of hand-crafted features for playback speech detection, this work proposes two discriminative features: constant-Q variance-based octave coefficients and constant-Q mean-based octave coefficients. Both rest on the finding that the variance-based and mean-based modified log magnitude spectra enhance the discriminative power between genuine speech and playback speech. Constant-Q variance-based octave coefficients are obtained by combining the variance-based modified log magnitude spectrum with octave segmentation and the discrete cosine transform; constant-Q mean-based octave coefficients are obtained in the same way from the mean-based modified log magnitude spectrum. Both features are evaluated on the ASVspoof 2017 corpus version 2.0 and the ASVspoof 2019 physical access database. Experimental results show that the variance-based and mean-based modified log magnitude spectra yield discriminative features for playback speech. Further results on the two databases show that both proposed features can outperform some common features, such as mel frequency cepstral coefficients and constant-Q cepstral coefficients.
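The abstract names three stages (a modified log magnitude spectrum, octave segmentation, and a discrete cosine transform) but gives no formulas. The Python sketch below shows one way such a pipeline could be wired together; it is not the authors' implementation. Everything specific in it is an assumption made for illustration: the "mean-based" modification is taken to be per-bin mean subtraction across frames, the "variance-based" modification is taken to be an additional division by the per-bin standard deviation, and the DCT is applied separately to each octave band. The function names, the `bins_per_octave` parameter, and the toy input are hypothetical.

```python
import math


def dct2(x):
    """Unnormalised DCT-II of a 1-D sequence."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def modify_log_spectrum(log_spec, mode="mean", eps=1e-12):
    """ASSUMED modification, not taken from the paper.

    log_spec: list of frames, each a list of constant-Q log magnitudes.
    mode="mean": subtract each bin's mean over frames.
    mode="variance": also divide by each bin's standard deviation.
    """
    if mode not in ("mean", "variance"):
        raise ValueError("mode must be 'mean' or 'variance'")
    n_frames, n_bins = len(log_spec), len(log_spec[0])
    means = [sum(f[k] for f in log_spec) / n_frames for k in range(n_bins)]
    centred = [[f[k] - means[k] for k in range(n_bins)] for f in log_spec]
    if mode == "mean":
        return centred
    stds = [
        math.sqrt(sum(f[k] ** 2 for f in centred) / n_frames)
        for k in range(n_bins)
    ]
    return [[f[k] / (stds[k] + eps) for k in range(n_bins)] for f in centred]


def octave_coefficients(log_spec, bins_per_octave, mode="mean"):
    """Split each modified frame into octave bands and apply a DCT per band
    (the per-band DCT is an assumption about how the stages combine)."""
    features = []
    for frame in modify_log_spectrum(log_spec, mode):
        coeffs = []
        for start in range(0, len(frame), bins_per_octave):
            coeffs.extend(dct2(frame[start:start + bins_per_octave]))
        features.append(coeffs)
    return features


# Toy input: 3 frames x 4 bins, i.e. 2 octaves of 2 bins each (made-up values).
toy = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [3.0, 0.0, 9.0, 6.0]]
mean_feats = octave_coefficients(toy, bins_per_octave=2, mode="mean")
var_feats = octave_coefficients(toy, bins_per_octave=2, mode="variance")
```

In practice the input would come from a constant-Q transform of the audio, and a classifier would be trained on the resulting coefficients; neither step is shown here.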
Related papers
50 in total
  • [21] TRAFFIC SIGN DETECTION BASED ON SIMPLE XOR AND DISCRIMINATIVE FEATURES
    Madani, Ahmed
    Yusof, Rubiyah
    JURNAL TEKNOLOGI, 2016, 78 (6-2) : 97 - 102
  • [22] Web spam detection based on discriminative content and link features
    Mahmoudi M.
    Yari A.
    Khadivi S.
    2010 5th International Symposium on Telecommunications, IST 2010, 2010 : 542 - 546
  • [23] Robust endpoint detection for speech recognition based on discriminative feature extraction
    Yamamoto, Koichi
    Jabloun, Firas
    Reinhard, Klaus
    Kawamura, Akinori
    2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-13, 2006 : 805 - 808
  • [24] Speech/music classification using phase-based and magnitude-based features
    Bhattacharjee, Mrinmoy
    Prasanna, S. R. Mahadeva
    Guha, Prithwijit
    SPEECH COMMUNICATION, 2022, 142 : 34 - 48
  • [25] Weld Nut Detection Algorithm Based on LoG Features Fusion
    Luo, Baihuai
    Li, Yang
    Lin, Xiye
    Zhou, Zibin
    Computer Engineering and Applications, 2024, 60 (10) : 332 - 340
  • [26] Detection of Collaboration: Relationship Between Log and Speech-Based Classification
    Viswanathan, Sree Aurovindh
    Vanlehn, Kurt
    ARTIFICIAL INTELLIGENCE IN EDUCATION, AIED 2019, PT II, 2019, 11626 : 327 - 331
  • [27] Speech Emotion Recognition Based on Transfer Emotion-Discriminative Features Subspace Learning
    Zhang, Kexin
    Liu, Yunxiang
    IEEE ACCESS, 2023, 11 : 56336 - 56343
  • [28] Speech recognition system in high noise background based on discriminative learning of environmental features
    Lu, Cheng-Guo
    Han, Ji-Qing
    Wang, Cheng-Fa
    Zhang, Lei
    Harbin Gongye Daxue Xuebao/Journal of Harbin Institute of Technology, 2003, 35 (02) : 134 - 137
  • [29] Deriving conversation-based features from unlabeled speech for discriminative language modeling
    Karakos, D.
    Roark, B.
    Shafran, I.
    Sagae, K.
    Lehr, M.
    Prud'hommeaux, E.
    Xu, P.
    Glenn, N.
    Khudanpur, S.
    Saraclar, M.
    Bikel, D.
    Dredze, M.
    Callison-Burch, C.
    Cao, Y.
    Hall, K.
    Hasler, E.
    Koehn, P.
    Lopez, A.
    Post, M.
    Riley, D.
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012 : 202 - 205
  • [30] Modified dense convolutional networks based emotion detection from speech using its paralinguistic features
    Dhiman, Ritika
    Kang, Gurkanwal Singh
    Gupta, Varun
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (21-23) : 32041 - 32069