Discriminative features based on modified log magnitude spectrum for playback speech detection

Cited by: 0

Authors:

Jichen Yang

Longting Xu

Bo Ren

Yunyun Ji

Affiliations:

[1] Department of Electrical and Computer Engineering, National University of Singapore

[2] College of Information Science and Technology, Donghua University

[3] Microsoft Search Technology Center Asia

[4] Electronics and Information School, Nantong University

Source:

EURASIP Journal on Audio, Speech, and Music Processing, vol. 2020

Keywords:

Discriminative feature; Playback attack detection; Modified log magnitude spectrum; Constant-Q variance-based octave coefficients; Constant-Q mean-based octave coefficients

DOI:

Not available

Abstract:

To improve the performance of hand-crafted features for playback speech detection, this work proposes two discriminative features: constant-Q variance-based octave coefficients and constant-Q mean-based octave coefficients. Both rest on the finding that the variance-based and mean-based modified log magnitude spectra enhance the discrimination between genuine speech and playback speech. Each feature is obtained by combining the corresponding modified log magnitude spectrum (variance-based or mean-based) with octave segmentation and the discrete cosine transform. The two features are evaluated on the ASVspoof 2017 corpus version 2.0 and the ASVspoof 2019 physical access corpus. Experimental results show that both modified log magnitude spectra yield features that discriminate playback speech from genuine speech. Further results on the two databases show that the proposed features can outperform some common features, such as mel frequency cepstral coefficients and constant-Q cepstral coefficients.
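
The pipeline named in the abstract (a modified log magnitude spectrum, followed by octave segmentation and a discrete cosine transform) can be illustrated with a minimal Python sketch. The abstract does not define the modification itself, so `mean_normalize` below is a hypothetical stand-in that subtracts each bin's mean over time; the function names, `bins_per_octave`, `n_keep`, and the toy input are likewise assumptions for illustration, not the authors' implementation.

```python
import math


def dct_ii(values):
    """Unnormalized DCT-II of a 1-D sequence."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(values))
        for k in range(n)
    ]


def mean_normalize(frames):
    """Hypothetical stand-in for the mean-based modification (assumption):
    subtract each frequency bin's mean over all frames."""
    n_frames = len(frames)
    n_bins = len(frames[0])
    means = [sum(f[b] for f in frames) / n_frames for b in range(n_bins)]
    return [[f[b] - means[b] for b in range(n_bins)] for f in frames]


def octave_coefficients(frame, bins_per_octave, n_keep):
    """Split one spectral frame into octave segments, apply the DCT to each
    segment, and keep the first n_keep coefficients of every segment."""
    if len(frame) % bins_per_octave != 0:
        raise ValueError("frame length must be a multiple of bins_per_octave")
    coeffs = []
    for start in range(0, len(frame), bins_per_octave):
        segment = frame[start:start + bins_per_octave]
        coeffs.extend(dct_ii(segment)[:n_keep])
    return coeffs


# Toy log magnitude spectrum: 2 frames x 8 constant-Q bins (2 octaves of 4 bins).
log_spec = [
    [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0],
]
modified = mean_normalize(log_spec)
features = [octave_coefficients(f, bins_per_octave=4, n_keep=2) for f in modified]
```

Each row of `features` concatenates the leading DCT coefficients of every octave segment. In practice the input would come from a constant-Q transform of the audio, and the normalization would follow the paper's definition of the variance-based or mean-based spectrum.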
