The Catcher in the Field: A Fieldprint based Spoofing Detection for Text-Independent Speaker Verification

Cited by: 37

Authors:

Yan, Chen [1]

Long, Yan [1]

Ji, Xiaoyu [1]

Xu, Wenyuan [1]

Affiliations:

[1] Zhejiang Univ, Hangzhou, Zhejiang, Peoples R China

Source:

PROCEEDINGS OF THE 2019 ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY (CCS'19) | 2019

Funding:

National Key Research and Development Program of China;

Keywords:

fieldprint; speaker verification; spoofing attack; sound field; SPEECH; DIRECTIVITY; RECOGNITION; NOISE;

DOI:

10.1145/3319535.3354248

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

Verifying the identity of voice inputs is important as voices are increasingly used for sensitive operations. Traditional methods focus on differentiating individuals via the spectrographic features of voices (e.g., voiceprint), yet cannot cope with spoofing attacks, whereby a malicious attacker synthesizes the voice with almost the same voiceprint of a victim or simply replays it. This paper proposes CaField, a text-independent speaker verification method to detect loudspeaker-based voice spoofing attacks with the goal of achieving two seemingly conflicting requirements: usability and security. The key insight of CaField is to construct "fieldprint" with the acoustic biometrics embedded in sound fields, i.e., a physical field of acoustic energy created as the sound propagates over the air, as analogous to "voiceprint". We find that fieldprints can be distinctive between speakers (either humans or loudspeakers), and thus we may detect the speakers being used for spoofing attacks from the authentic users. Our evaluation on a dataset of 20 people and 8 loudspeakers shows that by relying on two on-board microphones to sample sound fields while users talk to the smartphones, CaField achieves a detection accuracy of 99.16% and an equal error rate (EER) of 0.85% across multiple sessions and various voice inputs. CaField supports low audio sample rates at 8 kHz and is robust to various factors including phone displacement, user posture, recording environment, etc.
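
The abstract reports an equal error rate (EER), the operating point at which the false acceptance rate (spoofed inputs accepted) equals the false rejection rate (genuine users rejected). As a rough illustration of how such a figure can be derived from detector scores, the sketch below sweeps candidate thresholds and picks the one where the two rates are closest. It is not CaField's evaluation code: the function name `equal_error_rate`, the convention that higher scores mean "more likely genuine", and the example scores are all assumptions made for illustration.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Return (eer, threshold) for a score-based detector.

    Assumption: a trial is accepted when its score >= threshold,
    i.e. higher scores mean "more likely a genuine speaker".
    """
    # Every observed score is a candidate decision threshold.
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False acceptance rate: spoofed trials that pass the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection rate: genuine trials that fail the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        # Keep the threshold where FAR and FRR are closest to each other.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Made-up scores, purely illustrative (not data from the paper).
genuine = [0.9, 0.8, 0.7, 0.4]
spoofed = [0.1, 0.2, 0.3, 0.6]
eer, threshold = equal_error_rate(genuine, spoofed)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With a finite set of scores the two rates rarely coincide exactly, so the sketch reports their average at the closest threshold; interpolating along the error curve is a common alternative.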

Pages: 1215-1229

Page count: 15
