Speaker discrimination in humans and machines: Effects of speaking style variability

Cited by: 1
Authors
Afshan, Amber [1 ]
Kreiman, Jody [2 ,3 ]
Alwan, Abeer [1 ]
Affiliations
[1] Univ Calif Los Angeles, Dept Elect & Comp Engn, Los Angeles, CA 90024 USA
[2] Univ Calif Los Angeles, Dept Head & Neck Surg, Los Angeles, CA 90024 USA
[3] Univ Calif Los Angeles, Dept Linguist, Los Angeles, CA 90024 USA
Keywords
speaker perception; speaking style; automatic speaker verification; human-assisted speaker discrimination
DOI
10.21437/Interspeech.2020-3004
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
Does speaking style variation affect humans' ability to distinguish individuals from their voices? How do humans compare with automatic systems designed to discriminate between voices? In this paper, we attempt to answer these questions by comparing human and machine speaker discrimination performance for read speech versus casual conversations. Thirty listeners were asked to perform a same-versus-different speaker task. Their performance was compared to that of a state-of-the-art x-vector/PLDA-based automatic speaker verification system. Results showed that both humans and machines performed better with style-matched stimuli, and human performance was better when listeners were native speakers of American English. Native listeners performed better than machines in the style-matched conditions (equal error rates, EERs, of 6.96% versus 14.35% for read speech, and 15.12% versus 19.87% for conversations), but for style-mismatched conditions, there was no significant difference between native listeners and machines. In all conditions, fusing human responses with machine results improved performance compared to either alone, suggesting that humans and machines have different approaches to speaker discrimination tasks. Differences in the approaches were further confirmed by examining results for individual speakers, which showed that the perception of distinct and confused speakers differed between human listeners and machines.
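The abstract reports results as EERs and describes fusing human responses with machine scores. As a minimal sketch of how such numbers are typically computed, the Python below estimates an EER from same-speaker and different-speaker trial scores, then fuses two raters by averaging z-normalized scores. The function names `equal_error_rate` and `fuse_scores`, the equal-weight fusion rule, and all scores are illustrative assumptions; the abstract does not state which fusion method the authors used, and none of the numbers come from the paper.

```python
import numpy as np


def equal_error_rate(same_scores, diff_scores):
    """Estimate the EER: the operating point where the miss rate on
    same-speaker trials equals the false-alarm rate on different-speaker
    trials. Higher scores mean "more likely the same speaker"."""
    same = np.asarray(same_scores, dtype=float)
    diff = np.asarray(diff_scores, dtype=float)
    thresholds = np.sort(np.concatenate([same, diff]))
    # Miss: a same-speaker trial scored below the threshold.
    miss = np.array([(same < t).mean() for t in thresholds])
    # False alarm: a different-speaker trial scored at or above it.
    fa = np.array([(diff >= t).mean() for t in thresholds])
    i = int(np.argmin(np.abs(miss - fa)))
    return float((miss[i] + fa[i]) / 2.0)


def fuse_scores(human, machine, weight=0.5):
    """Score-level fusion (an assumed rule, not the paper's): z-normalize
    each rater's scores over all trials, then take a weighted average."""
    h = np.asarray(human, dtype=float)
    m = np.asarray(machine, dtype=float)
    h = (h - h.mean()) / h.std()
    m = (m - m.mean()) / m.std()
    return weight * h + (1.0 - weight) * m


# Made-up scores for 8 trials: the first 4 are same-speaker pairs,
# the last 4 are different-speaker pairs.
is_same = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)
human = np.array([0.9, 0.8, 0.4, 0.7, 0.1, 0.2, 0.6, 0.3])
machine = np.array([0.6, 0.9, 0.8, 0.3, 0.2, 0.1, 0.3, 0.5])
fused = fuse_scores(human, machine)

human_eer = equal_error_rate(human[is_same], human[~is_same])
machine_eer = equal_error_rate(machine[is_same], machine[~is_same])
fused_eer = equal_error_rate(fused[is_same], fused[~is_same])
print(human_eer, machine_eer, fused_eer)
```

In this toy data the two raters err on different trials, so each has an EER of 25% alone while the fused scores separate the trials completely. That is the kind of complementarity the abstract points to, though the size of the effect here is an artifact of the invented scores.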
Pages: 3136-3140
Number of pages: 5