Evaluating automatic speech recognition systems as quantitative models of cross-lingual phonetic category perception

Cited by: 4
Authors
Schatz, Thomas [1, 2]
Bach, Francis [3]
Dupoux, Emmanuel [4]
Affiliations
[1] Univ Maryland, Dept Linguist, College Pk, MD 20742 USA
[2] Univ Maryland, UMIACS, College Pk, MD 20742 USA
[3] PSL Res Univ, CNRS, Ecole Normale Super, Dept Informat ENS, SIERRA Project Team, INRIA, 45 Rue Ulm, F-75005 Paris, France
[4] PSL Res Univ, CNRS, Ecole Normale Super, Dept Etud Cognit ENS, EHESS, LSCP, 29 Rue Ulm, F-75005 Paris, France
Source
Funding
European Research Council; US National Science Foundation
Keywords
JAPANESE;
DOI
10.1121/1.5037615
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Theories of cross-linguistic phonetic category perception posit that listeners perceive foreign sounds by mapping them onto their native phonetic categories, but, until now, no way to effectively implement this mapping has been proposed. In this paper, Automatic Speech Recognition systems trained on continuous speech corpora are used to provide a fully specified mapping between foreign sounds and native categories. The authors show how the machine ABX evaluation method can be used to compare predictions from the resulting quantitative models with empirically attested effects in human cross-linguistic phonetic category perception. © 2018 Acoustical Society of America
Pages: EL372-EL378
Page count: 7