That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages

Cited by: 6

Authors:

Zelasko, Piotr [1]

Moro-Velazquez, Laureano [1]

Hasegawa-Johnson, Mark [3, 4]

Scharenborg, Odette [5]

Dehak, Najim [1, 2]

Affiliations:

[1] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA

[2] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD USA

[3] Univ Illinois, ECE Dept, Urbana, IL USA

[4] Univ Illinois, Beckman Inst, Urbana, IL USA

[5] Delft Univ Technol, Multimedia Comp Grp, Delft, Netherlands

Source:

INTERSPEECH 2020 | 2020

Keywords:

speech recognition; multilingual; crosslingual; transfer learning; zero-shot; phone recognition

DOI:

10.21437/Interspeech.2020-2513

Chinese Library Classification (CLC) codes:

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification codes:

100104; 100213

Abstract:

Only a handful of the world's languages are abundant with the resources that enable practical applications of speech processing technologies. One of the methods to overcome this problem is to use the resources existing in other languages to train a multilingual automatic speech recognition (ASR) model, which, intuitively, should learn some universal phonetic representations. In this work, we focus on gaining a deeper understanding of how general these representations might be, and how individual phones are getting improved in a multilingual setting. To that end, we select a phonetically diverse set of languages, and perform a series of monolingual, multilingual and crosslingual (zero-shot) experiments. The ASR is trained to recognize the International Phonetic Alphabet (IPA) token sequences. We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting, where the model, among other errors, considers Javanese as a tone language. Notably, as little as 10 hours of the target language training data tremendously reduces ASR error rates. Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages - an encouraging result for the low-resource speech community.

Citation

Pages: 3705-3709

Page count: 5

