PHO-LID: A Unified Model Incorporating Acoustic-Phonetic and Phonotactic Information for Language Identification

Times Cited: 2
Authors
Liu, Hexin [1 ]
Perera, Leibny Paola Garcia [2 ]
Khong, Andy W. H. [1 ]
Styles, Suzy J. [3 ]
Khudanpur, Sanjeev [2 ]
Affiliations
[1] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore, Singapore
[2] Johns Hopkins Univ, CLSP & HLT COE, Baltimore, MD 21218 USA
[3] Nanyang Technol Univ, Sch Social Sci, Psychol, Singapore, Singapore
Source
INTERSPEECH 2022
Funding
US National Science Foundation; National Research Foundation of Singapore
Keywords
Language identification; acoustic phonetics; phonotactics; self-supervised learning; phoneme segmentation; RECOGNITION;
DOI
10.21437/Interspeech.2022-354
Chinese Library Classification
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
We propose a novel model to hierarchically incorporate phoneme and phonotactic information for language identification (LID) without requiring phoneme annotations for training. In this model, named PHO-LID, a self-supervised phoneme segmentation task and a LID task share a convolutional neural network (CNN) module, which encodes both language identity and sequential phonemic information in the input speech to generate an intermediate sequence of "phonotactic" embeddings. These embeddings are then fed into transformer encoder layers for utterance-level LID. We call this architecture CNN-Trans. We evaluate it on AP17-OLR data and the MLS14 set of NIST LRE 2017, and show that the PHO-LID model with multi-task optimization exhibits the highest LID performance among all models, achieving over 40% relative improvement in terms of average cost on AP17-OLR data compared to a CNN-Trans model optimized only for LID. The visualized confusion matrices imply that our proposed method achieves higher performance on languages of the same cluster in NIST LRE 2017 data than the CNN-Trans model. A comparison between predicted phoneme boundaries and corresponding audio spectrograms illustrates the leveraging of phoneme information for LID.
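The following is a minimal PyTorch sketch of the CNN-Trans / PHO-LID idea described in the abstract: a shared CNN front-end produces frame-level embeddings that feed both a self-supervised phoneme-boundary head and, after segment pooling into "phonotactic" embeddings, a transformer encoder for utterance-level LID. All module names, layer sizes, and the fixed segment length are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn


class PhoLidSketch(nn.Module):
    def __init__(self, feat_dim=80, embed_dim=256, n_heads=4, n_layers=2, n_langs=10):
        super().__init__()
        # Shared CNN module: encodes both sequential phonemic detail (for the
        # self-supervised phoneme segmentation task) and language identity.
        self.cnn = nn.Sequential(
            nn.Conv1d(feat_dim, embed_dim, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(embed_dim, embed_dim, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        # Transformer encoder consuming the intermediate sequence of
        # segment-level ("phonotactic") embeddings for utterance-level LID.
        enc_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=n_heads, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=n_layers)
        self.lid_head = nn.Linear(embed_dim, n_langs)
        # Auxiliary head standing in for the self-supervised phoneme
        # segmentation objective (one boundary score per frame).
        self.seg_head = nn.Linear(embed_dim, 1)

    def forward(self, feats, seg_len=20):
        # feats: (batch, time, feat_dim) acoustic features, e.g. log-Mel filterbanks.
        frame_emb = self.cnn(feats.transpose(1, 2)).transpose(1, 2)  # (B, T, D)
        boundary_logits = self.seg_head(frame_emb).squeeze(-1)       # (B, T)
        # Pool fixed-length chunks into the "phonotactic" embedding sequence.
        # The paper derives segments from the phoneme-segmentation task; fixed
        # chunks are used here only to keep the sketch simple.
        B, T, D = frame_emb.shape
        T_trim = (T // seg_len) * seg_len
        seg_emb = frame_emb[:, :T_trim].reshape(B, -1, seg_len, D).mean(dim=2)
        utt_emb = self.transformer(seg_emb).mean(dim=1)              # (B, D)
        return self.lid_head(utt_emb), boundary_logits

In the multi-task setting described above, the LID logits and the boundary logits would be trained jointly (e.g., a weighted sum of a classification loss and a segmentation loss), which is what distinguishes PHO-LID from a CNN-Trans model optimized only for LID.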
Pages: 2233-2237
Number of Pages: 5