Modeling Code-Switch Languages Using Bilingual Parallel Corpus

Cited by: 0
Authors
Lee, Grandee [1]
Li, Haizhou [1, 2]
Affiliations
[1] Natl Univ Singapore, Singapore, Singapore
[2] Kriston AI Lab, Beijing, Peoples R China
Funding
National Research Foundation, Singapore;
Keywords
WORD;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Language modeling is the task of estimating the probability of a sequence of words. A bilingual language model is expected to model the sequential dependency between words across languages, which is difficult because suitable training data is inherently scarce and syntactic structure differs across languages. We propose a bilingual attention language model (BALM) that jointly optimizes a language modeling objective and a quasi-translation objective to model both monolingual and cross-lingual sequential dependency. The attention mechanism learns the bilingual context from a parallel corpus. BALM achieves state-of-the-art performance on the SEAME code-switch database, reducing perplexity by 20.5% over the best-reported result. We also apply BALM to bilingual lexicon induction and language normalization tasks to validate the idea.
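As a reading aid for the abstract's two quantitative notions, the sketch below shows how a sequence probability is built from per-word conditional probabilities (the chain rule) and how perplexity and a relative perplexity reduction are computed from it. This is a minimal illustration, not the BALM model: the function names, the toy probabilities, and the baseline value of 100.0 are hypothetical and do not come from the paper.

```python
import math


def sequence_log_prob(token_probs):
    """Chain rule: log P(w_1..w_n) = sum_i log P(w_i | w_1..w_{i-1})."""
    return sum(math.log(p) for p in token_probs)


def perplexity(token_probs):
    """Perplexity = exp(-(1/n) * log P(w_1..w_n)); lower is better."""
    return math.exp(-sequence_log_prob(token_probs) / len(token_probs))


def relative_reduction(baseline_ppl, new_ppl):
    """Fraction by which new_ppl improves on baseline_ppl."""
    return (baseline_ppl - new_ppl) / baseline_ppl


# Hypothetical per-word conditional probabilities for a 4-word sentence.
toy_probs = [0.25, 0.5, 0.125, 0.25]
toy_ppl = perplexity(toy_probs)

# Hypothetical baseline of 100.0: a 20.5% relative reduction would give 79.5.
toy_reduction = relative_reduction(100.0, 79.5)
```

Under this definition, a 20.5% reduction means the new perplexity is 79.5% of the previous best value, whatever that value is.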
Pages: 860-870
Page count: 11
Related papers
50 in total
  • [1] Linguistically Motivated Parallel Data Augmentation for Code-switch Language Modeling
    Lee, Grandee
    Yue, Xianghu
    Li, Haizhou
    [J]. INTERSPEECH 2019, 2019, : 3730 - 3734
  • [2] CAN BILINGUAL 2-YEAR-OLDS CODE-SWITCH
    LANZA, E
    [J]. JOURNAL OF CHILD LANGUAGE, 1992, 19 (03) : 633 - 658
  • [3] Bilingual language mixing: Why do bilinguals code-switch?
    Heredia, RR
    Altarriba, J
    [J]. CURRENT DIRECTIONS IN PSYCHOLOGICAL SCIENCE, 2001, 10 (05) : 164 - 168
  • [4] Code-Switch Fatigue
    Graham-Perel, Ashley
    [J]. AMERICAN JOURNAL OF NURSING, 2023, 123 (05) : 17 - 18
  • [5] Cairo Student Code-Switch (CSCS) Corpus: An Annotated Egyptian Arabic-English Corpus
    Balabel, Mohamed
    Hamed, Injy
    Abdennadher, Slim
    Ngoc Thang Vu
    Cetinoglu, Oezlem
    [J]. PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 3973 - 3977
  • [6] Collection and Analysis of Code-switch Egyptian Arabic-English Speech Corpus
    Hamed, Injy
    Elmahdy, Mohamed
    Abdennadher, Slim
    [J]. PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 3805 - 3809
  • [7] CODE-SWITCH SPEECH RESCORING WITH MONOLINGUAL DATA
    Liu, Guoyu
    Cao, Lixin
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6229 - 6233
  • [8] LANGUAGE DIARIZATION FOR CODE-SWITCH CONVERSATIONAL SPEECH
    Lyu, Dau-Cheng
    Chng, Eng-Siong
    Li, Haizhou
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7314 - 7318
  • [9] Multilingual Children's Motivations to Code-Switch: A Qualitative Analysis of Code-Switching in Dutch-English Bilingual Daycares
    Sczepurek, Nina-Sophie
    Aalberse, Suzanne P.
    Verhagen, Josje
    [J]. LANGUAGES, 2022, 7 (04)
  • [10] Data Augmentation for Code-switch Language Modeling by Fusing Multiple Text Generation Methods
    Hu, Xinhui
    Zhang, Qi
    Yang, Lei
    Gu, Binbin
    Xu, Xinkang
    [J]. INTERSPEECH 2020, 2020, : 1062 - 1066