Multilingual Grammar Induction with Continuous Language Identification

Cited by: 0
Authors
Han, Wenjuan [1]
Wang, Ge [1]
Jiang, Yong [2]
Tu, Kewei [1]
Affiliations
[1] ShanghaiTech University, School of Information Science and Technology, Shanghai, People's Republic of China
[2] Alibaba Group, Beijing, People's Republic of China
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The key to multilingual grammar induction is to couple grammar parameters of different languages together by exploiting the similarity between languages. Previous work relies on linguistic phylogenetic knowledge to specify similarity between languages. In this work, we propose a novel universal grammar induction approach that represents language identities with continuous vectors and employs a neural network to predict grammar parameters based on the representation. Without any prior linguistic phylogenetic knowledge, we automatically capture similarity between languages with the vector representations and softly tie the grammar parameters of different languages. In our experiments, we apply our approach to 15 languages across 8 language families and subfamilies in the Universal Dependency Treebank dataset, and we observe substantial performance gain on average over monolingual and multilingual baselines.
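The core idea in the abstract (one shared network that maps a continuous language vector to grammar parameters) can be illustrated with a minimal sketch. This is not the authors' implementation: the table sizes, the names `lang_emb`, `tag_emb`, `child_distribution` and `grammar_for`, the single hidden layer, and the choice of a child-tag-given-head-tag rule distribution are all assumptions made for illustration, and no training is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only to keep the example small.
NUM_LANGS, NUM_TAGS = 3, 4
LANG_DIM, TAG_DIM, HIDDEN = 5, 5, 8

# Tables and weights that would be learned; randomly initialised here.
lang_emb = rng.normal(size=(NUM_LANGS, LANG_DIM))   # one vector per language
tag_emb = rng.normal(size=(NUM_TAGS, TAG_DIM))      # one vector per POS tag
W1 = rng.normal(size=(LANG_DIM + TAG_DIM, HIDDEN))  # shared across languages
W2 = rng.normal(size=(HIDDEN, NUM_TAGS))            # shared across languages


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def child_distribution(lang_vec, head_tag):
    """Predict P(child tag | head tag) for the language described by lang_vec."""
    features = np.concatenate([lang_vec, tag_emb[head_tag]])
    hidden = np.tanh(features @ W1)
    return softmax(hidden @ W2)


def grammar_for(lang_id):
    """Stack the predicted rule distributions for every head tag of one language."""
    return np.stack(
        [child_distribution(lang_emb[lang_id], h) for h in range(NUM_TAGS)]
    )


params = grammar_for(0)  # shape (NUM_TAGS, NUM_TAGS); each row is a distribution
```

Because every language shares `W1` and `W2` and differs only in its row of `lang_emb`, languages with nearby vectors receive nearby parameters, which is one way to read the "soft tying" the abstract describes.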
Pages: 5728-5733
Page count: 6