Multilingual Grammar Induction with Continuous Language Identification

Times cited: 0
Authors
Han, Wenjuan [1]
Wang, Ge [1]
Jiang, Yong [2]
Tu, Kewei [1]
Affiliations
[1] ShanghaiTech University, School of Information Science and Technology, Shanghai, People's Republic of China
[2] Alibaba Group, Beijing, People's Republic of China
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The key to multilingual grammar induction is to couple grammar parameters of different languages together by exploiting the similarity between languages. Previous work relies on linguistic phylogenetic knowledge to specify similarity between languages. In this work, we propose a novel universal grammar induction approach that represents language identities with continuous vectors and employs a neural network to predict grammar parameters based on the representation. Without any prior linguistic phylogenetic knowledge, we automatically capture similarity between languages with the vector representations and softly tie the grammar parameters of different languages. In our experiments, we apply our approach to 15 languages across 8 language families and subfamilies in the Universal Dependency Treebank dataset and observe a substantial average performance gain over monolingual and multilingual baselines.
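The abstract's central mechanism feeds language vectors to a network that outputs grammar parameters. It can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration. This includes the tag sets, the hypothetical language names `lang_a`/`lang_b`/`lang_c`, the hand-set vectors and weights, and the choice of a single linear layer plus softmax that yields a child-tag distribution for a given head tag. It is not the authors' model. It only shows how nearby language vectors produce similar grammar parameters (soft tying) without any phylogenetic input.

```python
import math

# Hypothetical toy tag inventory; the paper's actual tag set and grammar
# formalism are not specified in this record.
HEAD_TAGS = ["NOUN", "VERB"]
CHILD_TAGS = ["NOUN", "VERB", "ADJ"]

# Hypothetical 2-d language vectors. In the approach described above these
# would be learned; here they are fixed by hand so that lang_a and lang_b
# are close and lang_c is far from both.
LANG_VECS = {
    "lang_a": [1.0, 0.0],
    "lang_b": [1.0, 0.1],
    "lang_c": [0.0, 1.0],
}

# Hypothetical weights of a single linear layer: one row per child tag,
# one column per input feature [lang_0, lang_1, head=NOUN, head=VERB].
WEIGHTS = [
    [2.0, -1.0, 0.5, 1.0],   # child NOUN
    [-1.0, 2.0, 0.0, -0.5],  # child VERB
    [0.5, 0.5, 1.0, 0.0],    # child ADJ
]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_child_distribution(lang, head_tag):
    """Map (language vector, head tag) to P(child tag | head tag, language)."""
    head_one_hot = [1.0 if t == head_tag else 0.0 for t in HEAD_TAGS]
    features = LANG_VECS[lang] + head_one_hot
    logits = [sum(w * x for w, x in zip(row, features)) for row in WEIGHTS]
    return dict(zip(CHILD_TAGS, softmax(logits)))


def total_variation(p, q):
    """Half the L1 distance between two distributions over the same keys."""
    return 0.5 * sum(abs(p[k] - q[k]) for k in p)


if __name__ == "__main__":
    dists = {lang: predict_child_distribution(lang, "VERB") for lang in LANG_VECS}
    for lang, dist in dists.items():
        print(lang, {k: round(v, 3) for k, v in dist.items()})
    print("TV(a, b) =", round(total_variation(dists["lang_a"], dists["lang_b"]), 3))
    print("TV(a, c) =", round(total_variation(dists["lang_a"], dists["lang_c"]), 3))
```

Running it should show `lang_a` and `lang_b` receiving nearly identical distributions while `lang_c` differs markedly. Because all languages share one predictor, the parameters are tied softly through the vectors rather than by an explicit language tree. In a real system the vectors and weights would be learned from data rather than fixed.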
Pages: 5728–5733
Number of pages: 6