Encodings of Source Syntax: Similarities in NMT Representations Across Target Languages

Cited by: 0
Authors
Chang, Tyler A. [1]
Rafferty, Anna N. [1]
Affiliations
[1] Carleton Coll, Northfield, MN 55057 USA
Keywords
VERB BIAS; COMPREHENSION;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We train neural machine translation (NMT) models from English to six target languages, using NMT encoder representations to predict ancestor constituent labels of source language words. We find that NMT encoders learn similar source syntax regardless of NMT target language, relying on explicit morphosyntactic cues to extract syntactic features from source sentences. Furthermore, the NMT encoders outperform RNNs trained directly on several of the constituent label prediction tasks, suggesting that NMT encoder representations can be used effectively for natural language tasks involving syntax. However, both the NMT encoders and the directly-trained RNNs learn substantially different syntactic information from a probabilistic context-free grammar (PCFG) parser. Despite lower overall accuracy scores, the PCFG often performs well on sentences for which the RNN-based models perform poorly, suggesting that RNN architectures are constrained in the types of syntax they can learn.
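As a rough illustration of the probing targets the abstract mentions, the sketch below derives ancestor constituent labels for each word from a bracketed constituency parse. It is not the authors' code. The function name `ancestor_labels`, the `depth` parameter, the bracketed input format, and the choice to skip the part-of-speech tag when counting ancestors are all assumptions made for this example.

```python
from typing import List, Tuple


def ancestor_labels(tree: str, depth: int = 3) -> List[Tuple[str, List[str]]]:
    """Return (word, [parent, grandparent, ...]) for each word in a bracketed parse.

    Assumption: the innermost label around a word is its part-of-speech tag,
    which is not counted as an ancestor constituent.
    """
    tokens = tree.replace("(", " ( ").replace(")", " ) ").split()
    stack: List[str] = []  # labels of the constituents that are currently open
    result: List[Tuple[str, List[str]]] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            stack.append(tokens[i + 1])  # the label follows the opening bracket
            i += 2
        elif tok == ")":
            stack.pop()
            i += 1
        else:
            # A word: skip its POS tag (top of stack), then walk upward.
            result.append((tok, stack[-2::-1][:depth]))
            i += 1
    return result


example = "(S (NP (DT the) (NN dog)) (VP (VBD barked)))"
for word, labels in ancestor_labels(example):
    print(word, labels)
```

In a probing setup of the kind the abstract describes, each label list would serve as the classification target for the encoder representation of the corresponding word; the classifier itself is outside the scope of this sketch.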
Pages: 7-16
Page count: 10
Related papers
35 in total
  • [1] Promoting the Knowledge of Source Syntax in Transformer NMT
    Thuong-Hai Pham
    Machacek, Dominik
    Bojar, Ondrej
    [J]. COMPUTACION Y SISTEMAS, 2019, 23 (03): 923-934
  • [2] The Syntax of Nominalizations across Languages and Frameworks
    Nicolae, Alexandru
    [J]. REVUE ROUMAINE DE LINGUISTIQUE-ROMANIAN REVIEW OF LINGUISTICS, 2011, 56 (03): 301-304
  • [3] Learning to Translate with Source and Target Syntax
    Chiang, David
    [J]. ACL 2010: 48TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2010: 1443-1452
  • [4] Identifying bilingual semantic neural representations across languages
    Buchweitz, Augusto
    Shinkareva, Svetlana V.
    Mason, Robert A.
    Mitchell, Tom M.
    Just, Marcel Adam
    [J]. BRAIN AND LANGUAGE, 2012, 120 (03): 282-289
  • [5] THE INVESTIGATION OF COMMONALITIES IN HUMAN BRAIN SEMANTIC REPRESENTATIONS ACROSS PEOPLE AND ACROSS LANGUAGES
    Buchweitz, Augusto
    [J]. ILHA DO DESTERRO-A JOURNAL OF ENGLISH LANGUAGE LITERATURES IN ENGLISH AND CULTURAL STUDIES, 2011, 60: 105-120
  • [6] Deep Learning Similarities from Different Representations of Source Code
    Tufano, Michele
    Watson, Cody
    Bavota, Gabriele
    Di Penta, Massimiliano
    White, Martin
    Poshyvanyk, Denys
    [J]. 2018 IEEE/ACM 15TH INTERNATIONAL CONFERENCE ON MINING SOFTWARE REPOSITORIES (MSR), 2018: 542-553
  • [7] COMMON COGNITIVE REPRESENTATIONS OF PROGRAM CODE ACROSS TASKS AND LANGUAGES
    ROBERTSON, SP
    YU, CC
    [J]. INTERNATIONAL JOURNAL OF MAN-MACHINE STUDIES, 1990, 33 (03): 343-360
  • [8] That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages
    Zelasko, Piotr
    Moro-Velazquez, Laureano
    Hasegawa-Johnson, Mark
    Scharenborg, Odette
    Dehak, Najim
    [J]. INTERSPEECH 2020, 2020: 3705-3709
  • [9] Similarities and differences in the neural representations of abstract concepts across English and Mandarin
    Vargas, Robert
    Just, Marcel Adam
    [J]. HUMAN BRAIN MAPPING, 2022, 43 (10): 3195-3206
  • [10] Novelty Detection Across Different Source Types and Languages
    Schanda, Johannes
    [J]. PROCEEDINGS 32ND ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2009: 855