Data and Representation for Turkish Natural Language Inference

Authors
Budur, Emrah [1, 2]
Ozcelik, Riza [2 ]
Gungor, Tunga [2 ]
Potts, Christopher [3 ]
Affiliations
[1] Garanti BBVA Technol, Istanbul, Turkey
[2] Bogazici Univ, Istanbul, Turkey
[3] Stanford Univ, Stanford, CA USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Large annotated datasets in NLP are overwhelmingly in English. This is an obstacle to progress in other languages. Unfortunately, obtaining new annotated resources for each task in each language would be prohibitively expensive. At the same time, commercial machine translation systems are now robust. Can we leverage these systems to translate English-language datasets automatically? In this paper, we offer a positive response for natural language inference (NLI) in Turkish. We translated two large English NLI datasets into Turkish and had a team of experts validate their translation quality and fidelity to the original labels. Using these datasets, we address core issues of representation for Turkish NLI. We find that in-language embeddings are essential and that morphological parsing can be avoided where the training set is large. Finally, we show that models trained on our machine-translated datasets are successful on human-translated evaluation sets. We share all code, models, and data publicly.
Pages: 8253-8267
Page count: 15