Creating Large-Scale Multilingual Cognate Tables

Cited by: 0
Authors
Wu, Winston [1]
Yarowsky, David [1]
Affiliations
[1] Johns Hopkins Univ, Dept Comp Sci, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
Keywords
cognates; clustering; transliteration
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Low-resource languages often suffer from a lack of high-coverage lexical resources. In this paper, we propose a method to generate cognate tables by clustering words from existing lexical resources. We then use character-based machine translation methods to solve the task of cognate chain completion, inducing missing word translations from lower-coverage dictionaries to fill gaps in the cognate chain. On the Romance and Turkic language families, a simple but novel multi-language system combination improves over single-language-pair baselines. For the Romance family, we show that system combination using the results of clustering outperforms weights derived from the historical-linguistic scholarship on language phylogenies. Our approach is applicable to any language family and has not previously been applied at this scale. The cognate tables are released to the research community.
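The abstract does not say which clustering algorithm or similarity measure the authors use. As a rough illustration only, the sketch below groups words from different languages into candidate cognate sets by single-linkage clustering on normalized edit distance. The function names, the 0.5 threshold, and the sample words are assumptions made for this example and are not taken from the paper.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free on a match)
            ))
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer word's length, in [0, 1]."""
    if not a and not b:
        return 0.0
    return edit_distance(a, b) / max(len(a), len(b))


def cluster_cognates(entries, threshold=0.5):
    """Single-linkage clustering of (language, word) pairs.

    Two entries are linked when they come from different languages and
    their normalized distance is at most `threshold` (an assumed value).
    Linked entries are merged with a small union-find structure.
    """
    parent = list(range(len(entries)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(entries)), 2):
        (lang_i, word_i), (lang_j, word_j) = entries[i], entries[j]
        if lang_i != lang_j and normalized_distance(word_i, word_j) <= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for idx, entry in enumerate(entries):
        clusters.setdefault(find(idx), []).append(entry)
    return list(clusters.values())


# Hypothetical input: a few Romance words for "night" and "water".
sample = [
    ("es", "noche"), ("it", "notte"), ("pt", "noite"),
    ("es", "agua"), ("it", "acqua"), ("pt", "água"),
]
for row in cluster_cognates(sample):
    print(row)
```

Each returned cluster corresponds to one row of a cognate table. A row with no entry for some language is a gap of the kind that the cognate chain completion step described in the abstract aims to fill.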
Pages: 3411-3418
Page count: 8