Creating Large-Scale Multilingual Cognate Tables

Cited by: 0

Authors:

Wu, Winston [1]

Yarowsky, David [1]

Affiliation:

[1] Johns Hopkins Univ, Dept Comp Sci, Ctr Language & Speech Proc, Baltimore, MD 21218 USA

Source:

PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018) | 2018

Keywords:

cognates; clustering; transliteration

DOI:

Not available

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Discipline classification code:

081203; 0835

Abstract:

Low-resource languages often suffer from a lack of high-coverage lexical resources. In this paper, we propose a method to generate cognate tables by clustering words from existing lexical resources. We then apply character-based machine translation methods to the task of cognate chain completion, inducing missing word translations from lower-coverage dictionaries to fill gaps in the cognate chain. On the Romance and Turkic language families, a simple but novel multi-language system combination improves over single-language-pair baselines. For the Romance family, we show that system combination using the results of clustering outperforms weights derived from the historical-linguistic scholarship on language phylogenies. Our approach is applicable to any language family and has not been previously performed at such scale. The cognate tables are released to the research community.


Pages: 3411-3418

Page count: 8
