Discovering Missing Wikipedia Inter-language Links by means of Cross-lingual Word Sense Disambiguation

Cited by: 0
Authors
Lefever, Els [1, 2]
Hoste, Veronique [1, 3]
De Cock, Martine [2]
Affiliations
[1] Univ Coll Ghent, LT3, Groot Brittannielaan 45, B-9000 Ghent, Belgium
[2] Univ Ghent, Dept Appl Math & Comp Sci, B-9000 Ghent, Belgium
[3] Univ Ghent, Dept Linguist, B-9000 Ghent, Belgium
Keywords
Wikipedia links; Cross-lingual WSD; Word Sense Disambiguation
DOI
Not available
Chinese Library Classification (CLC) number
H0 [Linguistics]
Subject classification codes
030303; 0501; 050102
Abstract
Wikipedia pages typically contain inter-language links to the corresponding pages in other languages. These links, however, are often incomplete. This paper describes a set of experiments investigating the viability of discovering such missing inter-language links for ambiguous nouns by means of a cross-lingual Word Sense Disambiguation approach. The input for the inter-language link detection system is a set of Dutch pages for a given ambiguous noun, and the output is a set of links to the corresponding pages in three target languages (viz. French, Spanish and Italian). The experimental results show that, although the task is very challenging, the system succeeds in detecting missing inter-language links between Wikipedia documents for a manually labeled test set. The final goal of the system is to provide a human editor with a list of possible missing links that should be manually verified.
Pages: 841-846
Number of pages: 6
Related papers
30 in total
  • [21] Evaluating the Contribution of EuroWordNet and Word Sense Disambiguation to Cross-language Information Retrieval
    Clough, Paul
    Stevenson, Mark
    GWC 2004: SECOND INTERNATIONAL WORDNET CONFERENCE, PROCEEDINGS, 2003, : 97 - 105
  • [22] Evaluating the Impact of Sub-word Information and Cross-lingual Word Embeddings on Mi'kmaq Language Modelling
    Boudreau, Jeremie
    Patra, Akankshya
    Suvarna, Ashima
    Cook, Paul
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 2736 - 2745
  • [23] Developing a Cross-lingual Semantic Word Similarity Corpus for English-Urdu Language Pair
    Fatima, Ghazeefa
    Nawab, Rao Muhammad Adeel
    Khan, Muhammad Salman
    Saeed, Ali
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2022, 21 (02)
  • [24] Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment
    Chi, Zewen
    Dong, Li
    Zheng, Bo
    Huang, Shaohan
    Mao, Xian-Ling
    Huang, Heyan
    Wei, Furu
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (ACL-IJCNLP 2021), VOL 1, 2021, : 3418 - 3430
  • [25] Resolving Malay Word Sense Disambiguation Utilizing Cross-Language Learning Sources Approach
    Yahaya, Fuad
    Abd Rahman, Nurazzah
    Abu Bakar, Zainab
    ADVANCED SCIENCE LETTERS, 2017, 23 (11) : 11320 - 11324
  • [26] Cross-Language Information Filtering: Word Sense Disambiguation vs. Distributional Models
    Musto, Cataldo
    Narducci, Fedelucio
    Basile, Pierpaolo
    Lops, Pasquale
    de Gemmis, Marco
    Semeraro, Giovanni
    AI(STAR)IA 2011: ARTIFICIAL INTELLIGENCE AROUND MAN AND BEYOND, 2011, 6934 : 250 - 261
  • [27] IXA at CLEF 2008 Robust-WSD Task: Using Word Sense Disambiguation for (Cross Lingual) Information Retrieval
    Agirre, Eneko
    Otegi, Arantxa
    Rigau, German
    EVALUATING SYSTEMS FOR MULTILINGUAL AND MULTIMODAL INFORMATION ACCESS, 2009, 5706 : 118 - 125
  • [28] Identifying Cognates in English-Dutch and French-Dutch by means of Orthographic Information and Cross-lingual Word Embeddings
    Lefever, Els
    Labat, Sofie
    Singh, Pranaydeep
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 4096 - 4101
  • [29] Cross-lingual numerical distance priming with second-language number words in native- to third-language number word translation
    Duyck, Wouter
    Depestel, Isabel
    Fias, Wim
    Reynvoet, Bert
    QUARTERLY JOURNAL OF EXPERIMENTAL PSYCHOLOGY, 2008, 61 (09) : 1281 - 1290
  • [30] I2KD-SLU: An Intra-Inter Knowledge Distillation Framework for Zero-Shot Cross-Lingual Spoken Language Understanding
    Mao, Tianjun
    Zhang, Chenghong
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT VIII, 2023, 14261 : 345 - 356