Improving the Extraction of Bilingual Terminology from Wikipedia

Cited by: 28
Authors
Erdmann, Maike [1]
Nakayama, Kotaro [2]
Hara, Takahiro [1]
Nishio, Shojiro [1]
Affiliations
[1] Osaka Univ, Suita, Osaka 565, Japan
[2] Univ Tokyo, Tokyo 1138654, Japan
Keywords
Algorithms; Experimentation; Bilingual dictionary; Wikipedia mining; Link analysis
DOI
10.1145/1596990.1596995
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Research on the automatic construction of bilingual dictionaries has achieved impressive results. Bilingual dictionaries are usually constructed from parallel corpora, but because such corpora are available only for selected text domains and language pairs, the potential of other resources is being explored as well. In this article, we further pursue the idea of using Wikipedia as a corpus for bilingual terminology extraction. We propose a method that extracts term-translation pairs from different types of Wikipedia link information. An SVM classifier, trained on the features of manually labeled training data, then determines whether unseen term-translation pairs are correct.
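The abstract describes a two-stage pipeline: harvest candidate term-translation pairs from Wikipedia link information, then filter them with an SVM trained on labeled examples. The sketch below illustrates that shape under our own assumptions; it is not the authors' implementation. It assumes the link information consists of interlanguage links, redirects, and anchor (link) texts, uses a hypothetical dictionary-based input format, invents a small feature set, and stands in for the SVM with a minimal linear SVM trained by Pegasos (stochastic subgradient descent on the L2-regularized hinge loss). The functions `term_variants`, `pair_features`, `candidate_pairs`, `train_linear_svm`, and `predict`, as well as the toy data and labels, are all hypothetical.

```python
import math
import random
from collections import Counter
from itertools import product


def term_variants(title, redirects, anchors):
    """Collect surface forms of one article: its title, redirects to it, and anchor texts.

    Hypothetical input format (an assumption, not the paper's):
      redirects: {redirect_title: target_title}
      anchors:   {target_title: Counter(anchor_text -> count)}
    Returns {term: (is_title, is_redirect, anchor_count)}.
    """
    variants = {title: [1, 0, 0]}
    for src, dst in redirects.items():
        if dst == title:
            variants.setdefault(src, [0, 0, 0])[1] = 1
    # An anchor text equal to the title just adds to the title's count.
    for text, count in anchors.get(title, Counter()).items():
        variants.setdefault(text, [0, 0, 0])[2] += count
    return {term: tuple(f) for term, f in variants.items()}


def pair_features(fs, ft):
    """Illustrative feature vector for a (source, target) pair; last entry is a bias."""
    return [fs[0], ft[0], fs[1], ft[1],
            math.log1p(fs[2]), math.log1p(ft[2]), 1.0]


def candidate_pairs(interlang, src_redirects, src_anchors, tgt_redirects, tgt_anchors):
    """Pair every surface form of a source article with every surface form
    of its interlanguage-linked target article."""
    for src_title, tgt_title in interlang.items():
        src_vars = term_variants(src_title, src_redirects, src_anchors)
        tgt_vars = term_variants(tgt_title, tgt_redirects, tgt_anchors)
        for (s, fs), (t, ft) in product(src_vars.items(), tgt_vars.items()):
            yield s, t, pair_features(fs, ft)


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM via Pegasos (a stand-in, not the paper's SVM setup). Labels are +1/-1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            # Margin is computed with the current weights before the update.
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: regularization step
            if margin < 1.0:  # hinge loss active: move toward y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    """Return +1 (accept the pair) or -1 (reject it)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else -1


# Toy, hand-made inputs and labels (hypothetical, for illustration only).
interlang = {"Osaka University": "大阪大学"}
en_redirects = {"Handai": "Osaka University"}
ja_redirects = {"阪大": "大阪大学"}
en_anchors = {"Osaka University": Counter({"Osaka University": 40, "university": 2})}
ja_anchors = {"大阪大学": Counter({"大阪大学": 35, "同大学": 1})}

pairs = list(candidate_pairs(interlang, en_redirects, en_anchors, ja_redirects, ja_anchors))
good = {("Osaka University", "大阪大学"), ("Osaka University", "阪大"),
        ("Handai", "大阪大学"), ("Handai", "阪大")}
X = [f for _, _, f in pairs]
y = [1 if (s, t) in good else -1 for s, t, _ in pairs]
w = train_linear_svm(X, y)
```

Every surface form on one side is paired with every surface form on the other, so noisy anchors such as "university" produce incorrect candidates that the classifier must learn to reject. Pegasos is used only to keep the sketch dependency-free; a real system would more likely use an established SVM library and a far richer feature set.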
Pages: 17
Related papers (50 in total)
  • [31] Bilingual Dictionary of Juridical Terminology
    Alcaraz-Varo, Enrique
    QUADERNS-REVISTA DE TRADUCCIO, 2005, 12 : 266 - 267
  • [32] Building Bilingual Parallel Corpora based on Wikipedia
    Mohammadi, Mehdi
    GhasemAghaee, Nasser
    2010 SECOND INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND APPLICATIONS: ICCEA 2010, PROCEEDINGS, VOL 2, 2010, : 264 - 268
  • [33] Survey on terminology extraction from texts
    Xu, Kang
    Feng, Yifan
    Li, Qiandi
    Dong, Zhenjiang
    Wei, Jianxiang
    JOURNAL OF BIG DATA, 2025, 12 (01)
  • [34] Terminology Extraction from Log Files
    Saneifar, Hassan
    Bonniol, Stephane
    Laurent, Anne
    Poncelet, Pascal
    Roche, Mathieu
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2009, 5690 : 769 - +
  • [35] Knowledge extraction from bilingual corpora
    Somers, H
    INFORMATION EXTRACTION: TOWARDS SCALABLE, ADAPTABLE SYSTEMS, 1999, 1714 : 120 - 133
  • [36] Weakly Supervised Multilingual Causality Extraction from Wikipedia
    Hashimoto, Chikara
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 2988 - 2999
  • [37] Information Extraction from Wikipedia Using Pattern Learning
    Mihaltz, Marton
    ACTA CYBERNETICA, 2010, 19 (04) : 677 - 694
  • [38] Family Matters: Company Relations Extraction from Wikipedia
    Kuznetsov, Artem
    Braslavski, Pavel
    Ivanov, Vladimir
    KNOWLEDGE ENGINEERING AND SEMANTIC WEB, KESW 2016, 2016, 649 : 81 - 92
  • [39] A generic method for multi word extraction from Wikipedia
    Bekavac, Bozo
    Tadic, Marko
    PROCEEDINGS OF THE ITI 2008 30TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY INTERFACES, 2008, : 663 - 667
  • [40] Semantic resource extraction from Wikipedia category lattice
    Collin, Olivier
    Gaillard, Benoit
    Bouraoui, Jean-Leon
    Girault, Thomas
    LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2010, : E23 - E29