Improving the Extraction of Bilingual Terminology from Wikipedia

Cited by: 28
Authors
Erdmann, Maike [1]
Nakayama, Kotaro [2]
Hara, Takahiro [1]
Nishio, Shojiro [1]
Affiliations
[1] Osaka Univ, Suita, Osaka 565, Japan
[2] Univ Tokyo, Tokyo 1138654, Japan
Keywords
Algorithms; Experimentation; Bilingual dictionary; Wikipedia mining; link analysis
DOI
10.1145/1596990.1596995
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Research on the automatic construction of bilingual dictionaries has achieved impressive results. Bilingual dictionaries are usually constructed from parallel corpora, but since these corpora are available only for selected text domains and language pairs, the potential of other resources is being explored as well. In this article, we further pursue the idea of using Wikipedia as a corpus for bilingual terminology extraction. We propose a method that extracts term-translation pairs from different types of Wikipedia link information. An SVM classifier trained on the features of manually labeled training data then determines the correctness of unseen term-translation pairs.
Pages: 17
Related papers
  • [1] An approach for extracting bilingual terminology from wikipedia
    Erdmann, Maike
    Nakayama, Kotaro
    Hara, Takahiro
    Nishio, Shojiro
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, 2008, 4947 : 380 - 392
  • [2] Meliorated Approach for Extracting Bilingual Terminology from Wikipedia
    Gupta, Anand
    Goya, Akhil
    Bindal, Aman
    Gupta, Ankuj
    2008 11TH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY: ICCIT 2008, VOLS 1 AND 2, 2008, : 599 - 604
  • [3] Mutual bilingual terminology extraction
    Ha, Le An
    Fernandez, Gabriela
    Mitkov, Ruslan
    Corpas, Gloria
    SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008, 2008, : 1818 - 1824
  • [4] Improving Bilingual Terminology Extraction from Comparable Corpora via Multiple Word-Space Models
    Hazem, Amir
    Morin, Emmanuel
    LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2016, : 4184 - 4187
  • [5] Bilingual Terminology Extraction in Sketch Engine
    Baisa, Vit
    Ulipova, Barbora
    Cukr, Michal
    RECENT ADVANCES IN SLAVONIC NATURAL LANGUAGE PROCESSING (RASLAN 2015), 2015, : 61 - 67
  • [6] Extracting terminology from Wikipedia
    Vivaldi, Jorge
    Rodriguez, Horacio
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2011, (47): : 65 - 73
  • [7] Bilingual Terminology Extraction based on Translation Patterns
    Simoes, Alberto
    Almeida, Jose Joao
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2008, (41): : 281 - 288
  • [8] Bilingual Terminology Extraction from Comparable E-Commerce Corpora
    Jia, Hao
    Gu, Shuqin
    Zhang, Yuqi
    Duan, Xiangyu
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [9] Automatic Parallel Corpora and Bilingual Terminology extraction from Parallel WebSites
    Almeida, Jose Joao
    Simoes, Alberto
    LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2010, : 50 - 55
  • [10] Chinese terminology extraction using bilingual web resources
    Yang, Yuhang
    Lu, Qin
    Ji, Luning
    Zhao, Tiejun
    PROCEEDINGS OF THE 2007 IEEE INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING (NLP-KE'07), 2007, : 347 - +