Predicting Links on Wikipedia with Anchor Text Information

Authors
Brochier, Robin [1]
Bechet, Frederic [1]
Affiliations
[1] Aix Marseille Univ, Univ Toulon, CNRS, LIS, Marseille, France
Keywords
Wikipedia; link prediction; evaluation; hyperlinks; networks
DOI
10.1145/3404835.3462994
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Wikipedia, the largest open-collaborative online encyclopedia, is a corpus of documents bound together by internal hyperlinks. These links form the building blocks of a large network whose structure contains important information on the concepts covered in this encyclopedia. The presence of a link between two articles, materialised by an anchor text in the source page pointing to the target page, can increase readers' understanding of a topic. However, the process of linking follows specific editorial rules to avoid both under-linking and over-linking. In this paper, we study the transductive and the inductive tasks of link prediction on several subsets of the English Wikipedia and identify some key challenges behind automatic linking based on anchor text information. We propose an appropriate evaluation sampling methodology and compare several algorithms. Moreover, we propose baseline models that provide a good estimation of the overall difficulty of the tasks.
Pages: 1758-1762
Number of pages: 5