Experimental data for computing semantic similarity between concepts using multiple inheritances in Wikipedia category graph

Cited by: 0
Authors
Hussain, Muhammad Jawad [1]
Wasti, Shahbaz Hassan [1, 2]
Huang, Guangjian [1]
Jiang, Yuncheng [1]
Affiliations
[1] South China Normal Univ, Sch Comp Sci, Guangzhou 510631, Guangdong, Peoples R China
[2] Univ Educ, Div Sci & Technol, Lahore, Pakistan
Source
DATA IN BRIEF | 2020 / Vol. 30
Funding
National Natural Science Foundation of China;
Keywords
Semantic similarity; Wikipedia category graph; Multiple inheritances; Information content;
DOI
10.1016/j.dib.2020.105377
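Chinese Library Classification (CLC) numbers
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes
07; 0710; 09;
Abstract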
This data article compiles detailed, descriptive experimental data for the Wikipedia-based semantic similarity approach called Neighbourhood Aggregated Semantic Contribution (NASC), presented in Hussain et al. [1]. The JWPL (Java Wikipedia Library) DataMachine and the JWPL WikipediaAPI were used to extract the required Wikipedia features from a Wikipedia dump. The dataset presents the disambiguated Wikipedia concepts of the gold-standard word similarity benchmarks MC30 (English), RG65(es) (Spanish), and RG65(fr) (French), together with their associated sets of categories in the corresponding Wikipedia category graph (WCG). It also contains the number of ancestors, common ancestors, pages, and common pages in the k-neighbourhood of the associated categories for different values of the parameter k in the English, Spanish, and French WCGs. The dataset can be used to assess the semantic similarity between Wikipedia concepts in the English (MC30), Spanish (RG65(es)), and French (RG65(fr)) benchmarks. It can also support further analysis and comparison of the taxonomic structures of the English, Spanish, and French WCGs. (C) 2020 The Authors. Published by Elsevier Inc.
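To make the counted quantities concrete, here is a minimal sketch of how ancestors, common ancestors, pages, and common pages in a k-neighbourhood could be counted for two concepts. It assumes that the k-neighbourhood of a concept consists of the concept's own categories plus every category reachable from them by following at most k parent links, and that a category's pages are the articles directly assigned to it. The toy category graph, the page sets, and the helper names (`k_ancestors`, `pages_in`, `neighbourhood_stats`) are hypothetical; they are not part of JWPL, the published dataset, or the NASC formula in [1].

```python
from collections import deque

# Hypothetical toy category graph: category -> parent categories.
# "Cats" and "Dogs" each have two parents (multiple inheritance).
PARENTS = {
    "Cats": ["Felines", "Pets"],
    "Dogs": ["Canines", "Pets"],
    "Felines": ["Carnivorans"],
    "Canines": ["Carnivorans"],
    "Pets": ["Domesticated animals"],
    "Carnivorans": ["Mammals"],
    "Domesticated animals": ["Animals"],
    "Mammals": ["Animals"],
}

# Hypothetical pages assigned directly to each category.
PAGES = {
    "Cats": {"Cat", "Kitten"},
    "Dogs": {"Dog", "Puppy"},
    "Felines": {"Cat", "Lion"},
    "Canines": {"Dog", "Wolf"},
    "Pets": {"Cat", "Dog", "Hamster"},
    "Carnivorans": {"Lion", "Wolf"},
}


def k_ancestors(parents, start, k):
    """Return categories reachable from `start` via 1..k parent links.

    Because a category may have several parents, this is a breadth-first
    search over a graph rather than a walk up a tree; a category reached
    along several paths is counted once. Start categories are excluded.
    """
    visited = set(start)
    ancestors = set()
    queue = deque((cat, 0) for cat in start)
    while queue:
        cat, depth = queue.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for parent in parents.get(cat, []):
            if parent not in visited:
                visited.add(parent)
                ancestors.add(parent)
                queue.append((parent, depth + 1))
    return ancestors


def pages_in(pages, categories):
    """Union of the pages directly assigned to the given categories."""
    result = set()
    for cat in categories:
        result |= pages.get(cat, set())
    return result


def neighbourhood_stats(parents, pages, cats_a, cats_b, k):
    """Count (common) ancestors and (common) pages for two concepts."""
    anc_a = k_ancestors(parents, cats_a, k)
    anc_b = k_ancestors(parents, cats_b, k)
    # Assumption: the concept's own categories contribute pages too.
    pages_a = pages_in(pages, anc_a | set(cats_a))
    pages_b = pages_in(pages, anc_b | set(cats_b))
    return {
        "ancestors_a": len(anc_a),
        "ancestors_b": len(anc_b),
        "common_ancestors": len(anc_a & anc_b),
        "pages_a": len(pages_a),
        "pages_b": len(pages_b),
        "common_pages": len(pages_a & pages_b),
    }


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, neighbourhood_stats(PARENTS, PAGES, ["Cats"], ["Dogs"], k))
```

In this toy graph the two concepts share one ancestor ("Pets") at k = 1 and three at k = 2, which illustrates why the dataset reports the counts separately for each value of k. How NASC turns such counts into a similarity score is defined in [1] and is not reproduced here.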
Pages: 9
Related papers
22 in total
  • [1] An approach for measuring semantic similarity between Wikipedia concepts using multiple inheritances
    Hussain, Muhammad Jawad
    Wasti, Shahbaz Hassan
    Huang, Guangjian
    Wei, Lina
    Jiang, Yuncheng
    Tang, Yong
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2020, 57 (03)
  • [2] Assessing Semantic Similarity Between Concepts Using Wikipedia Based on Nonlinear Fitting
    Huang, Guangjian
    Jiang, Yuncheng
    Ma, Wenjun
    Liu, Weiru
    [J]. KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, KSEM 2019, PT II, 2019, 11776 : 159 - 171
  • [3] Computing semantic similarity based on novel models of semantic representation using Wikipedia
    Qu, Rong
    Fang, Yongyi
    Bai, Wen
    Jiang, Yuncheng
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2018, 54 (06) : 1002 - 1021
  • [4] Measuring Semantic Similarity between Words Using Wikipedia
    Lu Zhiqiang
    Shao Werimin
    Yu Zhenhua
    [J]. WISM: 2009 INTERNATIONAL CONFERENCE ON WEB INFORMATION SYSTEMS AND MINING, PROCEEDINGS, 2009 : 251 - +
  • [5] Fuzzy Semantic Similarity in Linked Data using Wikipedia Infobox
    Zadeh, Parisa D. Hossein
    Reformat, Marek Z.
    [J]. PROCEEDINGS OF THE 2013 JOINT IFSA WORLD CONGRESS AND NAFIPS ANNUAL MEETING (IFSA/NAFIPS), 2013 : 395 - 400
  • [6] Feature-based approaches to semantic similarity assessment of concepts using Wikipedia
    Jiang, Yuncheng
    Zhang, Xiaopei
    Tang, Yong
    Nie, Ruihua
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2015, 51 (03) : 215 - 234
  • [7] Computing semantic similarity between biomedical concepts using new information content approach
    Ben Aouicha, Mohamed
    Taieb, Mohamed Ali Hadj
    [J]. JOURNAL OF BIOMEDICAL INFORMATICS, 2016, 59 : 258 - 275
  • [8] Evaluating semantic similarity and relatedness between concepts by combining taxonomic and non-taxonomic semantic features of WordNet and Wikipedia
    Hussain, Muhammad Jawad
    Bai, Heming
    Wasti, Shahbaz Hassan
    Huang, Guangjian
    Jiang, Yuncheng
    [J]. INFORMATION SCIENCES, 2023, 625 : 673 - 699
  • [9] Wikipedia bi-linear link (WBLM) model: A new approach for measuring semantic similarity and relatedness between linguistic concepts using Wikipedia link structure
    Hussain, Muhammad Jawad
    Bai, Heming
    Jiang, Yuncheng
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2023, 60 (02)
  • [10] Measuring Semantic Similarity Between Biomedical Concepts Within Multiple Ontologies
    Al-Mubaid, Hisham
    Nguyen, Hoa A.
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS, 2009, 39 (04) : 389 - 398