Completing features for author name disambiguation (AND): an empirical analysis

Cited by: 0

Authors:

Humaira Waqas

Abdul Qadir

Affiliation:

[1] Capital University of Science and Technology,

Source:

Scientometrics | 2022 / Vol. 127

Keywords:

Digital libraries; Author name disambiguation; AND; AND datasets;

DOI:

Not available

Abstract:

This study presents a feature-enriched AND dataset intended to support the development of diverse, better-performing AND techniques by supplying features with stronger discriminating ability. Current AND datasets contain a limited number of useful AND features; some were curated with specific scenarios or contexts in mind, and some are domain specific. Rather than restricting labelled datasets to a particular domain, context, or limited set of feature values, it is better to leave such restrictions as a choice for the technique that is trying to solve the problem. In this paper, our proposed labelled dataset “CustAND” provides a set of 7886 publication records, where each record covers more than eleven useful feature values. The dataset covers multiple domains as well as authors from different ethnic groups. CustAND is collected from multiple web sources, with raw data extracted from digital libraries and search engines. This data is then cross-checked, hand labelled, and confirmed (authorship confirmation) by a team of graduate students with 100% accuracy. After pre-processing, the raw data is validated by checking authors’ personal web pages, other profile pages, affiliations, and emails. This new dataset improves the availability of useful feature values, which are crucial for developing generic and better-performing techniques to solve the author name ambiguity problem commonly faced by digital libraries.

Pages: 1039-1063

Page count: 24

50 records in total

[1] Completing features for author name disambiguation (AND): an empirical analysis
Waqas, Humaira
Qadir, Abdul
SCIENTOMETRICS, 2022, 127 (02) : 1039 - 1063
[2] Multiple Features Driven Author Name Disambiguation
Zhou, Qian
Chen, Wei
Wang, Weiqing
Xu, Jiajie
Zhao, Lei
2021 IEEE INTERNATIONAL CONFERENCE ON WEB SERVICES, ICWS 2021, 2021, : 506 - 515
[3] Data sets for author name disambiguation: an empirical analysis and a new resource
Mueller, Mark-Christoph
Reitz, Florian
Roy, Nicolas
SCIENTOMETRICS, 2017, 111 (03) : 1467 - 1500
[4] Data sets for author name disambiguation: an empirical analysis and a new resource
Mark-Christoph Müller
Florian Reitz
Nicolas Roy
Scientometrics, 2017, 111 : 1467 - 1500
[5] Author Name Disambiguation
Smalheiser, Neil R.
Torvik, Vetle I.
ANNUAL REVIEW OF INFORMATION SCIENCE AND TECHNOLOGY, 2009, 43 : 287 - 313
[6] Author Name Disambiguation in MEDLINE
Torvik, Vetle I.
Smalheiser, Neil R.
ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2009, 3 (03)
[7] Author Name Disambiguation for PubMed
Liu, Wanli
Dogan, Rezarta Islamaj
Kim, Sun
Comeau, Donald C.
Kim, Won
Yeganova, Lana
Lu, Zhiyong
Wilbur, W. John
JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, 2014, 65 (04) : 765 - 781
[8] NDFMF: An Author Name Disambiguation Algorithm based on the Fusion of Multiple Features
Xu, Xiaolong
Li, Yongping
Liptrott, Mark
Bessis, Nik
2018 IEEE 42ND ANNUAL COMPUTER SOFTWARE AND APPLICATIONS CONFERENCE (COMPSAC 2018), VOL 2, 2018, : 187 - 190
[9] An Efficient Technique for Author Name Disambiguation
Hazra, Rima
Saha, Anomitra
Deb, Shubhra Baran
Mitra, Debasis
2016 IEEE INTERNATIONAL CONFERENCE ON CURRENT TRENDS IN ADVANCED COMPUTING (ICCTAC), 2016,
[10] Author Name Disambiguation Based on Heterogeneous Graph
Ma, Chuang
Xia, Helong
Journal of Computers (Taiwan), 2023, 34 (04) : 41 - 52