Completing features for author name disambiguation (AND): an empirical analysis

Cited by: 0
Authors
Humaira Waqas
Abdul Qadir
Affiliations
[1] Capital University of Science and Technology,
Source
Scientometrics | 2022 / Vol. 127
Keywords
Digital libraries; Author name disambiguation; AND; AND datasets;
DOI
Not available
Abstract
This study presents a feature-enriched author name disambiguation (AND) dataset intended to support the development of diverse, better-performing AND techniques that draw on features with stronger discriminating ability. Current AND datasets contain a limited number of useful features; some were curated with specific scenarios or contexts in mind, and some are domain specific. Rather than restricting labelled datasets to a single domain, a particular context, or a limited set of feature values, it is better to leave such restrictions as a choice for the technique that uses the data. Our proposed labelled dataset, “CustAND”, provides 7886 publication records, each covering more than eleven useful feature values. The dataset spans multiple domains and includes authors from different ethnic groups. CustAND is collected from multiple web sources, with raw data extracted from digital libraries and search engines. The data is then cross-checked, hand-labelled, and confirmed (authorship confirmation) by a team of graduate students with 100% accuracy. After pre-processing, the raw data is validated by checking authors’ personal web pages, other profile pages, affiliations, and emails. The new dataset adds to the availability of useful feature values, which are crucial for developing generic, better-performing techniques for the author name ambiguity problem commonly faced by digital libraries.
Pages: 1039 - 1063
Page count: 24