Completing features for author name disambiguation (AND): an empirical analysis

Authors
Humaira Waqas
Abdul Qadir
Affiliation
Capital University of Science and Technology
Source
Scientometrics | 2022 / Vol. 127
Keywords
Digital libraries; Author name disambiguation; AND; AND datasets
DOI
Not available
Abstract
This study presents a feature-enriched AND dataset intended to support the development of diverse, better-performing AND techniques that exploit features with stronger discriminating ability. Existing AND datasets contain only a limited number of useful AND features: some were curated with specific scenarios or contexts in mind, and some are domain-specific. Rather than restricting labelled datasets to a particular domain or context, or to a limited set of feature values, it is better to leave the choice of how to use them to the technique attempting to solve the problem. In this paper, the proposed labelled dataset "CustAND" provides 7886 publication records, each covering more than eleven useful feature values. The dataset spans multiple domains and includes authors from different ethnic groups. CustAND is collected from multiple web sources, with raw data extracted from digital libraries and search engines. This data is then cross-checked, hand-labelled, and confirmed (authorship confirmation) by a team of graduate students with 100% accuracy. After pre-processing, the raw data is validated by checking authors' personal web pages, various profile pages, their affiliations, and their emails. This new dataset supplies the useful feature values that are crucial for developing generic, better-performing techniques to solve the author name ambiguity problem commonly faced by digital libraries.
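The abstract's emphasis on features with strong discriminating ability can be illustrated with a minimal sketch of pairwise feature comparison, a step many AND techniques use to judge whether two publication records sharing an author name belong to the same person. This is a generic illustration, not the authors' method: the field names (`coauthors`, `affiliation`, `email`, `venue`, `keywords`), the `PublicationRecord` class, and the helper functions are hypothetical, since the abstract does not list CustAND's actual features.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PublicationRecord:
    # Hypothetical subset of AND features; CustAND's real schema may differ.
    author_name: str
    coauthors: set[str] = field(default_factory=set)
    affiliation: str = ""
    email: str = ""
    venue: str = ""
    keywords: set[str] = field(default_factory=set)


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def exact(a: str, b: str) -> float:
    """1.0 if both values are present and equal (case-insensitive), else 0.0."""
    return 1.0 if a and b and a.strip().lower() == b.strip().lower() else 0.0


def pair_features(r1: PublicationRecord, r2: PublicationRecord) -> dict[str, float]:
    """Build a similarity vector for a pair of records with the same author name."""
    return {
        "coauthor_overlap": jaccard(r1.coauthors, r2.coauthors),
        "same_affiliation": exact(r1.affiliation, r2.affiliation),
        "same_email": exact(r1.email, r2.email),
        "same_venue": exact(r1.venue, r2.venue),
        "keyword_overlap": jaccard(r1.keywords, r2.keywords),
    }


if __name__ == "__main__":
    # Invented example records, for illustration only.
    r1 = PublicationRecord("J. Smith", {"A. Lee", "B. Chen"}, "Univ X",
                           "jsmith@x.edu", "JCDL", {"digital libraries", "entity resolution"})
    r2 = PublicationRecord("J. Smith", {"A. Lee"}, "Univ X", "", "JCDL",
                           {"digital libraries"})
    print(pair_features(r1, r2))
```

A vector like this would typically feed a classifier or a clustering step. One design choice worth noting: a missing value (such as an empty email) scores 0.0, the same as a mismatch, so richer feature coverage in a dataset directly gives such a model more evidence to work with.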
Pages: 1039-1063
Page count: 24