AuthCrowd: Author Name Disambiguation and Entity Matching using Crowdsourcing

Cited by: 3
Authors
Correia, Antonio [1, 2]
Guimaraes, Diogo [1, 2]
Paulino, Dennis [1, 2]
Jameel, Shoaib [3]
Schneider, Daniel [4]
Fonseca, Benjamim [1, 2]
Paredes, Hugo [1, 2]
Affiliations
[1] INESC TEC, Apartado 1013, Vila Real, Portugal
[2] Univ Tras Os Montes & Alto Douro, UTAD, Apartado 1013, Vila Real, Portugal
[3] Univ Essex, Sch Comp Sci & Elect Engn, Colchester Campus, Colchester, Essex, England
[4] NCE UFRJ, Tercio Pacitti Inst Comp Applicat & Res, Rio De Janeiro, Brazil
Keywords
author name disambiguation; crowdsourcing; entity matching; evaluation; scientometrics; task design
DOI
10.1109/CSCWD49262.2021.9437769
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Despite decades of research and development in named entity resolution, name ambiguity remains a challenging issue for many bibliometric-enhanced information retrieval (IR) tasks. As new bibliographic datasets are created in response to the steady worldwide growth of publication records, further problems arise from errors such as missing data fields, duplicate entities, misspellings, and extra characters. Because these problems tend to occur at large scale, they substantially affect both the overall consistency and the quality of electronic data. This paper presents an approach to handling these name ambiguity problems by using crowdsourcing as a complement to traditional unsupervised approaches. To this end, we present "AuthCrowd", a crowdsourcing system capable of decomposing named entity disambiguation and entity matching tasks. Experimental results on a real-world dataset of publicly available papers published in peer-reviewed venues demonstrate the potential of the proposed approach for improving author name disambiguation. The findings further highlight the importance of adopting hybrid crowd-algorithm collaboration strategies, especially for handling complexity and quantifying bias when working with large amounts of data.
Pages: 150-155
Number of pages: 6
Related papers
50 in total (first 10 listed)
  • [1] The Impact of Name-Matching and Blocking on Author Disambiguation
    Backes, Tobias
    CIKM'18: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2018, pp. 803-812
  • [2] Author Name Disambiguation
    Smalheiser, Neil R.
    Torvik, Vetle I.
    ANNUAL REVIEW OF INFORMATION SCIENCE AND TECHNOLOGY, 2009, vol. 43, pp. 287-313
  • [3] Using Web Information for Author Name Disambiguation
    Pereira, Denilson Alves
    Ribeiro-Neto, Berthier
    Ziviani, Nivio
    Laender, Alberto H. F.
    Goncalves, Marcos Andre
    Ferreira, Anderson A.
    JCDL '09: PROCEEDINGS OF THE 2009 ACM/IEEE JOINT CONFERENCE ON DIGITAL LIBRARIES, 2009, pp. 49-58
  • [4] Author Name Disambiguation Using Predictive Models
    Talaba, George
    Fotache, Marin
    EDUCATION EXCELLENCE AND INNOVATION MANAGEMENT THROUGH VISION 2020, 2019, pp. 4703-4710
  • [5] Bootstrapping Active Name Disambiguation with Crowdsourcing
    Cheng, Yu
    Chen, Zhengzhang
    Wang, Jiang
    Agrawal, Ankit
    Choudhary, Alok
    PROCEEDINGS OF THE 22ND ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM'13), 2013, pp. 1213-1216
  • [6] CROWDSOURCING THE NAMES-GAME: A PROTOTYPE FOR NAME DISAMBIGUATION OF AUTHOR-INVENTORS (RIP)
    den Besten, Matthijs
    Martinez, Catalina
    Maisonneuve, Nicolas
    Maraut, Stephane
    14TH INTERNATIONAL SOCIETY OF SCIENTOMETRICS AND INFORMETRICS CONFERENCE (ISSI), 2013, pp. 484-492
  • [7] Author Name Disambiguation in MEDLINE
    Torvik, Vetle I.
    Smalheiser, Neil R.
    ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2009, vol. 3, no. 3
  • [8] Author Name Disambiguation for PubMed
    Liu, Wanli
    Dogan, Rezarta Islamaj
    Kim, Sun
    Comeau, Donald C.
    Kim, Won
    Yeganova, Lana
    Lu, Zhiyong
    Wilbur, W. John
    JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, 2014, vol. 65, no. 4, pp. 765-781
  • [9] Deep author name disambiguation using DBLP data
    Boukhers, Zeyd
    Asundi, Nagaraj Bahubali
    INTERNATIONAL JOURNAL ON DIGITAL LIBRARIES, 2024, vol. 25, no. 3, pp. 431-441
  • [10] Author Name Disambiguation by Using Deep Neural Network
    Hung Nghiep Tran
    Tin Huynh
    Tien Do
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, PT 1, 2014, vol. 8397, pp. 123-132