Model Reuse in Machine Learning for Author Name Disambiguation: An Exploration of Transfer Learning

Cited by: 1
Authors
Kim, Jinseok [1, 2]
Owen-Smith, Jason [1, 2, 3]
Affiliations
[1] Univ Michigan, Inst Res Innovat & Sci, Ann Arbor, MI 48104 USA
[2] Univ Michigan, Inst Social Res, Survey Res Ctr, Ann Arbor, MI 48104 USA
[3] Univ Michigan, Dept Surg, Ann Arbor, MI 48104 USA
Source
IEEE ACCESS | 2020 / Vol. 8 / Issue 08
Funding
National Science Foundation (USA)
Keywords
Task analysis; Data models; Machine learning; Training; Training data; Libraries; Labeling; Data handling; data preprocessing; author name disambiguation; machine learning; transfer learning; authority control; IMPACT;
DOI
10.1109/ACCESS.2020.3031112
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Machine learning for author name disambiguation is usually conducted on the training and test subsets of labeled data created for a specific task. As a result, disambiguation models learned on heterogeneous labeled data are often inapplicable to other tasks that either do not use the same labeled data or do not use any labeled data at all. This article explores the idea of transfer learning in a new context, author name disambiguation. We focus on cases where a disambiguation task lacking labeled training data uses models trained on labeled data generated for other tasks. For this purpose, two labeled source datasets are used to train disambiguation models, which are then applied to three target test datasets that lack labeled training data. Our results show that transfer learning can produce disambiguation performance similar to that achievable by traditional machine learning, in which training and test datasets come from the same labeled data source. This good performance is possible when the source training datasets have feature distributions similar to those of the target test datasets. This study suggests that, through transfer learning, rich disambiguation models from previous studies can be retained and reused across ambiguous bibliographic data from different fields and data sources. It also motivates further research on how to correct feature distribution differences between source and target datasets, so that transfer learning in author name disambiguation can extend beyond the model sharing explored in this research.
Pages: 188378 - 188389
Number of pages: 12
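Illustration
The abstract's central claim, that a model trained on labeled source data transfers well only when the target data have similar feature distributions, can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the article: the data are synthetic, each record pair is reduced to a single hypothetical similarity score, the "model" is a simple accuracy-maximizing threshold, and the two-sample Kolmogorov-Smirnov statistic stands in for whatever distribution comparison the authors actually used.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def accuracy(scores, labels, threshold):
    """Share of pairs whose predicted match status equals the true label."""
    hits = sum((s >= threshold) == y for s, y in zip(scores, labels))
    return hits / len(scores)


def fit_threshold(scores, labels):
    """Choose the similarity cutoff that maximizes accuracy on labeled pairs."""
    best_t, best_acc = 0.0, -1.0
    for t in sorted(set(scores)):
        acc = accuracy(scores, labels, t)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def make_pairs(n, match_mean, nonmatch_mean, seed):
    """Synthetic record pairs: one similarity score per pair plus a match label.

    Hypothetical stand-in for real labeled data; scores are clipped to [0, 1].
    """
    rng = random.Random(seed)
    scores, labels = [], []
    for i in range(n):
        is_match = i % 2 == 0
        mean = match_mean if is_match else nonmatch_mean
        scores.append(min(1.0, max(0.0, rng.gauss(mean, 0.1))))
        labels.append(is_match)
    return scores, labels


# Labeled source data: the only data used for training.
src_scores, src_labels = make_pairs(400, 0.8, 0.3, seed=1)
# Target with the same feature distribution as the source.
sim_scores, sim_labels = make_pairs(400, 0.8, 0.3, seed=2)
# Target whose feature distribution is shifted relative to the source.
shf_scores, shf_labels = make_pairs(400, 0.5, 0.1, seed=3)

# Train on the source only, then reuse the model on both targets.
threshold = fit_threshold(src_scores, src_labels)

ks_similar = ks_statistic(src_scores, sim_scores)
ks_shifted = ks_statistic(src_scores, shf_scores)
acc_similar = accuracy(sim_scores, sim_labels, threshold)
acc_shifted = accuracy(shf_scores, shf_labels, threshold)

print(f"similar target: KS={ks_similar:.2f}, accuracy={acc_similar:.2f}")
print(f"shifted target: KS={ks_shifted:.2f}, accuracy={acc_shifted:.2f}")
```

The target labels are used here only to score the transferred model; in the setting the abstract describes, the target task has no labeled training data, so a label-free check such as the distribution comparison is what would be available before reuse.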