Model Reuse in Machine Learning for Author Name Disambiguation: An Exploration of Transfer Learning

Cited by: 1
Authors
Kim, Jinseok [1 ,2 ]
Owen-Smith, Jason [1 ,2 ,3 ]
Affiliations
[1] Univ Michigan, Inst Res Innovat & Sci, Ann Arbor, MI 48104 USA
[2] Univ Michigan, Inst Social Res, Survey Res Ctr, Ann Arbor, MI 48104 USA
[3] Univ Michigan, Dept Surg, Ann Arbor, MI 48104 USA
Source
IEEE ACCESS | 2020 / Vol. 8 / No. 08
Funding
National Science Foundation (USA);
Keywords
Task analysis; Data models; Machine learning; Training; Training data; Libraries; Labeling; Data handling; data preprocessing; author name disambiguation; machine learning; transfer learning; authority control; IMPACT;
DOI
10.1109/ACCESS.2020.3031112
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Machine learning for author name disambiguation is usually conducted on the training and test subsets of labeled data created for a specific task. As a result, disambiguation models learned on heterogeneous labeled data are often inapplicable for other purposes that either do not use the same labeled data or do not make use of any labeled data at all. This article explores the idea of transfer learning in a new context, author name disambiguation. We focus on cases where a disambiguation task lacking labeled training data uses models trained on labeled data generated for other tasks. For this purpose, two labeled source datasets are used to train disambiguation models that are then applied to three test target datasets that lack labeled training data. Our results show that transfer learning can produce disambiguation performance similar to that achievable by traditional machine learning, in which training and test datasets come from the same labeled data source. Such performance is possible when the training source datasets have feature distributions similar to those of the test target datasets. This study suggests that, through transfer learning, rich disambiguation models from previous studies can be retained and reused across ambiguous bibliographic data from different fields and data sources. It also motivates further research on how to correct feature distribution differences between source and target datasets, so that transfer learning in author name disambiguation can be applied beyond the model sharing explored in this research.
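
The model-sharing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two pairwise similarity features, the toy values, and the helper names (train_logistic, predict, mean_shift) are all assumptions made for illustration. A plain logistic regression is trained on labeled record pairs from a hypothetical source dataset and then applied unchanged to unlabeled pairs from a hypothetical target dataset, with a crude check of how far the feature means differ.

```python
import math

# Hypothetical labeled SOURCE pairs: [coauthor similarity, title similarity].
# Label 1 = same author, 0 = different authors. All values are invented.
source_X = [[0.9, 0.8], [0.7, 0.6], [0.8, 0.3],
            [0.1, 0.2], [0.0, 0.1], [0.2, 0.0]]
source_y = [1, 1, 1, 0, 0, 0]

# Hypothetical unlabeled TARGET pairs described by the same two features.
target_X = [[0.85, 0.7], [0.05, 0.1]]


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression weights with plain stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi  # gradient of the log loss with respect to z
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


def predict(model, x):
    """Return 1 if the pair is predicted to be the same author, else 0."""
    w, b = model
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if z >= 0 else 0


def mean_shift(A, B):
    """Largest absolute difference between per-feature means of two datasets."""
    n_features = len(A[0])
    mean_a = [sum(row[j] for row in A) / len(A) for j in range(n_features)]
    mean_b = [sum(row[j] for row in B) / len(B) for j in range(n_features)]
    return max(abs(a - b) for a, b in zip(mean_a, mean_b))


# Train once on the source data, then reuse the model on the target data.
model = train_logistic(source_X, source_y)
predictions = [predict(model, x) for x in target_X]
shift = mean_shift(source_X, target_X)
print(predictions, round(shift, 3))
```

Comparing feature means is only a rough stand-in for the feature distribution similarity the abstract refers to; a real study would compare full distributions on far larger datasets.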
Pages: 188378 - 188389
Page count: 12
Related papers
50 in total
  • [1] The impact of imbalanced training data on machine learning for author name disambiguation
    Jinseok Kim
    Jenna Kim
    [J]. Scientometrics, 2018, 117 : 511 - 526
  • [2] The impact of imbalanced training data on machine learning for author name disambiguation
    Kim, Jinseok
    Kim, Jenna
    [J]. SCIENTOMETRICS, 2018, 117 (01) : 511 - 526
  • [3] Relational Machine Learning Author Disambiguation
    Bastrakova, Ekaterina
    Ledesma, Rodney
    Milian, Jose
    Rico, Fabien
    Zighed, Djamel
    [J]. PROCEEDINGS OF THE 2016 IEEE ARTIFICIAL INTELLIGENCE AND NATURAL LANGUAGE CONFERENCE (AINL FRUCT 2016), 2016, : 14 - 20
  • [4] Ethnicity-based name partitioning for author name disambiguation using supervised machine learning
    Kim, Jinseok
    Kim, Jenna
    Owen-Smith, Jason
    [J]. JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, 2021, 72 (08) : 979 - 994
  • [5] A cross-domain transfer learning model for author name disambiguation on heterogeneous graph with pretrained language model
    Huang, Zhenyuan
    Zhang, Hui
    Hao, Chengqian
    Yang, Haijun
    Wu, Harris
    [J]. Knowledge-Based Systems, 2024, 305
  • [6] ANDez: An open-source tool for author name disambiguation using machine learning
    Kim, Jinseok
    Kim, Jenna
    [J]. SOFTWAREX, 2024, 26
  • [7] Effect of Chinese characters on machine learning for Chinese author name disambiguation: A counterfactual evaluation
    Kim, Jinseok
    Kim, Jenna
    Kim, Jinmo
    [J]. JOURNAL OF INFORMATION SCIENCE, 2023, 49 (03) : 711 - 725
  • [8] Learning semantic and relationship joint embedding for author name disambiguation
    Xiong, Bo
    Bao, Peng
    Wu, Yilin
    [J]. NEURAL COMPUTING & APPLICATIONS, 2021, 33 (06): : 1987 - 1998
  • [9] Learning semantic and relationship joint embedding for author name disambiguation
    Xiong, Bo
    Bao, Peng
    Wu, Yilin
    [J]. Neural Computing and Applications, 2021, 33 (06) : 1987 - 1998
  • [10] Learning semantic and relationship joint embedding for author name disambiguation
    Bo Xiong
    Peng Bao
    Yilin Wu
    [J]. Neural Computing and Applications, 2021, 33 : 1987 - 1998