Model Reuse in Machine Learning for Author Name Disambiguation: An Exploration of Transfer Learning

Cited by: 1

Authors:

Kim, Jinseok [1,2]

Owen-Smith, Jason [1,2,3]

Affiliations:

[1] Univ Michigan, Inst Res Innovat & Sci, Ann Arbor, MI 48104 USA

[2] Univ Michigan, Inst Social Res, Survey Res Ctr, Ann Arbor, MI 48104 USA

[3] Univ Michigan, Dept Surg, Ann Arbor, MI 48104 USA

Source:

IEEE ACCESS | 2020 / Vol. 8 / Issue 08

Funding:

National Science Foundation (USA);

Keywords:

Task analysis; Data models; Machine learning; Training; Training data; Libraries; Labeling; Data handling; data preprocessing; author name disambiguation; machine learning; transfer learning; authority control; IMPACT;

DOI:

10.1109/ACCESS.2020.3031112

Chinese Library Classification number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Machine learning for author name disambiguation is usually conducted on training and test subsets of labeled data created for a specific task. As a result, disambiguation models learned on heterogeneous labeled data are often inapplicable to other tasks that either do not use the same labeled data or do not use any labeled data at all. This article explores the idea of transfer learning in a new context, author name disambiguation. We focus on cases where a disambiguation task lacking labeled training data uses models trained on labeled data generated for other tasks. For this purpose, two labeled source datasets are used to train disambiguation models, which are then applied to three target test datasets that lack labeled training data. Our results show that transfer learning can produce disambiguation performance similar to that achievable by traditional machine learning, in which training and test datasets come from the same labeled data source. Such good performance is possible when the source training datasets have feature distributions similar to those of the target test datasets. This study suggests that, through transfer learning, rich disambiguation models from previous studies can be retained and reused across ambiguous bibliographic data from different fields and data sources. It also motivates further research on how to correct feature distribution differences between source and target datasets, which would expand the application of transfer learning in author name disambiguation beyond the model sharing explored in this research.
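
The model-reuse idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the single pairwise similarity feature, the threshold "model", the function names (`fit_threshold`, `predict`, `ks_statistic`), and all numeric values are hypothetical and invented for illustration. The sketch fits a cut-off on labeled source pairs, reuses it on unlabeled target pairs, and uses a two-sample Kolmogorov-Smirnov statistic as one possible way to check whether source and target feature distributions are similar.

```python
from bisect import bisect_right


def fit_threshold(features, labels):
    """Pick the cut-off on one similarity feature that maximizes accuracy
    on labeled source pairs (True = same author). Toy stand-in for a model."""
    best_t, best_acc = 0.0, -1.0
    for t in sorted(set(features)):
        acc = sum((f >= t) == y for f, y in zip(features, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def predict(threshold, features):
    """Reuse the source-trained cut-off on pairs from another dataset."""
    return [f >= threshold for f in features]


def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of the two samples (0 = identical, 1 = disjoint)."""
    xs, ys = sorted(xs), sorted(ys)
    gap = 0.0
    for v in xs + ys:
        fx = bisect_right(xs, v) / len(xs)
        fy = bisect_right(ys, v) / len(ys)
        gap = max(gap, abs(fx - fy))
    return gap


# Hypothetical similarity scores for record pairs (illustrative values only).
source_features = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
source_labels = [False, False, False, True, True, True]
similar_target = [0.15, 0.25, 0.75, 0.85]   # distribution close to the source
shifted_target = [0.5, 0.55, 0.6, 0.65]     # distribution unlike the source

threshold = fit_threshold(source_features, source_labels)
target_predictions = predict(threshold, similar_target)
ks_similar = ks_statistic(source_features, similar_target)
ks_shifted = ks_statistic(source_features, shifted_target)
```

A small statistic for `similar_target` and a larger one for `shifted_target` mirrors the abstract's condition that transfer works well when source and target feature distributions are alike; the abstract itself does not say how the distributions were compared, so the Kolmogorov-Smirnov check is only one assumed option.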


Pages: 188378-188389

Page count: 12

