Efficient supervised and semi-supervised approaches for affiliations disambiguation

Cited by: 0
Authors
Pascal Cuxac
Jean-Charles Lamirel
Valerie Bonvallot
Affiliations
[1] INIST-CNRS
[2] LORIA-Synalp
Source
Scientometrics | 2013 / Vol. 97
Keywords
Affiliation; Disambiguation; Data cleaning; Classification; Clustering; Semi-supervised; Bibliographic databases; K-means; Naive Bayes
DOI
Not available
Abstract
The disambiguation of named entities is a challenge in many fields, such as scientometrics, social networks, record linkage, citation analysis, and the semantic web. Name ambiguities can arise from misspellings, typographical or OCR errors, abbreviations, and omissions. Searching for the names of persons or organizations is therefore difficult whenever a single name can appear in many different forms. This paper proposes two approaches to disambiguating the affiliations of authors of scientific papers in bibliographic databases. The first assumes that a training dataset is available and uses a Naive Bayes model. The second assumes that no learning resource is available and uses a semi-supervised approach that combines soft clustering with Bayesian learning. The results are encouraging, and the approach is already partially applied in a scientific survey department. However, our experiments also highlight a limitation: the approach cannot efficiently process highly unbalanced data. Alternative solutions are possible for future developments, particularly the use of a recent clustering algorithm relying on feature maximization.
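The supervised idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a generic multinomial Naive Bayes classifier with add-one smoothing over word tokens, and the class name `NaiveBayesAffiliations`, the helper `tokens`, the training strings, and the canonical labels `"INIST"` and `"LORIA"` are all assumptions made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokens(affiliation):
    """Lowercase an affiliation string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", affiliation.lower())


class NaiveBayesAffiliations:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.token_counts = defaultdict(Counter)     # token frequencies per class
        for text, label in zip(texts, labels):
            self.token_counts[label].update(tokens(text))
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)    # log prior
            for tok in tokens(text):
                if tok in self.vocab:                # skip tokens unseen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: raw affiliation variants mapped to made-up canonical labels.
train = [
    ("INIST-CNRS, Vandoeuvre-les-Nancy, France", "INIST"),
    ("Inst. Information Scientifique et Technique, CNRS, Nancy", "INIST"),
    ("LORIA, Synalp team, Nancy, France", "LORIA"),
    ("Lab. Lorrain de Recherche en Informatique (LORIA), Synalp", "LORIA"),
]
model = NaiveBayesAffiliations().fit([t for t, _ in train], [l for _, l in train])
print(model.predict("CNRS INIST Vandoeuvre"))
print(model.predict("Synalp, LORIA"))
```

Word tokens are used here only for simplicity. With so few examples per class the sketch says nothing about the unbalanced-data limitation that the abstract reports.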
Pages: 47 - 58
Number of pages: 11