Efficient supervised and semi-supervised approaches for affiliations disambiguation

Cited by: 0
Authors
Pascal Cuxac
Jean-Charles Lamirel
Valerie Bonvallot
Affiliations
[1] INIST-CNRS
[2] LORIA-Synalp
Source
Scientometrics | 2013 / Vol. 97, No. 1
Keywords
Affiliation; Disambiguation; Data cleaning; Classification; Clustering; Semi-supervised; Bibliographic databases; K-means; Naive Bayes
DOI
Not available
Abstract
The disambiguation of named entities is a challenge in many fields, such as scientometrics, social networks, record linkage, citation analysis, and the semantic web. Name ambiguities can arise from misspellings, typographical or OCR errors, abbreviations, and omissions. Searching for the names of persons or organizations therefore becomes difficult as soon as a single name may appear in many different forms. This paper proposes two approaches for disambiguating the affiliations of authors of scientific papers in bibliographic databases. The first assumes that a training dataset is available and uses a Naive Bayes model. The second assumes that no training resource is available and uses a semi-supervised approach that combines soft clustering and Bayesian learning. The results are encouraging, and the approach is already partially applied in a scientific survey department. However, our experiments also show that the approach has some limitations, notably that it cannot efficiently process highly unbalanced data. Alternative solutions are possible for future work, particularly the use of a recent clustering algorithm based on feature maximization.
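As a rough illustration of the supervised approach described in the abstract, the sketch below trains a multinomial Naive Bayes classifier that maps variant affiliation strings to a canonical institution label. It is not the authors' implementation: the character-trigram features, the Laplace smoothing parameter `alpha`, the hypothetical class name `AffiliationNB`, and the toy training strings are all assumptions chosen only for illustration. The semi-supervised variant (soft clustering combined with Bayesian learning) is not sketched here.

```python
import math
import re
from collections import Counter, defaultdict


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Character n-grams of the normalized string, padded with one space per side."""
    padded = f" {normalize(text)} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class AffiliationNB:
    """Hypothetical multinomial Naive Bayes over character trigrams (Laplace smoothing)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha                        # assumed smoothing constant
        self.class_counts = Counter()             # number of training strings per label
        self.feature_counts = defaultdict(Counter)  # trigram counts per label
        self.vocab = set()                        # all trigrams seen in training

    def fit(self, strings, labels):
        for s, y in zip(strings, labels):
            grams = char_ngrams(s)
            self.class_counts[y] += 1
            self.feature_counts[y].update(grams)
            self.vocab.update(grams)
        return self

    def log_posteriors(self, s):
        """Return log P(label) + sum of log P(trigram | label) for each label."""
        total = sum(self.class_counts.values())
        v = len(self.vocab)
        grams = char_ngrams(s)
        scores = {}
        for y, n_y in self.class_counts.items():
            counts = self.feature_counts[y]
            denom = sum(counts.values()) + self.alpha * v
            score = math.log(n_y / total)
            for g in grams:
                # Unseen trigrams get count 0 and are smoothed by alpha.
                score += math.log((counts[g] + self.alpha) / denom)
            scores[y] = score
        return scores

    def predict(self, s):
        scores = self.log_posteriors(s)
        return max(scores, key=scores.get)


# Hypothetical toy training data: variant spellings mapped to a canonical label.
train = [
    ("INIST-CNRS, Vandoeuvre-les-Nancy", "INIST"),
    ("INIST CNRS", "INIST"),
    ("Inst Informat Sci & Tech, CNRS", "INIST"),
    ("LORIA, Nancy", "LORIA"),
    ("LORIA-Synalp", "LORIA"),
    ("Lab Lorrain Rech Informat & Applicat", "LORIA"),
]
strings, labels = zip(*train)
model = AffiliationNB().fit(strings, labels)
print(model.predict("INIST/CNRS"), model.predict("LORIA, Nancy, France"))
```

Because the scores are summed in log space, long affiliation strings do not cause numerical underflow, and `alpha` controls how strongly trigrams never seen for a label are penalized. Character n-grams are one plausible way to absorb the misspellings, abbreviations, and punctuation variants mentioned in the abstract; the paper's actual feature set may differ.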
Pages: 47-58
Page count: 12
Related papers
  • [1] Efficient supervised and semi-supervised approaches for affiliations disambiguation
    Cuxac, Pascal
    Lamirel, Jean-Charles
    Bonvallot, Valerie
    [J]. SCIENTOMETRICS, 2013, 97 (01) : 47 - 58
  • [2] Semi-Supervised Multiple Disambiguation
    Ghoorchian, Kambiz
    Rahimian, Fatemeh
    Girdzijauskas, Sarunas
    [J]. 2015 IEEE TRUSTCOM/BIGDATASE/ISPA, VOL 2, 2015 : 88 - 95
  • [3] Word sense disambiguation by semi-supervised learning
    Niu, ZY
    Ji, DH
    Tan, CL
    Yang, LP
    [J]. COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, 2005, 3406 : 238 - 241
  • [4] Word sense disambiguation: an evaluation study of semi-supervised approaches with word embeddings
    Sousa, Samuel
    Milios, Evangelos
    Berton, Lilian
    [J]. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020
  • [5] Retraining: The Semi-Supervised Learning of the Word Sense Disambiguation
    Suarez, Armando
    Palomar, Manuel
    Rigau, German
    [J]. PROCESAMIENTO DEL LENGUAJE NATURAL, 2005, (34)
  • [6] Semi-supervised approach for Persian word sense disambiguation
    Mahmoodvand, Mohamadreza
    Hourali, Maryam
    [J]. PROCEEDINGS OF THE 2017 7TH INTERNATIONAL CONFERENCE ON COMPUTER AND KNOWLEDGE ENGINEERING (ICCKE), 2017 : 104 - 110
  • [7] Polyphone Disambiguation in Mandarin Chinese with Semi-Supervised Learning
    Shi, Yi
    Wang, Congyi
    Chen, Yu
    Wang, Bin
    [J]. INTERSPEECH 2021, 2021 : 4109 - 4113
  • [8] Semi-Supervised Method for Chinese Word Sense Disambiguation
    Zhang, C.
    Xu, Z.
    Gao, X.
    [J]. Xinan Jiaotong Daxue Xuebao/Journal of Southwest Jiaotong University, 2019, 54 (02) : 408 - 414
  • [9] Name Disambiguation Using Semi-supervised Topic Model
    Fu, JinLan
    Qiu, Jie
    Wang, Jing
    Li, Li
    [J]. ADVANCED INTELLIGENT COMPUTING THEORIES AND APPLICATIONS, ICIC 2015, PT III, 2015, 9227 : 471 - 480
  • [10] Semi-supervised approaches to efficient evaluation of model prediction performance
    Gronsbell, Jessica L.
    Cai, Tianxi
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2018, 80 (03) : 579 - 594