Scientific publications clustering using textual and citation information

Cited by: 2
Authors
Chikhi, Nacim Fateh [1]
Affiliations
[1] Univ Blida 1, Fac Sci, Dept Comp Sci, BP 270 Route Soumaa, Blida 09000, Algeria
Keywords
Document clustering; Text mining; Science mapping; Relatedness measures
DOI
10.1016/j.eswa.2024.123319
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clustering of scientific publications has attracted much attention, and many approaches have been proposed. One challenge in clustering scientific documents is how to combine citation and textual information to improve clustering quality. In this paper, we explore the use of the von Mises-Fisher distribution, which is particularly well suited to the analysis of directional data, for scientific document clustering. More precisely, we propose a multi-view version of the mixture of von Mises-Fisher distributions in which one view corresponds to textual information and the other to citation information. The hypothesis underlying our approach is that both text and citation data are directional. To estimate the parameters of the proposed model, we use the Expectation-Maximization algorithm together with deterministic annealing to escape poor local maxima. Experiments on two real-world datasets show that our algorithm outperforms baseline algorithms in terms of clustering accuracy.
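The idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a single fixed concentration `kappa` shared by all clusters and both views (so the von Mises-Fisher normalising constant cancels in the E-step), a simple increasing schedule `betas` for the annealing temperature, and random documents as initial mean directions. The names `normalize_rows` and `multiview_movmf` and all parameter defaults are hypothetical.

```python
import numpy as np

def normalize_rows(X):
    """Project each row onto the unit sphere (directional data)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, 1e-12)

def multiview_movmf(X_text, X_cite, k, kappa=20.0,
                    betas=(0.1, 0.3, 1.0), iters=20, seed=0):
    """Two-view mixture of vMF components fitted by annealed EM.

    Assumption: kappa is fixed and shared, so only the mean directions
    and mixing weights are estimated.
    """
    rng = np.random.default_rng(seed)
    views = [normalize_rows(np.asarray(V, dtype=float))
             for V in (X_text, X_cite)]
    n = views[0].shape[0]

    # Initialise each cluster's mean directions from one random document.
    idx = rng.choice(n, size=k, replace=False)
    mus = [V[idx].copy() for V in views]
    log_pi = np.full(k, -np.log(k))

    for beta in betas:              # beta = 1 recovers ordinary EM
        for _ in range(iters):
            # E-step: the views are treated as conditionally independent
            # given the cluster, so their log-likelihoods add up.
            log_r = log_pi + kappa * sum(V @ M.T for V, M in zip(views, mus))
            log_r = beta * log_r    # deterministic annealing: temper the posterior
            log_r -= log_r.max(axis=1, keepdims=True)
            R = np.exp(log_r)
            R /= R.sum(axis=1, keepdims=True)

            # M-step: mixing weights and per-view mean directions.
            log_pi = np.log(np.maximum(R.mean(axis=0), 1e-12))
            mus = [normalize_rows(R.T @ V) for V in views]

    return R.argmax(axis=1), R
```

Calling `multiview_movmf(X_text, X_cite, k)` with a document-term matrix and a document-citation matrix that share the same row order returns hard cluster labels and the soft responsibilities. Estimating a separate concentration per cluster and per view, as a full mixture of von Mises-Fisher distributions would, is left out of this sketch.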
Pages: 6