Clustering scientific literature using sparse citation graph analysis

Authors
Bolelli, Levent [1]
Ertekin, Seyda
Giles, C. Lee
Affiliations
[1] Penn State Univ, Dept Comp Sci & Engn, University Pk, PA 16802 USA
[2] Penn State Univ, Coll Informat Sci & Technol, University Pk, PA 16802 USA
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
It is well known that connectivity analysis of linked documents provides significant information about the structure of the document space for unsupervised learning tasks. However, the ability to identify distinct clusters of documents through link graph analysis is proportional to the density of the graph and depends on the availability of the linking and/or linked documents in the collection. In this paper, we present an information-theoretic approach to measuring the significance of individual words based on the underlying link structure of the document collection. This enables us to generate a non-uniform weight distribution over the feature space, which is used to augment the original corpus-based document similarities. Experimental results on a collection of scientific literature show that our method achieves better separation of distinct groups of documents, yielding improved clustering solutions.
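The abstract does not say which information-theoretic measure the authors use, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes, purely for illustration, that a word's significance is the mutual information between "a document pair shares the word" and "the pair is connected by a citation", and that the resulting weights rescale a binary bag-of-words cosine similarity. The names `link_word_weights` and `weighted_cosine`, the `base` parameter, and the toy data are all hypothetical.

```python
import math
from itertools import combinations


def link_word_weights(docs, links):
    """Assumed measure: mutual information (in bits) between
    'both documents of a pair contain the word' and 'the pair is linked'.

    docs  -- list of sets of words, one set per document
    links -- set of frozenset({i, j}) citation pairs (undirected here)
    """
    pairs = list(combinations(range(len(docs)), 2))
    n = len(pairs)
    linked = [frozenset(p) in links for p in pairs]
    vocab = set().union(*docs)
    weights = {}
    for word in vocab:
        shared = [word in docs[i] and word in docs[j] for i, j in pairs]
        mi = 0.0
        for x in (False, True):
            n_x = sum(1 for s in shared if s == x)
            for y in (False, True):
                n_y = sum(1 for l in linked if l == y)
                n_xy = sum(1 for s, l in zip(shared, linked) if s == x and l == y)
                if n_xy:  # empty cells contribute nothing to the sum
                    mi += (n_xy / n) * math.log2(n_xy * n / (n_x * n_y))
        weights[word] = mi
    return weights


def weighted_cosine(a, b, weights, base=1.0):
    """Cosine similarity of binary word vectors, each dimension scaled by
    (base + weight). With all weights at zero this is the plain cosine."""
    def sq(word):
        return (base + weights.get(word, 0.0)) ** 2

    dot = sum(sq(w) for w in a & b)
    norm_a = math.sqrt(sum(sq(w) for w in a))
    norm_b = math.sqrt(sum(sq(w) for w in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy collection: two topics, one citation link inside each topic.
docs = [
    {"graph", "citation", "the"},
    {"graph", "cluster", "the"},
    {"protein", "gene", "the"},
    {"protein", "fold", "the"},
]
links = {frozenset({0, 1}), frozenset({2, 3})}
weights = link_word_weights(docs, links)
```

In this toy collection a word shared by every pair ("the") carries no information about links and keeps a weight of zero, while topic words shared only by linked pairs ("graph", "protein") receive positive weight, so within-topic similarity rises and cross-topic similarity falls relative to the unweighted cosine. Mutual information also rewards words whose sharing is anti-correlated with links, which a real system might want to treat differently.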
Pages: 30-41
Page count: 12