Using Full-text of Academic Articles to Find Software Clusters

Times cited: 0
Authors
Zhang, Heng [1]
Ma, Shutian [1]
Zhang, Chengzhi [1]
Affiliations
[1] Nanjing Univ Sci & Technol, Dept Informat Management, Nanjing 210094, Peoples R China
Keywords
Scientific Software; Software Clustering; Distributed Representation;
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Scientific software contributes to modern science. To meet academic demands such as data analysis, modelling, and visualization, a wide range of software has been developed to support different steps of scientific work. To reveal the connections between scientific software, we conduct a cluster analysis of scientific software based on the full text of 23,120 articles published in PLOS ONE. First, we select popular software mentioned more than 50 times as the candidate list for clustering. Second, we apply Word2Vec to learn a distributed representation for each software. We then cluster the software with Affinity Propagation, tuning its parameters to obtain better results, and compute the silhouette coefficient to evaluate clustering performance under each parameter setting. In the optimal results, we find software clusters with specific functions, and software with strong linkage to one another mainly share functions in common.
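The evaluation step described in the abstract, scoring each parameter setting by its silhouette coefficient and keeping the best one, can be illustrated with a minimal sketch. This is not the authors' code: the toy 2-D vectors stand in for Word2Vec embeddings, the two candidate labelings stand in for Affinity Propagation output under two hypothetical parameter settings, and cosine distance is an assumed choice of metric.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def silhouette_coefficient(vectors, labels):
    """Mean silhouette value over all points (singleton clusters score 0)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # s(i) is defined as 0 for a singleton cluster
        # a: mean distance to the other members of the same cluster
        a = sum(cosine_distance(vectors[i], vectors[j])
                for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(cosine_distance(vectors[i], vectors[j]) for j in members)
            / len(members)
            for other, members in clusters.items() if other != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(vectors)

# Hypothetical embeddings for four software names (placeholders, not real data).
vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]

# Hypothetical cluster labels produced under two parameter settings.
candidate_labelings = {
    "setting_a": [0, 0, 1, 1],
    "setting_b": [0, 1, 0, 1],
}

scores = {name: silhouette_coefficient(vectors, labels)
          for name, labels in candidate_labelings.items()}
best_setting = max(scores, key=scores.get)
```

A score near 1 indicates compact, well-separated clusters, while a negative score suggests that points sit closer to another cluster than to their own, so the setting with the highest score is retained.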
Pages: 2776-2777
Number of pages: 2