Impact of Context on Keyword Identification and Use in Biomedical Literature Mining

Authors
Dasigi, Venu G. [1]
Karam, Orlando [2]
Pydimarri, Sailaja [3]
Affiliations
[1] Bowling Green State Univ, Bowling Green, OH 43403 USA
[2] Kennesaw State Univ, Marietta, GA USA
[3] Life Univ, Marietta, GA USA
Keywords
Literature mining; Automatic keyword identification; TF-IDF; Z-score; Background set; Features; Clustering; TEXT; EXTRACTION
DOI
10.1007/978-3-030-02686-8_38
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper reviews the use of two statistical metrics for automatically identifying important keywords associated with a concept, such as a gene, by mining the scientific literature. Starting with a subset of MEDLINE® abstracts that contain the name or synonyms of a gene in their titles, these metrics contrast the prevalence of specific words in these documents against a broader "background set" of abstracts. If a word occurs substantially more often in the document subset associated with a gene than in the background set that acts as a reference, then the word is viewed as capturing some specific attribute of the gene. The keywords thus automatically identified may be used as gene features in clustering algorithms. Since the background set is the reference against which keyword prevalence is contrasted, the authors hypothesize that different background document sets can lead to somewhat different sets of keywords being identified as specific to a gene. Two different background sets are discussed that are useful for two somewhat different purposes, namely, characterizing the function of a gene, and clustering a set of genes based on their shared functional similarities. Experimental results that reveal the significance of the choice of background set are presented.
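The contrast between a gene's document subset and a background set can be illustrated with a minimal Python sketch. The abstract does not give the formulas the authors use, so the binomial Z-score below, the add-one smoothing, the whitespace tokenizer, the names `doc_freq` and `z_scores`, and the toy documents are all assumptions made for illustration, not the paper's method or data.

```python
import math
from collections import Counter


def doc_freq(docs):
    """Count, for each word, how many documents contain it at least once."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))
    return df


def z_scores(gene_docs, background_docs):
    """Score each word of the gene subset against a background set.

    Assumed formula (not taken from the paper): a binomial Z-score,
    z = (k - n*p) / sqrt(n*p*(1-p)), where k is the number of gene
    documents containing the word, n is the number of gene documents,
    and p is the fraction of background documents containing the word.
    """
    n = len(gene_docs)
    gene_df = doc_freq(gene_docs)
    bg_df = doc_freq(background_docs)
    scores = {}
    for word, k in gene_df.items():
        # Add-one smoothing keeps 0 < p < 1 even for words that appear
        # in none or all of the background documents.
        p = (bg_df[word] + 1) / (len(background_docs) + 2)
        scores[word] = (k - n * p) / math.sqrt(n * p * (1 - p))
    return scores


# Hypothetical toy documents standing in for abstracts.
gene_docs = [
    "brca1 repairs dna damage in the cell",
    "brca1 mutation and dna repair in the tumor",
]
# A broad background: general biomedical text.
broad_background = [
    "the cell divides in the culture",
    "protein folding in the cell",
    "the tumor grows in the tissue",
    "dna is found in the nucleus",
]
# A narrower background: text that is already about DNA.
narrow_background = [
    "dna damage in the yeast cell",
    "dna replication in the nucleus",
    "the dna repair pathway in mice",
    "dna methylation in the tumor",
]

broad = z_scores(gene_docs, broad_background)
narrow = z_scores(gene_docs, narrow_background)

# Print the three highest-scoring words under each background set.
for name, scores in (("broad", broad), ("narrow", narrow)):
    top = sorted(scores, key=scores.get, reverse=True)[:3]
    print(name, top)
```

In this toy setup, "dna" scores well above common words such as "the" against the broad background, but no higher than them against the narrow background where every document mentions it. That is the kind of background-set sensitivity the abstract hypothesizes.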
Pages: 505-516
Page count: 12