Experiments in text-based mining and analysis of biological information from MEDLINE on functionally-related genes

Cited by: 1
Authors
Moon, N [1]
Singh, R [1]
Affiliation
[1] San Francisco State Univ, Dept Comp Sci, San Francisco, CA 94132 USA
Keywords
EXPRESSION
DOI
10.1109/ICSENG.2005.41
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Technological advancements such as microarrays have enabled biologists to generate unprecedented quantities of data about biological entities. This has led to the development of a large number of algorithms for processing and analyzing biological data. Challenges remain, however; for instance, genes that function cooperatively need not have similar expression patterns. This suggests using non-numerical sources of information to explore the underlying biology. We experimentally study various factors inherent in algorithmic methodologies for text analysis. The proposed method accesses MEDLINE dynamically so that the latest research is taken into account, and uses the available literature on the analyzed genes to build lists of keywords. Natural language processing (NLP) techniques such as stop-word filtering and stemming are then applied to these lists, and keyword frequencies are weighted using the term frequency-inverse document frequency (TFIDF) scheme. The results are input to a hierarchical clustering algorithm to derive groupings of genes by functionality. The process is repeated using z-score weighting and latent semantic analysis (LSA) to determine which yields the most accurate clustering. The study examines the importance of these steps and their influence on the overall efficacy of the system. We believe that this analysis will be invaluable to the development and fine-tuning of text mining methodologies for biological literature.
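The keyword-weighting and clustering steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the gene "documents" are short hypothetical stand-ins for retrieved MEDLINE abstracts, `STOP_WORDS` is a tiny hypothetical list, `crude_stem` is a simplified suffix stripper used in place of a real stemmer such as Porter's, TF-IDF uses the common tf × log(N/df) variant, and average-linkage clustering on cosine distance is an assumed choice (the abstract does not name the linkage or distance measure). The z-score and LSA variants are not shown.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "is", "are", "to", "by", "with", "for", "on"}


def crude_stem(word):
    """Strip one common English suffix (a simplified stand-in for a real stemmer)."""
    for suffix in ("ation", "ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, split on non-letters, drop stop words, then stem."""
    words = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(w) for w in words if w not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    token_lists = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def average_linkage(vectors, n_clusters):
    """Agglomerative clustering on cosine distance, merging until n_clusters remain."""
    clusters = [[i] for i in range(len(vectors))]

    def dist(i, j):
        return 1.0 - cosine(vectors[i], vectors[j])

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between members of the two clusters.
                pairs = [dist(i, j) for i in clusters[a] for j in clusters[b]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Hypothetical keyword text per gene (stand-ins for retrieved literature).
genes = {
    "GENE_A": "kinase phosphorylation signaling cascade",
    "GENE_B": "kinase signaling phosphorylation receptor",
    "GENE_C": "ribosome translation protein synthesis",
    "GENE_D": "ribosome translation initiation",
}
names = list(genes)
clusters = average_linkage(tfidf_vectors(list(genes.values())), 2)
print([[names[i] for i in c] for c in clusters])
# -> [['GENE_A', 'GENE_B'], ['GENE_C', 'GENE_D']]
```

Terms shared by every document receive zero weight under this TF-IDF variant, which is one reason weighting choices (the paper also compares z-score weighting and LSA) can change the resulting groupings.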
Pages: 326-331
Number of pages: 6