Experiments in text-based mining and analysis of biological information from MEDLINE on functionally-related genes

Cited by: 1
Authors
Moon, N [1]
Singh, R [1]
Affiliations
[1] San Francisco State Univ, Dept Comp Sci, San Francisco, CA 94132 USA
Keywords
EXPRESSION;
DOI
10.1109/ICSENG.2005.41
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Technological advancements such as microarrays have enabled biologists to generate unprecedented quantities of data about biological entities. This has led to the development of a large number of algorithms for processing and analysis of biological data. Challenges remain, however; for instance, genes that function cooperatively need not have similar expression patterns. This suggests the use of non-numerical sources of information to explore the underlying biology. We experimentally study various factors that are inherent in algorithmic methodologies for text analysis. The proposed method accesses MEDLINE dynamically to account for the latest research, and uses the available literature corresponding to the genes analyzed to develop lists of keywords. Natural language processing (NLP) techniques such as stop-word filtering and stemming are then applied to the lists, and keyword frequencies are weighted using the term frequency-inverse document frequency (TFIDF) scheme. The results are input to a hierarchical clustering algorithm to derive groupings of genes by functionality. The process is repeated using z-score weighting and latent semantic analysis (LSA) to determine which yields the most accurate clustering. The study presented examines the importance of these steps and their influence on the overall efficacy of the system. We believe that the analysis conducted as part of this research will be invaluable to the development and fine-tuning of text mining methodologies for biological literature.
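The keyword-weighting and clustering steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the gene names, keyword texts, stop-word list, the particular TF-IDF variant, cosine distance, and average linkage are all assumptions chosen for illustration, and the MEDLINE retrieval, stemming, z-score, and LSA steps are omitted.

```python
import math
from collections import Counter

# Hypothetical toy data: gene name -> keywords pooled from its abstracts.
DOCS = {
    "geneA": "kinase phosphorylation signaling kinase cascade",
    "geneB": "kinase signaling phosphorylation pathway",
    "geneC": "ribosome translation ribosomal protein",
    "geneD": "translation ribosome elongation",
}
# Tiny illustrative stop-word list (an assumption, not the paper's list).
STOP_WORDS = {"the", "of", "and", "in", "a", "is", "to"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def tfidf(docs):
    """Weight each term by (relative frequency) * log(N / document frequency)."""
    counts = {gene: Counter(tokenize(text)) for gene, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter(term for c in counts.values() for term in c)
    return {
        gene: {
            term: (freq / sum(c.values())) * math.log(n_docs / doc_freq[term])
            for term, freq in c.items()
        }
        for gene, c in counts.items()
    }


def cosine_distance(u, v):
    """1 - cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def cluster(vectors, k):
    """Agglomerative clustering with average linkage, stopped at k clusters."""
    clusters = [[gene] for gene in sorted(vectors)]

    def linkage(a, b):
        total = sum(cosine_distance(vectors[x], vectors[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # Find and merge the closest pair of clusters.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


groups = cluster(tfidf(DOCS), 2)
print(groups)
```

In this toy setup, genes whose keyword lists share terms end up in the same group, while genes with no shared vocabulary sit at the maximum cosine distance and stay apart. Swapping the weighting function is the natural place to compare alternatives such as z-score weighting.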
Pages: 326 - 331
Page count: 6