Identification of transcription factor contexts in literature using machine learning approaches

Cited by: 6
Authors
Yang, Hui [1]
Nenadic, Goran [1]
Keane, John A. [1]
Affiliations
[1] Univ Manchester, Sch Comp Sci, Manchester, Lancs, England
Funding
Biotechnology and Biological Sciences Research Council (BBSRC), UK
DOI
10.1186/1471-2105-9-S3-S11
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Information about transcription factors (TFs) is crucial for genome biology, as TFs play a central role in the regulation of gene expression. Manual literature curation is expensive and labour intensive, yet the development of semi-automated text mining support is hindered by the unavailability of training data. There have been no studies on how existing data sources (e.g. TF-related data from the MeSH thesaurus and the Gene Ontology, GO) or potentially noisy example data (e.g. protein-protein interaction, PPI, data) could be used to provide training data for the identification of TF contexts in literature.

Results: In this paper we describe a text-classification system designed to automatically recognise contexts related to transcription factors in literature. The learning model is based on a set of biological features (e.g. protein and gene names, interaction words, other biological terms) that are deemed relevant for the task. We have exploited background knowledge from existing biological resources (MeSH and GO) to engineer these features. Weak and noisy training datasets have been collected from descriptions of TF-related concepts in MeSH and GO, from PPI data, and from data representing non-protein-function descriptions. Three machine-learning methods are investigated, along with a vote-based merging of individual approaches and/or different training datasets. The system achieved highly encouraging results, with most classifiers achieving an F-measure above 90%.

Conclusions: The experimental results show that the proposed model can identify TF-related contexts (i.e. sentences) with high accuracy, using a significantly reduced set of features compared to a traditional bag-of-words approach. The results obtained with existing PPI data suggest that TF and PPI contexts are less similar than we had expected. We have also shown that existing knowledge sources are useful both for feature engineering and for obtaining noisy positive training data.
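The vote-based merging and the F-measure mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not say which voting scheme or which classifiers were used, so simple majority voting is an assumption here, and the label names, the three prediction lists, and the function names `majority_vote` and `f_measure` are all hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Merge per-classifier label lists into one label per sentence.

    Assumption: plain majority voting. With an odd number of binary
    classifiers there are no ties.
    """
    merged = []
    for labels in zip(*predictions):
        merged.append(Counter(labels).most_common(1)[0][0])
    return merged


def f_measure(gold, predicted, positive="TF"):
    """Balanced F-measure (harmonic mean of precision and recall)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence-level labels: "TF" marks a TF-related context.
gold = ["TF", "TF", "other", "other", "TF"]
clf_a = ["TF", "TF", "other", "TF", "other"]
clf_b = ["TF", "other", "other", "other", "TF"]
clf_c = ["TF", "TF", "TF", "other", "TF"]

merged = majority_vote([clf_a, clf_b, clf_c])
print(merged)
print(f_measure(gold, clf_a), f_measure(gold, merged))
```

In this toy data each classifier makes a different mistake, so the merged vote recovers the gold labels while any single classifier scores lower. That is only the general motivation for voting, not a result from the paper.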
Pages: 11
Related papers
50 in total
  • [1] Identification of transcription factor contexts in literature using machine learning approaches
    Hui Yang
    Goran Nenadic
    John A Keane
    BMC Bioinformatics, 9
  • [2] Writer identification using machine learning approaches: a comprehensive review
    Rehman, Arshia
    Naz, Saeeda
    Razzak, Muhammad Imran
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (08) : 10889 - 10931
  • [3] Writer identification using machine learning approaches: a comprehensive review
    Arshia Rehman
    Saeeda Naz
    Muhammad Imran Razzak
    Multimedia Tools and Applications, 2019, 78 : 10889 - 10931
  • [4] Semantic role identification for Malayalam using machine learning approaches
    Jayan, Jisha P. P.
    Kumar, J. Satheesh
    Amudha, T.
    INNOVATIONS IN SYSTEMS AND SOFTWARE ENGINEERING, 2025, 21 (01) : 279 - 285
  • [5] Credit Card Fraud Identification Using Machine Learning Approaches
    Kumar, Pawan
    Iqbal, Fahad
    PROCEEDINGS OF 2019 1ST INTERNATIONAL CONFERENCE ON INNOVATIONS IN INFORMATION AND COMMUNICATION TECHNOLOGY (ICIICT 2019), 2019,
  • [6] The Identification of Negative Content in Websites by Using Machine Learning Approaches
    Amalia, Amalia
    Gunawan, Dani
    Lydia, Maya Silvi
    Wesley
    2019 5TH INTERNATIONAL CONFERENCE ON COMPUTING, ENGINEERING, AND DESIGN (ICCED), 2019,
  • [7] Identifying transcription factor-DNA interactions using machine learning
    Bang, Sohyun
    Galli, Mary
    Crisp, Peter A.
    Gallavotti, Andrea
    Schmitz, Robert J.
    IN SILICO PLANTS, 2022, 4 (02):
  • [8] Prediction and identification of nonlinear dynamical systems using machine learning approaches
    Jin, Leisheng
    Liu, Zhuo
    Li, Lijie
    JOURNAL OF INDUSTRIAL INFORMATION INTEGRATION, 2023, 35
  • [9] Identification of Potential Biomarkers in Stomach Adenocarcinoma using Machine Learning Approaches
    Nazari, Elham
    Pourali, Ghazaleh
    Khazaei, Majid
    Asadnia, Alireza
    Dashtiahangar, Mohammad
    Mohit, Reza
    Maftooh, Mina
    Nassiri, Mohammadreza
    Hassanian, Seyed Mahdi
    Ghayour-Mobarhan, Majid
    Ferns, Gordon A. A.
    Shahidsales, Soodabeh
    Avan, Amir
    CURRENT BIOINFORMATICS, 2023, 18 (04) : 320 - 333
  • [10] Cephalopod species identification using integrated analysis of machine learning and deep learning approaches
    Tan, Hui Yuan
    Goh, Zhi Yun
    Loh, Kar-Hoe
    Then, Amy Yee-Hui
    Omar, Hasmahzaiti
    Chang, Siow-Wee
    PEERJ, 2021, 9