Identification of transcription factor contexts in literature using machine learning approaches

Cited by: 6
Authors
Yang, Hui [1 ]
Nenadic, Goran [1 ]
Keane, John A. [1 ]
Affiliations
[1] Univ Manchester, Sch Comp Sci, Manchester, Lancs, England
Funding
Biotechnology and Biological Sciences Research Council (BBSRC), UK
DOI
10.1186/1471-2105-9-S3-S11
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Information about transcription factors (TFs) is crucial for genome biology, as TFs play a central role in the regulation of gene expression. While manual literature curation is expensive and labour-intensive, the development of semi-automated text-mining support is hindered by the unavailability of training data. There have been no studies on how existing data sources (e.g. TF-related data from the MeSH thesaurus and the Gene Ontology, GO) or potentially noisy example data (e.g. protein-protein interaction, PPI) could be used to provide training data for identification of TF contexts in literature.
Results: In this paper we describe a text-classification system designed to automatically recognise contexts related to transcription factors in literature. The learning model is based on a set of biological features (e.g. protein and gene names, interaction words, other biological terms) that are deemed relevant for the task. We have exploited background knowledge from existing biological resources (MeSH and GO) to engineer such features. Weak and noisy training datasets have been collected from descriptions of TF-related concepts in MeSH and GO, PPI data, and data representing non-protein-function descriptions. Three machine-learning methods are investigated, along with a vote-based merging of individual approaches and/or different training datasets. The system achieved highly encouraging results, with most classifiers achieving an F-measure above 90%.
Conclusions: The experimental results show that the proposed model can be used to identify TF-related contexts (i.e. sentences) with high accuracy, using a significantly reduced set of features compared to a traditional bag-of-words approach. The results of considering existing PPI data suggest that TF and PPI contexts are not as similar as we had expected. We have also shown that existing knowledge sources are useful both for feature engineering and for obtaining noisy positive training data.
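The vote-based merging and the F-measure mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two small lexicons, the feature names, and the function names `extract_features`, `majority_vote` and `f_measure` are hypothetical, and the abstract does not specify which three classifiers were used or how ties were handled.

```python
from collections import Counter

# Hypothetical mini-lexicons; the paper engineers its features from MeSH and GO.
INTERACTION_WORDS = {"binds", "activates", "represses", "regulates"}
TF_TERMS = {"transcription", "promoter", "enhancer"}


def extract_features(sentence):
    """Count lexicon hits in a sentence (a stand-in for the biological features)."""
    tokens = [t.strip(".,;()").lower() for t in sentence.split()]
    return {
        "interaction": sum(t in INTERACTION_WORDS for t in tokens),
        "tf_term": sum(t in TF_TERMS for t in tokens),
    }


def majority_vote(predictions):
    """Merge per-classifier labels (1 = TF context, 0 = other) by simple majority.

    Ties fall back to 0; this tie rule is an assumption of the sketch.
    """
    counts = Counter(predictions)
    return 1 if counts[1] > counts[0] else 0


def f_measure(gold, predicted):
    """Balanced F-measure (harmonic mean of precision and recall) for label 1."""
    tp = sum(g == 1 and p == 1 for g, p in zip(gold, predicted))
    fp = sum(g == 0 and p == 1 for g, p in zip(gold, predicted))
    fn = sum(g == 1 and p == 0 for g, p in zip(gold, predicted))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In use, each sentence would be passed through `extract_features`, each of the three trained classifiers would emit a label for it, and `majority_vote` would merge those labels before `f_measure` is computed against the gold annotations.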
Pages: 11