Identification of transcription factor contexts in literature using machine learning approaches

Cited by: 6
Authors
Yang, Hui [1]
Nenadic, Goran [1]
Keane, John A. [1]
Affiliations
[1] University of Manchester, School of Computer Science, Manchester, Lancashire, England
Funding
Biotechnology and Biological Sciences Research Council (UK)
DOI
10.1186/1471-2105-9-S3-S11
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Availability of information about transcription factors (TFs) is crucial for genome biology, as TFs play a central role in the regulation of gene expression. While manual literature curation is expensive and labour intensive, the development of semi-automated text mining support is hindered by the unavailability of training data. There have been no studies on how existing data sources (e.g. TF-related data from the MeSH thesaurus and the GO ontology) or potentially noisy example data (e.g. protein-protein interaction, PPI) could be used to provide training data for identification of TF contexts in literature.

Results: In this paper we describe a text-classification system designed to automatically recognise contexts related to transcription factors in literature. The learning model is based on a set of biological features (e.g. protein and gene names, interaction words, other biological terms) that are deemed relevant for the task. We have exploited background knowledge from existing biological resources (MeSH and GO) to engineer such features. Weak and noisy training datasets have been collected from descriptions of TF-related concepts in MeSH and GO, PPI data, and data representing non-protein-function descriptions. Three machine-learning methods are investigated, along with a vote-based merging of individual approaches and/or different training datasets. The system achieved highly encouraging results, with most classifiers achieving an F-measure above 90%.

Conclusions: The experimental results have shown that the proposed model can be used for identification of TF-related contexts (i.e. sentences) with high accuracy, with a significantly reduced set of features when compared to a traditional bag-of-words approach. The results of considering existing PPI data suggest that TF and PPI contexts are less similar than we had expected. We have also shown that existing knowledge sources are useful both for feature engineering and for obtaining noisy positive training data.
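
The vote-based merging mentioned in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the tiny `FEATURE_LEXICON`, the three rule-based stand-in classifiers, and the `classify` helper are hypothetical and are not the authors' features, learning methods, or data. The sketch only shows the general idea of mapping a sentence to a small set of biological feature counts (instead of a full bag of words) and merging three binary decisions by majority vote.

```python
from collections import Counter

# Hypothetical mini-lexicons. The paper engineers its features from MeSH and GO;
# these particular entries are illustrative assumptions only.
FEATURE_LEXICON = {
    "protein_or_gene": {"p53", "nf-kb", "gata1"},
    "interaction_word": {"binds", "activates", "represses", "regulates"},
    "tf_term": {"transcription", "promoter", "enhancer"},
}


def extract_features(sentence):
    """Count how many tokens of each feature class occur in the sentence."""
    tokens = [tok.strip(".,;:()").lower() for tok in sentence.split()]
    return {
        name: sum(1 for tok in tokens if tok in terms)
        for name, terms in FEATURE_LEXICON.items()
    }


# Three toy rule-based classifiers standing in for trained models (assumption).
def clf_a(features):
    return features["tf_term"] > 0


def clf_b(features):
    return features["interaction_word"] > 0 and features["protein_or_gene"] > 0


def clf_c(features):
    return features["tf_term"] > 0 and features["protein_or_gene"] > 0


CLASSIFIERS = [clf_a, clf_b, clf_c]


def majority_vote(labels):
    """Return the most frequent label; an odd number of binary voters cannot tie."""
    return Counter(labels).most_common(1)[0][0]


def classify(sentence):
    """Label a sentence as a TF context (True) or not (False) by majority vote."""
    features = extract_features(sentence)
    return majority_vote([clf(features) for clf in CLASSIFIERS])


if __name__ == "__main__":
    for text in (
        "GATA1 binds the promoter and activates transcription.",
        "p53 binds MDM2.",
    ):
        print(text, "->", classify(text))
```

In this toy setup the second sentence is rejected because only one of the three voters accepts an interaction between two proteins without any TF-related term, which loosely mirrors the abstract's remark that PPI contexts and TF contexts are less similar than expected.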
Pages: 11