Evaluation of machine learning models that predict lncRNA subcellular localization

Authors
Miller, Jason R. [1, 2]
Yi, Weijun [2]
Adjeroh, Donald A. [2]
Affiliations
[1] Hood Coll, Dept Comp Sci & Informat Technol, Frederick, MD 21701 USA
[2] West Virginia Univ, Lane Dept Comp Sci & Elect Engn, Morgantown, WV 26506 USA
Funding
U.S. National Science Foundation
Keywords
RNALOCATE; RESOURCE; GENCODE
DOI
10.1093/nargab/lqae125
Chinese Library Classification
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
The lncATLAS database quantifies the relative cytoplasmic versus nuclear abundance of long non-coding RNAs (lncRNAs) observed in 15 human cell lines. The literature describes several machine learning models trained and evaluated on these and similar datasets. These reports showed moderate performance, e.g. 72-74% accuracy, on test subsets of the data withheld from training. In all these reports, the datasets were filtered to include genes with extreme values while excluding genes with values in the middle range, and the filters were applied prior to partitioning the data into training and testing subsets. Using several models and lncATLAS data, we show that this 'middle exclusion' protocol boosts performance metrics without boosting model performance on unfiltered test data. We show that various models achieve only about 60% accuracy when evaluated on unfiltered lncRNA data. We suggest that the problem of predicting lncRNA subcellular localization from nucleotide sequences is more challenging than currently perceived. We provide a basic model and evaluation procedure as a benchmark for future studies of this problem.
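The evaluation issue described in the abstract can be illustrated with a small sketch. It is not the authors' model or data: the genes are synthetic, the single noisy feature, the threshold classifier, the cutoff of 1.0 and all function names are assumptions made purely for illustration. It compares filtering before the train/test split (the 'middle exclusion' protocol) with splitting first and testing on unfiltered genes.

```python
import random

def make_genes(n, seed=0):
    """Synthetic stand-in data (assumption): each gene gets a continuous
    localization value and one noisy feature correlated with it."""
    rng = random.Random(seed)
    genes = []
    for _ in range(n):
        value = rng.gauss(0.0, 1.0)              # hypothetical cytoplasmic-vs-nuclear score
        feature = value + rng.gauss(0.0, 1.5)    # weak, noisy predictor
        genes.append((feature, value))
    return genes

def label(value):
    """Binary class: 1 = cytoplasmic (value > 0), 0 = nuclear."""
    return 1 if value > 0 else 0

def middle_exclusion(genes, cutoff=1.0):
    """Keep only genes with extreme values; drop the middle range."""
    return [(f, v) for f, v in genes if abs(v) > cutoff]

def split(genes, test_fraction=0.25, seed=1):
    """Shuffle and partition into (train, test)."""
    rng = random.Random(seed)
    shuffled = genes[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_threshold(train):
    """Toy classifier: threshold at the midpoint of the two class means."""
    pos = [f for f, v in train if label(v) == 1]
    neg = [f for f, v in train if label(v) == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def accuracy(threshold, test):
    hits = sum((f > threshold) == (label(v) == 1) for f, v in test)
    return hits / len(test)

genes = make_genes(4000)

# Protocol A: filter first, then split. The test set contains only extreme genes.
train_a, test_a = split(middle_exclusion(genes))
acc_filtered = accuracy(fit_threshold(train_a), test_a)

# Protocol B: split first, filter only the training set, test on all held-out genes.
train_b, test_b = split(genes)
acc_unfiltered = accuracy(fit_threshold(middle_exclusion(train_b)), test_b)

print(f"accuracy on middle-excluded test set: {acc_filtered:.3f}")
print(f"accuracy on unfiltered test set:      {acc_unfiltered:.3f}")
```

In this toy setting the same kind of classifier scores higher under Protocol A only because the hard, middle-range genes never reach the test set. The numbers it prints say nothing about real lncRNA data.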
Pages: 9