Evaluation of machine learning models that predict lncRNA subcellular localization

Authors
Miller, Jason R. [1, 2]
Yi, Weijun [2]
Adjeroh, Donald A. [2]
Affiliations
[1] Hood Coll, Dept Comp Sci & Informat Technol, Frederick, MD 21701 USA
[2] West Virginia Univ, Lane Dept Comp Sci & Elect Engn, Morgantown, WV 26506 USA
Funding
National Science Foundation (USA);
Keywords
RNALOCATE; RESOURCE; GENCODE;
DOI
10.1093/nargab/lqae125
Chinese Library Classification
Q3 [Genetics];
Discipline classification codes
071007; 090102;
Abstract
The lncATLAS database quantifies the relative cytoplasmic versus nuclear abundance of long non-coding RNAs (lncRNAs) observed in 15 human cell lines. The literature describes several machine learning models trained and evaluated on these and similar datasets. These reports showed moderate performance, e.g. 72-74% accuracy, on test subsets of the data withheld from training. In all these reports, the datasets were filtered to include genes with extreme values while excluding genes with values in the middle range, and the filters were applied prior to partitioning the data into training and testing subsets. Using several models and lncATLAS data, we show that this 'middle exclusion' protocol boosts performance metrics without boosting model performance on unfiltered test data. We show that various models achieve only about 60% accuracy when evaluated on unfiltered lncRNA data. We suggest that the problem of predicting lncRNA subcellular localization from nucleotide sequences is more challenging than currently perceived. We provide a basic model and evaluation procedure as a benchmark for future studies of this problem.
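
The effect of the 'middle exclusion' protocol described in the abstract can be illustrated with a minimal sketch on synthetic data. This is not the authors' model or data: the score distribution, the single weakly informative feature, the cutoff of 1.0, the threshold classifier, and every function name below are assumptions chosen only for illustration. The sketch trains the same kind of classifier twice and compares accuracy on a test set filtered before the split against accuracy on an unfiltered test set.

```python
import random


def make_genes(n, signal=0.3, seed=0):
    """Return synthetic (feature, score) pairs.

    The score stands in for a cytoplasmic-versus-nuclear abundance value;
    the feature is a weak, noisy proxy for it. All values are made up.
    """
    rng = random.Random(seed)
    genes = []
    for _ in range(n):
        score = rng.gauss(0.0, 1.0)
        feature = signal * score + rng.gauss(0.0, 1.0)
        genes.append((feature, score))
    return genes


def middle_exclusion(genes, cutoff):
    """Keep only genes with extreme scores, dropping the middle range."""
    return [g for g in genes if abs(g[1]) > cutoff]


def split(genes, test_fraction=0.2, seed=1):
    """Shuffle and partition into (train, test)."""
    rng = random.Random(seed)
    shuffled = genes[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_threshold(train):
    """Toy classifier: midpoint between the two class means of the feature."""
    pos = [f for f, s in train if s > 0]
    neg = [f for f, s in train if s <= 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(threshold, genes):
    """Fraction of genes whose predicted class matches the sign of the score."""
    hits = sum((f > threshold) == (s > 0) for f, s in genes)
    return hits / len(genes)


genes = make_genes(50000)

# Protocol A: filter first, then split, so the test set is also filtered.
train_a, test_a = split(middle_exclusion(genes, cutoff=1.0))
acc_filtered = accuracy(fit_threshold(train_a), test_a)

# Protocol B: split first, filter only the training set, test on everything.
train_b, test_b = split(genes)
acc_unfiltered = accuracy(fit_threshold(middle_exclusion(train_b, cutoff=1.0)), test_b)

print(f"filtered test set:   {acc_filtered:.3f}")
print(f"unfiltered test set: {acc_unfiltered:.3f}")
```

Under these assumptions the classifier is essentially the same in both protocols, yet the filtered test set reports the higher accuracy, because the hard middle-range genes never reach the evaluation.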
Pages: 9