Evaluation of machine learning models that predict lncRNA subcellular localization

Times cited: 0
Authors
Miller, Jason R. [1, 2]
Yi, Weijun [2]
Adjeroh, Donald A. [2]
Affiliations
[1] Hood College, Department of Computer Science and Information Technology, Frederick, MD 21701, USA
[2] West Virginia University, Lane Department of Computer Science and Electrical Engineering, Morgantown, WV 26506, USA
Funding
National Science Foundation (USA)
Keywords
RNALOCATE; RESOURCE; GENCODE
DOI
10.1093/nargab/lqae125
Chinese Library Classification (CLC) number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
The lncATLAS database quantifies the relative cytoplasmic versus nuclear abundance of long non-coding RNAs (lncRNAs) observed in 15 human cell lines. The literature describes several machine learning models trained and evaluated on these and similar datasets. These reports showed moderate performance, e.g., 72-74% accuracy, on test subsets of the data withheld from training. In all of these reports, the datasets were filtered to include genes with extreme values and exclude genes with values in the middle range, and the filters were applied before the data were partitioned into training and testing subsets. Using several models and lncATLAS data, we show that this 'middle exclusion' protocol boosts performance metrics without boosting model performance on unfiltered test data. We show that various models achieve only about 60% accuracy when evaluated on unfiltered lncRNA data. We suggest that the problem of predicting lncRNA subcellular localization from nucleotide sequences is more challenging than currently perceived. We provide a basic model and evaluation procedure as a benchmark for future studies of this problem.
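To make the evaluation issue concrete, here is a minimal, hypothetical sketch of why applying a middle-exclusion filter before the train/test split inflates test metrics. Everything in it is an assumption for illustration, not the authors' code, data, or benchmark: synthetic Gaussian scores stand in for lncATLAS localization values, the cut-offs of -1.0 and 1.0 are arbitrary, and the names `make_genes`, `middle_exclusion`, `split`, `noisy_predictor`, and `accuracy` are invented here. The stand-in predictor deliberately ignores the training data and is always correct on genes outside the middle range, so any accuracy difference comes only from which genes land in the test set.

```python
import random

# Hypothetical illustration only: synthetic scores, thresholds, and the
# stand-in predictor are assumptions, not the authors' data or model.


def make_genes(n=2000, seed=42):
    """Synthetic genes with a cytoplasmic-vs-nuclear score; label 1 = cytoplasmic, 0 = nuclear."""
    rng = random.Random(seed)
    genes = []
    for i in range(n):
        score = rng.gauss(0.0, 1.0)
        genes.append({"id": f"gene{i}", "score": score, "label": 1 if score > 0 else 0})
    return genes


def middle_exclusion(genes, low=-1.0, high=1.0):
    """Keep only genes with extreme scores (<= low or >= high); cut-offs are hypothetical."""
    return [g for g in genes if g["score"] <= low or g["score"] >= high]


def split(genes, test_fraction=0.2, seed=0):
    """Shuffle a copy of the list and partition it into (train, test)."""
    rng = random.Random(seed)
    shuffled = genes[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def noisy_predictor(gene, rng):
    """Stand-in model: true score plus bounded noise in [-0.5, 0.5], thresholded at 0.

    It never looks at training data, and it is always right when |score| >= 1.
    """
    return 1 if gene["score"] + rng.uniform(-0.5, 0.5) > 0 else 0


def accuracy(test, seed=1):
    """Fraction of test genes whose predicted label matches the true label."""
    if not test:
        return float("nan")
    rng = random.Random(seed)
    correct = sum(noisy_predictor(g, rng) == g["label"] for g in test)
    return correct / len(test)


genes = make_genes()

# Protocol A ("middle exclusion" before partitioning): the test set inherits the filter.
train_a, test_a = split(middle_exclusion(genes))

# Protocol B: partition first, filter only the training subset; the test set stays unfiltered.
train_b_all, test_b = split(genes)
train_b = middle_exclusion(train_b_all)

# A real model would be fit on train_a or train_b here; the stand-in ignores them.
print(f"filter-then-split test accuracy: {accuracy(test_a):.3f}")
print(f"split-then-filter test accuracy: {accuracy(test_b):.3f}")
```

Under protocol A the test set contains only extreme, easy-to-classify genes, so this fixed predictor scores a perfect 1.0. Under protocol B the middle-range genes remain in the test set, and the same predictor's accuracy can only stay equal or drop. The model itself is unchanged; only the metric moves.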
Pages: 9