Evaluation of machine learning models that predict lncRNA subcellular localization

Authors
Miller, Jason R. [1, 2]
Yi, Weijun [2]
Adjeroh, Donald A. [2]
Affiliations
[1] Hood Coll, Dept Comp Sci & Informat Technol, Frederick, MD 21701 USA
[2] West Virginia Univ, Lane Dept Comp Sci & Elect Engn, Morgantown, WV 26506 USA
Funding
US National Science Foundation
Keywords
RNALOCATE; RESOURCE; GENCODE;
DOI
10.1093/nargab/lqae125
Chinese Library Classification
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
The lncATLAS database quantifies the relative cytoplasmic versus nuclear abundance of long non-coding RNAs (lncRNAs) observed in 15 human cell lines. The literature describes several machine learning models trained and evaluated on these and similar datasets. These reports showed moderate performance, e.g. 72-74% accuracy, on test subsets of the data withheld from training. In all these reports, the datasets were filtered to include genes with extreme values while excluding genes with values in the middle range and the filters were applied prior to partitioning the data into training and testing subsets. Using several models and lncATLAS data, we show that this 'middle exclusion' protocol boosts performance metrics without boosting model performance on unfiltered test data. We show that various models achieve only about 60% accuracy when evaluated on unfiltered lncRNA data. We suggest that the problem of predicting lncRNA subcellular localization from nucleotide sequences is more challenging than currently perceived. We provide a basic model and evaluation procedure as a benchmark for future studies of this problem.
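The effect of the 'middle exclusion' protocol described in the abstract can be illustrated with a small sketch. This is not the authors' model or data: the scores, the single noisy feature, the cutoff of 1.0 and the threshold classifier are all hypothetical stand-ins chosen for illustration, and the accuracies it prints are not the paper's figures. It only shows why filtering before the train/test split can raise the reported accuracy compared with evaluation on an unfiltered test set.

```python
import random

def make_synthetic(n, noise, seed):
    """Synthetic (feature, score) pairs. The score is a hypothetical stand-in
    for a cytoplasmic-vs-nuclear abundance value; the feature is a noisy proxy
    for whatever a model could extract from the sequence."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        score = rng.gauss(0.0, 1.0)
        feature = score + rng.gauss(0.0, noise)
        data.append((feature, score))
    return data

def label(score):
    """Binary class: 1 for a positive score, 0 otherwise (assumed convention)."""
    return 1 if score > 0 else 0

def middle_exclusion(data, cutoff):
    """Keep only examples with extreme scores; drop the middle range."""
    return [(f, s) for f, s in data if abs(s) >= cutoff]

def split(data, test_fraction, seed):
    """Shuffle, then return (train, test)."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_threshold(train):
    """Toy classifier: threshold at the midpoint of the two class means."""
    pos = [f for f, s in train if label(s) == 1]
    neg = [f for f, s in train if label(s) == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def accuracy(data, threshold):
    """Fraction of examples whose thresholded feature matches the label."""
    hits = sum((1 if f > threshold else 0) == label(s) for f, s in data)
    return hits / len(data)

data = make_synthetic(n=20000, noise=2.0, seed=0)

# Protocol A: filter first, then split. The test set contains only extremes.
train_a, test_a = split(middle_exclusion(data, cutoff=1.0), 0.2, seed=1)
acc_filtered = accuracy(test_a, fit_threshold(train_a))

# Protocol B: split first, filter only the training set, test on everything.
train_b, test_b = split(data, 0.2, seed=1)
acc_unfiltered = accuracy(test_b, fit_threshold(middle_exclusion(train_b, cutoff=1.0)))

print(f"filtered test set:   {acc_filtered:.3f}")
print(f"unfiltered test set: {acc_unfiltered:.3f}")
```

In this sketch the classifier is essentially the same under both protocols; only the composition of the test set changes. The filtered test set omits the hard middle-range cases, so its accuracy is higher even though the model is no better on unfiltered data.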
Pages: 9