Identification of small open reading frames in plant lncRNA using class-imbalance learning

Cited by: 6
Authors
Zhao, Siyuan [1]
Meng, Jun [1]
Wekesa, Jael Sanyanda [2]
Luan, Yushi [3]
Affiliations
[1] Dalian Univ Technol, Sch Comp Sci & Technol, Dalian 116024, Liaoning, Peoples R China
[2] Jomo Kenyatta Univ Agr & Technol, Dept Informat Technol, Nairobi 62000-00200, Kenya
[3] Dalian Univ Technol, Sch Bioengn, Dalian 116024, Liaoning, Peoples R China
Keywords
Class-imbalance learning; Feature selection; Hybrid resampling; Ensemble learning; sORFs; lncRNA; SMOTE;
DOI
10.1016/j.compbiomed.2023.106773
Chinese Library Classification
Q [Biological sciences];
Discipline classification codes
07; 0710; 09;
Abstract
Small open reading frames (sORFs) in long noncoding RNA (lncRNA) have recently been shown to encode small peptides, which can help in studying the mechanisms of growth and development in organisms. Because machine learning-based computational methods cost less than biological experiments, they can be used to identify sORFs and provide a basis for experimental work. However, few computational methods and data resources are available for identifying sORFs in plant lncRNA. Moreover, machine learning models tend to yield underperforming classifiers when trained on class-imbalanced data. In this study, a SMOTE variant based on weighted cosine distance (WCDSMOTE), which interacts with feature selection, is proposed to synthesize minority-class samples, and weighted edited nearest neighbor (WENN) is applied to clean up majority-class samples. Together they form the hybrid sampling method WCDSMOTE-ENN, which handles imbalanced datasets with multi-angle features. A heterogeneous classifier ensemble is then introduced to complete the classification task. The result is sORFplnc, a novel computational method based on class-imbalance learning that identifies sORFs with coding potential in plant lncRNA. Experimental results show that sORFplnc outperforms existing computational methods in identifying sORFs with coding potential. We anticipate that this work can serve as a reference for related research and contribute to agriculture and biomedicine.
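The hybrid resampling idea in the abstract (oversample the minority class with a SMOTE-style step that uses a weighted cosine distance, then clean the majority class with an edited-nearest-neighbor step) can be illustrated with a minimal pure-Python sketch. This is not the authors' WCDSMOTE or WENN: the abstract does not give their formulas, so the feature weights `w`, the neighbor count `k`, the removal rule, and every function name below are assumptions chosen for illustration. The heterogeneous classifier ensemble is not covered here.

```python
import math
import random

def weighted_cosine_distance(a, b, w):
    """1 - weighted cosine similarity; w holds non-negative feature weights (assumed)."""
    dot = sum(wi * x * y for wi, x, y in zip(w, a, b))
    na = math.sqrt(sum(wi * x * x for wi, x in zip(w, a)))
    nb = math.sqrt(sum(wi * y * y for wi, y in zip(w, b)))
    if na == 0.0 or nb == 0.0:
        return 1.0  # treat a zero vector as maximally distant
    return 1.0 - dot / (na * nb)

def k_nearest(idx, points, w, k):
    """Indices of the k points closest to points[idx], excluding idx itself."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: weighted_cosine_distance(points[idx], points[j], w))
    return others[:k]

def smote_oversample(minority, w, n_new, k=3, seed=0):
    """Create n_new synthetic samples by interpolating toward a near neighbor."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(k_nearest(i, minority, w, k))
        gap = rng.random()  # position along the segment from sample i to sample j
        synthetic.append([x + gap * (y - x) for x, y in zip(minority[i], minority[j])])
    return synthetic

def enn_clean(X, y, w, majority_label, k=3):
    """Drop majority samples when most of their k nearest neighbors disagree."""
    keep = []
    for i in range(len(X)):
        if y[i] == majority_label:
            neighbors = k_nearest(i, X, w, k)
            agree = sum(1 for j in neighbors if y[j] == y[i])
            if agree * 2 < k:
                continue  # likely noise or overlap: remove this majority sample
        keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]

def hybrid_resample(X, y, w, minority_label=1, majority_label=0, k=3, seed=0):
    """Oversample the minority class to the majority size, then clean the majority."""
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    n_major = sum(1 for lab in y if lab == majority_label)
    n_new = max(0, n_major - len(minority))
    synth = smote_oversample(minority, w, n_new, k, seed)
    X_aug = X + synth
    y_aug = y + [minority_label] * len(synth)
    return enn_clean(X_aug, y_aug, w, majority_label, k)

# Toy data: label 0 is the majority class, label 1 the minority class.
X = [[1.0, 0.1], [2.0, 0.1], [1.5, 0.2], [3.0, 0.2], [2.5, 0.1], [1.0, 0.05],
     [0.1, 1.5],               # a majority sample sitting among minority samples
     [0.1, 1.0], [0.2, 2.0]]   # minority samples
y = [0, 0, 0, 0, 0, 0, 0, 1, 1]
w = [0.7, 0.3]                 # hypothetical weights, e.g. from feature selection
X_res, y_res = hybrid_resample(X, y, w)
```

In this toy run the oversampling step adds five synthetic minority samples, and the cleaning step removes the one majority sample whose nearest neighbors all belong to the minority class. Because cosine distance compares directions rather than magnitudes, the toy classes are separated by direction; real feature vectors would normally be scaled first.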
Pages: 10