iN6-methylat (5-step): identifying DNA N6-methyladenine sites in rice genome using continuous bag of nucleobases via Chou’s 5-step rule

Authors
Nguyen Quoc Khanh Le
Affiliations
[1] Nanyang Technological University, Medical Humanities Research Cluster, School of Humanities
Source
Molecular Genetics and Genomics | 2019 / Vol. 294
Keywords
Skip gram; Continuous bag of words; DNA N6-methyladenine; Support vector machine; FastText; DNA replication
DOI
Not available
Abstract
DNA N6-methyladenine is a non-canonical DNA modification that occurs at low levels in different eukaryotes and has been identified as playing an extremely important role in biological function. Moreover, about 0.2% of adenines in the rice genome are marked by DNA N6-methyladenine, a higher proportion than in most other species. Identifying these sites has therefore become a very important area of study, especially in biological research. Although a few computational tools have been applied to this problem, considerable effort is still required to improve their performance. In this study, we represent DNA sequences as continuous bags of nucleobases, including sub-word information of their biological words, and feed the resulting features into a support vector machine to identify N6-methyladenine sites. Using this hybrid approach, our model identifies DNA N6-methyladenine sites with a jackknife-test sensitivity of 86.48%, specificity of 89.09%, accuracy of 87.78%, and MCC of 0.756. Compared with the state-of-the-art predictor and other methods, the proposed model yields superior performance on all metrics. Moreover, this study provides a basis for further research on applying natural language processing techniques to biological sequences.
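As a rough illustration of the ideas named in the abstract, the sketch below splits a DNA sequence into overlapping k-mer "biological words", derives FastText-style sub-word character n-grams from one word, and computes the four reported metrics from a confusion matrix. It is not the paper's implementation: the abstract does not state the word length, the n-gram range, or the SVM settings, so the function names, `k=3`, `n_min=2`, `n_max=3`, and the example counts are all assumptions chosen for illustration, and the embedding and SVM steps are omitted.

```python
from math import sqrt


def kmer_words(seq, k=3):
    """Split a DNA sequence into overlapping k-mers ("biological words").

    k=3 is an illustrative assumption, not a value taken from the paper.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def subword_ngrams(word, n_min=2, n_max=3):
    """Return FastText-style character n-grams of one word.

    The word is wrapped in '<' and '>' boundary markers first, as FastText
    does; the n-gram range used here is a hypothetical choice.
    """
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(marked[i:i + n] for i in range(len(marked) - n + 1))
    return grams


def binary_metrics(tp, tn, fp, fn):
    """Compute sensitivity, specificity, accuracy, and MCC from counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal total is zero; report 0.0 then.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    words = kmer_words("GATTACA")
    print(words)
    print(subword_ngrams(words[0]))
    # Made-up counts, only to exercise the formulas.
    print(binary_metrics(tp=8, tn=9, fp=1, fn=2))
```

In a full pipeline, the words and sub-words would be embedded (for example with a FastText model) and the resulting vectors passed to a support vector machine; those steps depend on details not given here.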
Pages: 1173-1182
Page count: 9