Classification of nucleotide sequences for quality assessment using logistic regression and decision tree approaches

Cited by: 6
Authors
Kurt, Serkan [1 ]
Oz, Ersoy [2 ]
Askin, Oykum Esra [2 ]
Oz, Yeliz Yucel [3 ,4 ]
Affiliations
[1] Yildiz Tech Univ, Fac Elect & Elect Engn, Dept Elect & Commun Engn, Istanbul, Turkey
[2] Yildiz Tech Univ, Fac Arts & Sci, Dept Stat, Istanbul, Turkey
[3] Istanbul Tech Univ, Mol Biol Biotechnol, Istanbul, Turkey
[4] Iontek AS, Istanbul, Turkey
Source
NEURAL COMPUTING & APPLICATIONS | 2018 / Vol. 29 / Issue 08
Keywords
DNA sequencing; Decision tree; Ensemble learning algorithms; Logistic regression;
DOI
10.1007/s00521-017-2960-5
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Knowledge of DNA sequences is indispensable for basic biological research. Many researchers use DNA sequencing for various purposes, including molecular biology research and sequence comparison for individual identification. Automated DNA sequencing devices use four-color chromatograms, or base-calling signals, to indicate the strength of hybridization for each base channel. Typically, the relative strengths of the peaks at each base location are used to quantify the quality and/or reliability of individual readings. However, assessing the overall quality of whole DNA trace files remains an open problem. Classifying raw DNA trace files as high or low quality is therefore important for efficient utilization of resources. In this study, we used several supervised machine learning approaches, including logistic regression and ensemble decision trees, to identify high- or acceptable-quality chromatogram files, and compared their prediction performance. To develop and test our ideas, we used a public DNA trace repository consisting of 1626 high- and 631 low-quality files, as labeled by our expert molecular biologist. Our results indicate that, although all of the methods tried offer comparable and acceptable performance, the random forest decision tree algorithm with adaptive boosting ensemble learning shows slightly higher prediction accuracy with as few as four features.
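As a rough illustration of the ideas in the abstract, the sketch below computes a per-base relative peak strength and summarizes it into file-level features. The feature names (`mean_ratio`, `min_ratio`, `frac_ambiguous`), the 0.6 threshold, and the two toy traces are assumptions made for this example; they are not the four features or the data used in the paper. The only numbers taken from the abstract are the class counts (1626 high- and 631 low-quality files), which give the accuracy of a trivial classifier that always predicts the majority class.

```python
from statistics import mean


def peak_ratio(intensities):
    """Share of the total signal carried by the strongest of the four base channels."""
    total = sum(intensities)
    return max(intensities) / total if total > 0 else 0.0


def trace_features(trace, low_threshold=0.6):
    """Summarize a whole trace as file-level features.

    `trace` is a list of 4-tuples (one intensity per base channel at each
    called position). The features and threshold are hypothetical.
    """
    ratios = [peak_ratio(p) for p in trace]
    return {
        "mean_ratio": mean(ratios),
        "min_ratio": min(ratios),
        "frac_ambiguous": sum(r < low_threshold for r in ratios) / len(ratios),
    }


def majority_baseline_accuracy(n_high, n_low):
    """Accuracy of always predicting the larger class."""
    return max(n_high, n_low) / (n_high + n_low)


# Toy traces invented for this example (not real chromatogram data).
clean_trace = [(900, 30, 40, 30), (20, 950, 10, 20), (25, 25, 900, 50)]
noisy_trace = [(400, 350, 150, 100), (300, 300, 200, 200), (500, 100, 300, 100)]

clean = trace_features(clean_trace)
noisy = trace_features(noisy_trace)

# Class counts as reported in the abstract.
baseline = majority_baseline_accuracy(1626, 631)

print(clean)
print(noisy)
print(f"majority-class baseline accuracy: {baseline:.4f}")
```

Feature vectors of this kind could then be passed to any supervised classifier. Because the classes are imbalanced, a model's accuracy is only informative when compared against the majority-class baseline of about 72%.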
Pages: 251 - 262
Page count: 12
Related papers
50 in total
  • [1] Classification of nucleotide sequences for quality assessment using logistic regression and decision tree approaches
    Serkan Kurt
    Ersoy Öz
    Öyküm Esra Aşkın
    Yeliz Yücel Öz
    Neural Computing and Applications, 2018, 29 : 251 - 262
  • [2] Retraction Note: Classification of nucleotide sequences for quality assessment using logistic regression and decision tree approaches
    Serkan Kurt
    Ersoy Öz
    Öyküm Esra Aşkın
    Yeliz Yücel Öz
    Neural Computing and Applications, 2024, 36 (19) : 11679 - 11679
  • [3] Retraction note: Classification of nucleotide sequences for quality assessment using logistic regression and decision tree approaches (Neural Computing and Applications, (2018), 29, 8, (251-262), 10.1007/s00521-017-2960-5)
    Kurt, Serkan
    Öz, Ersoy
    Aşkın, Öyküm Esra
    Öz, Yeliz Yücel
    Neural Computing and Applications, 2024,
  • [4] Classification of the Factors for Smoking Cessation Using Logistic Regression, Decision Tree & Neural Networks
    Siddiqui, Muhammad Aadil
    Khan, Abdul Samad
    Witjaksono, Gunawan
    2ND INTERNATIONAL CONFERENCE ON APPLIED PHOTONICS AND ELECTRONICS 2019 (INCAPE 2019), 2020, 2203
  • [5] Modelling Childbearing Desire: Comparison of Logistic Regression and Classification Tree Approaches
    Bagheri, Arezoo
    Saadati, Mahsa
    CRESCENT JOURNAL OF MEDICAL AND BIOLOGICAL SCIENCES, 2019, 6 (04): : 487 - 493
  • [6] An Assessment of Decision Tree based Classification and Regression Algorithms
    Pathak, Soham
    Mishra, Indivar
    Swetapadma, Aleena
    PROCEEDINGS OF THE 2018 3RD INTERNATIONAL CONFERENCE ON INVENTIVE COMPUTATION TECHNOLOGIES (ICICT 2018), 2018, : 92 - 95
  • [7] classLog: Logistic regression for the classification of genetic sequences
    Zeller, Michael A.
    Arendsee, Zebulun W.
    Smith, Gavin J. D.
    Anderson, Tavis K.
    FRONTIERS IN VIROLOGY, 2023, 3
  • [8] Using classification tree and logistic regression methods to diagnose myocardial infarction
    Tsien, CL
    Fraser, HSF
    Long, WJ
    Kennedy, RL
    MEDINFO '98 - 9TH WORLD CONGRESS ON MEDICAL INFORMATICS, PTS 1 AND 2, 1998, 52 : 493 - 497
  • [9] Predicting corporate financial distress based on integration of decision tree classification and logistic regression
    Chen, Mu-Yen
    EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (09) : 11261 - 11272
  • [10] Stochastic Modeling of Bridge Deterioration Using Classification Tree and Logistic Regression
    Chang, Minwoo
    Maguire, Marc
    Sun, Yan
    JOURNAL OF INFRASTRUCTURE SYSTEMS, 2019, 25 (01)