Classification of nucleotide sequences for quality assessment using logistic regression and decision tree approaches

Cited by: 6
Authors
Kurt, Serkan [1]
Oz, Ersoy [2]
Askin, Oykum Esra [2]
Oz, Yeliz Yucel [3, 4]
Affiliations
[1] Yildiz Tech Univ, Fac Elect & Elect Engn, Dept Elect & Commun Engn, Istanbul, Turkey
[2] Yildiz Tech Univ, Fac Arts & Sci, Dept Stat, Istanbul, Turkey
[3] Istanbul Tech Univ, Mol Biol Biotechnol, Istanbul, Turkey
[4] Iontek AS, Istanbul, Turkey
Source
NEURAL COMPUTING & APPLICATIONS | 2018 / Vol. 29 / Issue 08
Keywords
DNA sequencing; Decision tree; Ensemble learning algorithms; Logistic regression;
DOI
10.1007/s00521-017-2960-5
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Knowledge of DNA sequences is indispensable for basic biological research. Many researchers use DNA sequencing for various purposes, including molecular biology research and sequence comparison for individual identification. Automated DNA sequencing devices use four-color chromatograms, or base-calling signals, to indicate the strength of hybridization for each base channel. Typically, the relative strengths of the peaks at each base location are used to quantify the quality and/or reliability of individual readings. However, assessing the overall quality of whole DNA trace files remains an open problem. Classifying raw DNA trace files as high or low quality is therefore important for the efficient use of resources. In this study, we used several supervised machine learning approaches, including logistic regression and ensemble decision trees, to identify high- or acceptable-quality chromatogram files and compared their prediction performance. To develop and test our ideas, we used a public DNA trace repository consisting of 1626 high- and 631 low-quality files labeled by our expert molecular biologist. Our results indicate that, although all of the methods tried offer comparable and acceptable performance, the random forest decision tree algorithm with adaptive boosting ensemble learning shows slightly higher prediction accuracy with as few as four features.
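As a point of reference for the accuracy comparison described in the abstract, the class counts it reports (1626 high- and 631 low-quality files) fix the accuracy of a trivial classifier that always predicts the larger class. The short Python sketch below works out that figure. It is only an illustrative calculation, not part of the authors' method, and the function name majority_baseline_accuracy is a hypothetical helper introduced here.

```python
# Class counts as reported in the abstract.
N_HIGH = 1626  # trace files labeled high quality
N_LOW = 631    # trace files labeled low quality


def majority_baseline_accuracy(n_pos: int, n_neg: int) -> float:
    """Accuracy of a trivial classifier that always predicts the larger class.

    Hypothetical helper for illustration; not taken from the paper.
    """
    total = n_pos + n_neg
    if total == 0:
        raise ValueError("at least one sample is required")
    return max(n_pos, n_neg) / total


total_files = N_HIGH + N_LOW
baseline = majority_baseline_accuracy(N_HIGH, N_LOW)

print(f"total files: {total_files}")                          # total files: 2257
print(f"majority-class baseline accuracy: {baseline:.3f}")    # 0.720
```

Any of the compared classifiers would need to exceed roughly 72% accuracy on this data to be doing better than always answering "high quality".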
Pages: 251 - 262
Number of pages: 12
Related papers
50 in total
  • [31] EFFICIENT PREDICTION OF STROKE PATIENTS USING LOGISTIC REGRESSION ALGORITHM IN COMPARISON TO DECISION TREE ALGORITHM
    Mitra, Ritaban
    Rajendran, T.
    INTERNATIONAL JOURNAL OF EARLY CHILDHOOD SPECIAL EDUCATION, 2022, 14 (03) : 5645 - 5651
  • [32] Comprehensive assessment of flood risk using the classification and regression tree method
    Zhonghui Ji
    Ning Li
    Wei Xie
    Jidong Wu
    Yang Zhou
    Stochastic Environmental Research and Risk Assessment, 2013, 27 : 1815 - 1828
  • [33] Comprehensive assessment of flood risk using the classification and regression tree method
    Ji, Zhonghui
    Li, Ning
    Xie, Wei
    Wu, Jidong
    Zhou, Yang
    STOCHASTIC ENVIRONMENTAL RESEARCH AND RISK ASSESSMENT, 2013, 27 (08) : 1815 - 1828
  • [34] Texture classification using kernel logistic regression
    Tambo, Asongu L.
    Mistry, Rajan B.
    Campbell, Jonathan M.
    Chan, Sherwin R.
    Hang, Xiyi
    INT CONF ON CYBERNETICS AND INFORMATION TECHNOLOGIES, SYSTEMS AND APPLICATIONS/INT CONF ON COMPUTING, COMMUNICATIONS AND CONTROL TECHNOLOGIES, VOL 1, 2007, : 259 - 262
  • [35] Multiple Classification Using Logistic Regression Model
    Zou, Baoping
    INTERNET OF VEHICLES - TECHNOLOGIES AND SERVICES, 2016, 10036 : 238 - 243
  • [36] Classification and regression tree analysis in public health: Methodological review and comparison with logistic regression
    Lemon, SC
    Roy, J
    Clark, MA
    Friedmann, PD
    Rakowski, W
    ANNALS OF BEHAVIORAL MEDICINE, 2003, 26 (03) : 172 - 181
  • [37] Liver Patient Classification using Logistic Regression
    Adil, Syed Hasan
    Ebrahim, Mansoor
    Raza, Kamran
    Ali, Syed Saad Azhar
    Hashmani, Manzoor Ahmed
    2018 4TH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCES (ICCOINS), 2018,
  • [38] QSAR modeling of nucleosides against amastigotes of Leishmania donovani using logistic regression and classification tree
    Oliveira, Kesley M. G.
    Takahata, Yuji
    QSAR & COMBINATORIAL SCIENCE, 2008, 27 (08) : 1020 - 1027
  • [39] Comparison of logistic regression and decision tree for customer churn prediction in Telecommunications
    Mand'ak, Jan
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON STRATEGIC MANAGEMENT AND ITS SUPPORT BY INFORMATION SYSTEMS (SMSIS), 2017, : 282 - 292
  • [40] Hybrid Decision Tree and Logistic Regression Classifier for Email Spam Detection
    Wijaya, Adi
    Bisri, Achmad
    PROCEEDINGS OF 2016 8TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND ELECTRICAL ENGINEERING (ICITEE), 2016,