Classification of nucleotide sequences for quality assessment using logistic regression and decision tree approaches

Cited by: 6
Authors
Kurt, Serkan [1]
Oz, Ersoy [2]
Askin, Oykum Esra [2]
Oz, Yeliz Yucel [3, 4]
Affiliations
[1] Yildiz Tech Univ, Fac Elect & Elect Engn, Dept Elect & Commun Engn, Istanbul, Turkey
[2] Yildiz Tech Univ, Fac Arts & Sci, Dept Stat, Istanbul, Turkey
[3] Istanbul Tech Univ, Mol Biol Biotechnol, Istanbul, Turkey
[4] Iontek AS, Istanbul, Turkey
Source
NEURAL COMPUTING & APPLICATIONS | 2018 / Vol. 29 / No. 08
Keywords
DNA sequencing; Decision tree; Ensemble learning algorithms; Logistic regression
DOI
10.1007/s00521-017-2960-5
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Knowledge of DNA sequences is indispensable for basic biological research. Many researchers use DNA sequencing for various purposes, including molecular biology research and sequence comparison for individual identification. Automated DNA sequencing devices use four-color chromatograms, or base-calling signals, to indicate the strength of hybridization for each base channel. Typically, the relative strengths of the peaks at each base location are used to quantify the quality and/or reliability of individual readings. However, assessing the overall quality of whole DNA trace files remains an open problem. Classifying raw DNA trace files as high or low quality is therefore important for the efficient use of resources. In this study, we used several supervised machine learning approaches, including logistic regression and ensemble decision trees, to identify high- or acceptable-quality chromatogram files, and compared their prediction performance. To develop and test our ideas, we used a public DNA trace repository consisting of 1626 high- and 631 low-quality files labeled by our expert molecular biologist. Our results indicate that, although all of the methods tried offer comparable and acceptable performance, the random forest decision tree algorithm with adaptive boosting ensemble learning shows slightly higher prediction accuracy with as few as four features.
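As a rough illustration of the kind of binary quality classification the abstract describes, the sketch below trains a plain logistic regression on synthetic data. Only the class counts (1626 high-quality, 631 low-quality) come from the abstract; the four features, their Gaussian distributions, the train/test split, and all function names are assumptions made for illustration and are not the authors' features, data, or implementation. The sketch also computes the majority-class baseline implied by the class counts, which is the accuracy any classifier must beat to be useful on such imbalanced data.

```python
import math
import random

# Class counts reported in the abstract.
N_HIGH, N_LOW = 1626, 631


def majority_baseline_accuracy(n_pos, n_neg):
    """Accuracy of always predicting the larger class."""
    return max(n_pos, n_neg) / (n_pos + n_neg)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic(n_pos, n_neg, n_features=4, shift=1.0, seed=0):
    """Hypothetical data: each feature is Gaussian with mean +shift
    (high quality, label 1) or -shift (low quality, label 0)."""
    rng = random.Random(seed)
    data = []
    for label, count in ((1, n_pos), (0, n_neg)):
        mean = shift if label == 1 else -shift
        for _ in range(count):
            features = [rng.gauss(mean, 1.0) for _ in range(n_features)]
            data.append((features, label))
    rng.shuffle(data)
    return data


def predict_proba(model, x):
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(data, epochs=20, lr=0.1):
    """Logistic regression fitted by stochastic gradient descent."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in data:
            error = predict_proba((weights, bias), x) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def accuracy(model, data):
    correct = sum((predict_proba(model, x) >= 0.5) == (y == 1) for x, y in data)
    return correct / len(data)


baseline = majority_baseline_accuracy(N_HIGH, N_LOW)
samples = make_synthetic(N_HIGH, N_LOW)
split = int(0.8 * len(samples))  # assumed 80/20 train/test split
train, test = samples[:split], samples[split:]
model = train_logistic(train)
test_acc = accuracy(model, test)

print(f"majority-class baseline: {baseline:.3f}")
print(f"logistic regression (synthetic data): {test_acc:.3f}")
```

With 1626 of 2257 files labeled high quality, the majority-class baseline is about 0.72, so reported accuracies are best read against that figure rather than against 0.5. The ensemble methods named in the abstract (random forest, adaptive boosting) are not reproduced here.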
Pages: 251-262
Page count: 12
Related papers
50 in total
  • [41] Cow Health Prediction Method Based on Logistic Regression and Decision Tree
    Nie, Jiaxin
    Fang, Jiandong
    Zhao, Yvdong
    2022 34TH CHINESE CONTROL AND DECISION CONFERENCE, CCDC, 2022: 3712 - 3717
  • [42] Predictors of late asthmatic response - Logistic regression and classification tree analyses
    Avila, PC
    Segal, MR
    Wong, HH
    Boushey, HA
    Fahy, JV
    AMERICAN JOURNAL OF RESPIRATORY AND CRITICAL CARE MEDICINE, 2000, 161 (06) : 2092 - 2095
  • [43] Application of Decision Tree Classification Algorithm in Quality Assessment of Distance Learning in Colleges
    Nan, Fang
    Li, Yanan
    Zhang, Jing
    Yin, Xuesong
    Cui, Xintong
    EAI ENDORSED TRANSACTIONS ON SCALABLE INFORMATION SYSTEMS, 2024, 11 (02) : 1 - 9
  • [44] A Comparison of Logistic Regression and Classification Tree Analysis for Behavioural Case Linkage
    Tonkin, Matthew
    Woodhams, Jessica
    Bull, Ray
    Bond, John W.
    Santtila, Pekka
    JOURNAL OF INVESTIGATIVE PSYCHOLOGY AND OFFENDER PROFILING, 2012, 9 (03) : 235 - 258
  • [45] Soil erosion susceptibility assessment using logistic regression, decision tree and random forest: study on the Mayurakshi river basin of Eastern India
    Ghosh, Abhishek
    Maiti, Ramkrishna
    ENVIRONMENTAL EARTH SCIENCES, 2021, 80 (08)
  • [46] Soil erosion susceptibility assessment using logistic regression, decision tree and random forest: study on the Mayurakshi river basin of Eastern India
    Abhishek Ghosh
    Ramkrishna Maiti
    Environmental Earth Sciences, 2021, 80
  • [47] Prediction of Heart Disease using Decision Tree over Logistic Regression using Machine Learning with Improved Accuracy
    Raj, K. N. S. Shanmukha
    Thinakaran, K.
    CARDIOMETRY, 2022, (25) : 1514 - 1519
  • [48] Classification of Fire and Smoke Images using Decision Tree Algorithm in Comparison with Logistic Regression to Measure Accuracy, Precision, Recall, F-score
    Reddy, B. Haranadh
    Karthikeyan, P. R.
    2022 14TH INTERNATIONAL CONFERENCE ON MATHEMATICS, ACTUARIAL SCIENCE, COMPUTER SCIENCE AND STATISTICS (MACS), 2022
  • [49] Software quality classification modeling using the SPRINT decision tree algorithm
    Khoshgoftaar, TM
    Seliya, N
    14TH IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2002: 365 - 374
  • [50] DECISION TREE APPROACHES TO VOLTAGE SECURITY ASSESSMENT
    VANCUTSEM, T
    WEHENKEL, L
    PAVELLA, M
    HEILBRONN, B
    GOUBIN, M
    IEE PROCEEDINGS-C GENERATION TRANSMISSION AND DISTRIBUTION, 1993, 140 (03) : 189 - 198