Classification of nucleotide sequences for quality assessment using logistic regression and decision tree approaches

Cited by: 6
Authors
Kurt, Serkan [1]
Oz, Ersoy [2]
Askin, Oykum Esra [2]
Oz, Yeliz Yucel [3, 4]
Affiliations
[1] Yildiz Tech Univ, Fac Elect & Elect Engn, Dept Elect & Commun Engn, Istanbul, Turkey
[2] Yildiz Tech Univ, Fac Arts & Sci, Dept Stat, Istanbul, Turkey
[3] Istanbul Tech Univ, Mol Biol Biotechnol, Istanbul, Turkey
[4] Iontek AS, Istanbul, Turkey
Source
NEURAL COMPUTING & APPLICATIONS | 2018 / Vol. 29 / Issue 08
Keywords
DNA sequencing; Decision tree; Ensemble learning algorithms; Logistic regression;
DOI
10.1007/s00521-017-2960-5
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Knowledge of DNA sequences is indispensable for basic biological research. Many researchers use DNA sequencing for various purposes, including molecular biology research and sequence comparison for individual identification. Automated DNA sequencing devices use four-color chromatograms, or base-calling signals, to indicate the strength of hybridization for each base channel. Typically, the relative strengths of peaks at each base location are used to quantify the quality and/or reliability of individual readings. However, assessing the overall quality of whole DNA trace files remains an open problem. Classifying raw DNA trace files as high or low quality is therefore important for efficient use of resources. In this study, we used several supervised machine learning approaches, including logistic regression and ensemble decision trees, to identify high- or acceptable-quality chromatogram files and compared their prediction performance. To develop and test our ideas, we used a public DNA trace repository consisting of 1626 high- and 631 low-quality files labeled by our expert molecular biologist. Our results indicate that, although all of the methods tried offer comparable and acceptable performance, the random forest decision tree algorithm with adaptive boosting ensemble learning shows slightly higher prediction accuracy with as few as four features.
Pages: 251 - 262
Page count: 12