Novel K-Means Clustering-based Undersampling and Feature Selection for Drug Discovery Applications

Authors
Akondi, Vishnu Sripriya [1]
Menon, Vineetha [1]
Baudry, Jerome [2]
Whittle, Jana [2]
Affiliations
[1] Univ Alabama, Dept Comp Sci, Huntsville, AL 35899 USA
[2] Univ Alabama, Dept Biol Sci, Huntsville, AL 35899 USA
Keywords
Drug discovery; class imbalance; advanced machine learning techniques; ADORA2A; OPRK1
DOI
Not available
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Drug discovery is the process of identifying the proteins that cause a specific disease and developing new medications that target them. The process poses significant challenges: it is time-consuming, data-intensive, and expensive, and it demands rigorous laboratory testing with high uncertainty about whether a given drug will succeed. This highlights the need for machine learning methods that automate and accelerate the drug discovery pipeline, improving healthcare and helping clinicians make informed decisions about in vitro testing. However, most real-world biomedical datasets suffer from statistical ill-conditioning such as class imbalance, in which the minority class of potential drug-candidate protein conformations is overshadowed by the much larger pool of non-candidates. Applying machine learning techniques directly to such data for learning and classification leads to erroneous conclusions. This work therefore counters the class imbalance problem with advanced machine learning techniques that maximize the prediction rate of potential drug-candidate molecular conformations for the target proteins ADORA2A and OPRK1, and thereby reduce the failure rate of the drug discovery process. Experimental evaluation of the proposed methodologies further substantiates the effectiveness of our approach for drug discovery.
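The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea named in the title: clustering-based undersampling of the majority class. It is an assumption about how such a step could look, not the authors' method. The function names `kmeans` and `undersample_majority`, the synthetic two-dimensional data, and the choice to keep the real sample nearest each centroid are all hypothetical.

```python
import random
from math import dist


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on tuples of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct samples
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels


def undersample_majority(majority, n_keep, seed=0):
    """Reduce the majority class to at most n_keep representative samples.

    Assumption: one real sample (the one nearest the centroid) is kept
    per cluster, so the retained points still cover the majority class.
    """
    centroids, labels = kmeans(majority, n_keep, seed=seed)
    kept = []
    for c, centroid in enumerate(centroids):
        members = [p for p, lab in zip(majority, labels) if lab == c]
        if members:
            kept.append(min(members, key=lambda p: dist(p, centroid)))
    return kept


# Synthetic, illustrative data only: 200 "non-candidates" vs 10 "candidates".
rng = random.Random(42)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
minority = [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(10)]

kept = undersample_majority(majority, n_keep=len(minority))
print(len(majority), "->", len(kept), "majority samples;", len(minority), "minority samples")
```

Setting the number of clusters equal to the minority class size yields a roughly balanced training set while preserving the spread of the majority class, which random undersampling does not guarantee.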
Pages: 2771-2778
Number of pages: 8