Novel K-Means Clustering-based Undersampling and Feature Selection for Drug Discovery Applications

Cited by: 0

Authors:

Akondi, Vishnu Sripriya [1]

Menon, Vineetha [1]

Baudry, Jerome [2]

Whittle, Jana [2]

Affiliations:

[1] Univ Alabama, Dept Comp Sci, Huntsville, AL 35899 USA

[2] Univ Alabama, Dept Biol Sci, Huntsville, AL 35899 USA

Source:

2019 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM) | 2019

Keywords:

Drug discovery; class imbalance; advanced machine learning techniques; ADORA2A; OPRK1;

DOI:

Not available

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Discipline classification codes:

071010 ; 081704 ;

Abstract:

Drug discovery is the process of identifying specific disease-causing proteins and developing new medications that target them. The process is challenging: it is time-consuming, data-intensive, and expensive, and it demands rigorous laboratory testing with high uncertainty about whether a given drug will succeed. This highlights the need for machine learning methods that automate and accelerate the drug discovery pipeline for improved healthcare and help clinicians make informed decisions for in-vitro testing. However, most real-world biomedical datasets suffer from statistical ill-conditioning issues such as class imbalance, where the minority class of potential drug-candidate protein conformations is overshadowed by the much larger pool of non-drug candidates. Applying machine learning techniques directly to such data for learning and classification therefore leads to erroneous conclusions. This work counters the class imbalance problem with advanced machine learning techniques that maximize the prediction rate of potential drug-candidate molecular conformations for the target proteins ADORA2A and OPRK1 and thereby reduce the failure rate of the drug discovery process. Experimental evaluation of the proposed methodologies further substantiates the effectiveness of the approach for drug discovery.

Pages: 2771-2778

Page count: 8

