A contemporary feature selection and classification framework for imbalanced biomedical datasets

Cited by: 16
Authors
Bikku, Thulasi [1]
Nandam, Sambasiva Rao [2]
Akepogu, Ananda Rao [3]
Affiliations
[1] Vignans Nirula Inst Technol & Sci Women, Dept CSE, Palakaluru, AP, India
[2] RITW, Hyderabad, Telangana, India
[3] JNTUCEA, Acad & Planning, Anantapuramu, India
Keywords
Biomedical data; Document clustering; Document classification; Bioinformatics; User recommended system;
DOI
10.1016/j.eij.2018.03.003
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Because the PubMed and Medline repositories hold a large number of biomedical documents, it is difficult to analyze, predict and interpret document information using traditional document clustering and classification models. Traditional document clustering and classification models fail to analyze document sets based on the user's keywords and MeSH terms. Because of the large number of feature sets, conventional models such as SVM, neural networks and multinomial naive Bayes have been used for feature classification, with additional text filtering measures typically used for feature selection. Also, as the size of the documents increases, it becomes difficult to find outliers using the documents' features and MeSH terms. Biomedical document clustering and classification is one of the essential machine learning models for the knowledge extraction process of real-time user recommended systems. In this paper, we developed a novel biomedical document feature clustering and classification model as a user recommended system for large document sets using the Hadoop framework. In this model, a novel gene feature clustering with ensemble document classification was implemented on biomedical repositories (PubMed and Medline) using the MapReduce framework. Experimental results show that the proposed model achieves a higher computational cluster quality rate and true positive classification rate than traditional document clustering and classification models. © 2018 Production and hosting by Elsevier B.V. on behalf of Faculty of Computers and Information, Cairo University.
Pages: 191-198
Number of pages: 8