Analysis of Dimensionality Reduction Techniques on Big Data

Cited by: 461
Authors
Reddy, G. Thippa [1]
Reddy, M. Praveen Kumar [1]
Lakshmanna, Kuruva [1]
Kaluri, Rajesh [1]
Rajput, Dharmendra Singh [1]
Srivastava, Gautam [2, 3]
Baker, Thar [4]
Affiliations
[1] VIT, Sch Informat Technol & Engn, Vellore 632014, Tamil Nadu, India
[2] Brandon Univ, Dept Math & Comp Sci, Brandon, MB R7A 6A9, Canada
[3] China Med Univ, Res Ctr Interneural Comp, Shenyang 10122, Peoples R China
[4] Liverpool John Moores Univ, Dept Comp Sci, Liverpool L3 3AF, Merseyside, England
Source
IEEE ACCESS | 2020 / Vol. 8
Keywords
Dimensionality reduction; Principal component analysis; Machine learning algorithms; Support vector machines; Medical diagnostic imaging; Feature extraction; Cardiotocography dataset; dimensionality reduction; feature engineering; linear discriminant analysis; machine learning; principal component analysis; MACHINE; CLASSIFIER; DIAGNOSIS; SYSTEM;
DOI
10.1109/ACCESS.2020.2980942
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Due to digitization, a huge volume of data is being generated across sectors such as healthcare, production, sales, IoT devices, the Web, and organizations. Machine learning algorithms are used to uncover patterns among the attributes of this data, and hence to make predictions that medical practitioners and managers can use to make executive decisions. Not all the attributes in the generated datasets are important for training machine learning algorithms: some may be irrelevant, and some may not affect the outcome of the prediction. Ignoring or removing these irrelevant or less important attributes reduces the burden on machine learning algorithms. In this work, two prominent dimensionality reduction techniques, Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA), are investigated on four popular Machine Learning (ML) algorithms, Decision Tree Induction, Support Vector Machine (SVM), Naive Bayes Classifier, and Random Forest Classifier, using the publicly available Cardiotocography (CTG) dataset from the University of California, Irvine (UCI) Machine Learning Repository. The experimental results show that PCA outperforms LDA on all measures. Also, the performance of the Decision Tree and Random Forest classifiers is not affected much by using PCA and LDA. To further analyze the performance of PCA and LDA, the experimentation is carried out on Diabetic Retinopathy (DR) and Intrusion Detection System (IDS) datasets. The results show that ML algorithms with PCA produce better results when the dimensionality of the datasets is high. When the dimensionality of the datasets is low, the ML algorithms without dimensionality reduction yield better results.
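The kind of comparison the abstract describes can be sketched with scikit-learn. This is only an illustrative sketch, not the authors' pipeline: the data is synthetic (generated with `make_classification`, standing in for the CTG dataset), and the feature count, class count, train/test split, 95% variance threshold for PCA, and all hyperparameters are assumptions chosen for the example. It fits each of the four classifiers with no reduction, with PCA, and with LDA, and records test accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 600 samples, 21 features, 3 classes.
X, y = make_classification(
    n_samples=600, n_features=21, n_informative=8, n_redundant=4,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def make_reducer(name):
    """Return a fresh, unfitted reducer, or None for the no-reduction baseline."""
    if name == "PCA":
        # Keep enough components to explain 95% of the variance (assumed threshold).
        return PCA(n_components=0.95)
    if name == "LDA":
        # LDA can produce at most n_classes - 1 = 2 components here.
        return LinearDiscriminantAnalysis(n_components=2)
    return None


def make_classifier(name):
    """Return a fresh, unfitted classifier with default-like settings."""
    return {
        "Decision Tree": DecisionTreeClassifier(random_state=0),
        "SVM": SVC(kernel="rbf"),
        "Naive Bayes": GaussianNB(),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    }[name]


def evaluate():
    """Fit every reducer/classifier pair and return test accuracies."""
    results = {}
    for reducer_name in ("none", "PCA", "LDA"):
        for clf_name in ("Decision Tree", "SVM", "Naive Bayes", "Random Forest"):
            steps = [StandardScaler()]
            reducer = make_reducer(reducer_name)
            if reducer is not None:
                steps.append(reducer)
            steps.append(make_classifier(clf_name))
            model = make_pipeline(*steps)
            model.fit(X_train, y_train)
            results[(reducer_name, clf_name)] = model.score(X_test, y_test)
    return results


scores = evaluate()
for (reducer_name, clf_name), accuracy in scores.items():
    print(f"{reducer_name:>4} + {clf_name:<13} accuracy = {accuracy:.3f}")
```

Accuracies on synthetic data say nothing about the paper's reported results; the sketch only shows how the reduction step slots in before each classifier. Putting the scaler and reducer inside the pipeline means both are fitted on the training split only, which avoids leaking test data into the projection.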
Pages: 54776-54788
Number of pages: 13