Analysis of Dimensionality Reduction Techniques on Big Data

Cited by: 461
Authors
Reddy, G. Thippa [1]
Reddy, M. Praveen Kumar [1]
Lakshmanna, Kuruva [1]
Kaluri, Rajesh [1]
Rajput, Dharmendra Singh [1]
Srivastava, Gautam [2, 3]
Baker, Thar [4]
Affiliations
[1] VIT, Sch Informat Technol & Engn, Vellore 632014, Tamil Nadu, India
[2] Brandon Univ, Dept Math & Comp Sci, Brandon, MB R7A 6A9, Canada
[3] China Med Univ, Res Ctr Interneural Comp, Shenyang 10122, Peoples R China
[4] Liverpool John Moores Univ, Dept Comp Sci, Liverpool L3 3AF, Merseyside, England
Keywords
Dimensionality reduction; Principal component analysis; Machine learning algorithms; Support vector machines; Medical diagnostic imaging; Feature extraction; Cardiotocography dataset; dimensionality reduction; feature engineering; linear discriminant analysis; machine learning; principal component analysis; MACHINE; CLASSIFIER; DIAGNOSIS; SYSTEM;
DOI
10.1109/ACCESS.2020.2980942
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Due to digitization, a huge volume of data is being generated across several sectors such as healthcare, production, sales, IoT devices, the Web, and organizations. Machine learning algorithms are used to uncover patterns among the attributes of this data, and the resulting predictions can support decisions by medical practitioners and managers. Not all the attributes in the generated datasets are important for training machine learning algorithms: some are irrelevant, and some do not affect the outcome of the prediction. Ignoring or removing these irrelevant or less important attributes reduces the burden on machine learning algorithms. In this work, two prominent dimensionality reduction techniques, Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA), are investigated with four popular Machine Learning (ML) algorithms, Decision Tree Induction, Support Vector Machine (SVM), Naive Bayes Classifier, and Random Forest Classifier, using the publicly available Cardiotocography (CTG) dataset from the University of California, Irvine (UCI) Machine Learning Repository. The experimental results show that PCA outperforms LDA on all the measures. Also, the performance of the Decision Tree and Random Forest classifiers is not affected much by using PCA or LDA. To further analyze the performance of PCA and LDA, the experimentation is carried out on Diabetic Retinopathy (DR) and Intrusion Detection System (IDS) datasets. The results show that ML algorithms with PCA produce better results when the dimensionality of the datasets is high. When the dimensionality of the datasets is low, the ML algorithms without dimensionality reduction yield better results.
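The comparison described in the abstract (no reduction vs. PCA vs. LDA, each paired with the four classifiers) could be sketched roughly as follows with scikit-learn. This is an illustrative sketch only, not the authors' code: the data is a synthetic stand-in generated by `make_classification` (the CTG dataset is not loaded here), and the feature count, class count, component counts, and 5-fold cross-validation are all assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real data (assumption: 21 features, 3 classes).
X, y = make_classification(
    n_samples=600, n_features=21, n_informative=8,
    n_redundant=5, n_classes=3, random_state=0,
)

# Dimensionality reduction options; None means "use all features".
reducers = {
    "none": None,
    # Keep enough components to explain 95% of the variance (assumed threshold).
    "pca": PCA(n_components=0.95, svd_solver="full"),
    # LDA yields at most n_classes - 1 components, i.e. 2 here.
    "lda": LinearDiscriminantAnalysis(n_components=2),
}

# The four classifier families named in the abstract, with assumed settings.
classifiers = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "svm": SVC(),
    "naive_bayes": GaussianNB(),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

results = {}
for r_name, reducer in reducers.items():
    for c_name, clf in classifiers.items():
        # Scale, optionally reduce, then classify.
        steps = [StandardScaler()]
        if reducer is not None:
            steps.append(reducer)
        steps.append(clf)
        pipe = make_pipeline(*steps)
        # Mean accuracy over 5 cross-validation folds.
        results[(r_name, c_name)] = cross_val_score(pipe, X, y, cv=5).mean()

for (r_name, c_name), acc in sorted(results.items()):
    print(f"{r_name:>4} + {c_name:<13} accuracy = {acc:.3f}")
```

Placing the reducer inside the pipeline means it is refit on each training fold, so the held-out fold never influences the projection. Scores on synthetic data say nothing about the results reported for the CTG, DR, or IDS datasets.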
Pages: 54776-54788
Number of pages: 13
Related papers
50 in total
  • [21] Comparison of RFID Data Processing Using Dimensionality Reduction Techniques
    Anu, Maria, V
    Mala, G. S. Anandha
    Mathi, K.
    [J]. 2014 INTERNATIONAL CONFERENCE ON CONTROL, INSTRUMENTATION, COMMUNICATION AND COMPUTATIONAL TECHNOLOGIES (ICCICCT), 2014, : 265 - 268
  • [22] A Survey on Dimensionality Reduction Techniques for Time-Series Data
    Ashraf, Mohsena
    Anowar, Farzana
    Setu, Jahanggir H.
    Chowdhury, Atiqul I.
    Ahmed, Eshtiak
    Islam, Ashraful
    Al-Mamun, Abdullah
    [J]. IEEE ACCESS, 2023, 11 : 42909 - 42923
  • [23] GCM Data Analysis Using Dimensionality Reduction
    Li, Zuoling
    Weng, Guirong
    [J]. ADVANCES IN COMPUTER SCIENCE AND EDUCATION, 2012, 140 : 217 - 222
  • [24] Big Data: A dimensionality Reduction and Attribute Selection using PCA for Diabetic Data bases
    Kumar, S. Santhosh
    [J]. RESEARCH JOURNAL OF PHARMACEUTICAL BIOLOGICAL AND CHEMICAL SCIENCES, 2015, 6 (02): : 1395 - 1401
  • [25] Data reduction techniques for highly imbalanced medicare Big Data
    Hancock, John T.
    Wang, Huanjing
    Khoshgoftaar, Taghi M.
    Liang, Qianxin
    [J]. JOURNAL OF BIG DATA, 2024, 11 (01)
  • [26] Data reduction techniques for highly imbalanced medicare Big Data
    John T. Hancock
    Huanjing Wang
    Taghi M. Khoshgoftaar
    Qianxin Liang
    [J]. Journal of Big Data, 11
  • [27] Comparing Dimensionality Reduction Techniques
    Nick, William
    Shelton, Joseph
    Bullock, Gina
    Esterline, Albert
    Asamene, Kassahun
    [J]. IEEE SOUTHEASTCON 2015, 2015,
  • [28] Optimal Chemical Grouping and Sorbent Material Design by Data Analysis, Modeling and Dimensionality Reduction Techniques
    Onel, Melis
    Beykal, Burcu
    Wang, Meichen
    Grimm, Fabian A.
    Zhou, Lan
    Wright, Fred A.
    Phillips, Timothy D.
    Rusyn, Ivan
    Pistikopoulos, Efstratios N.
    [J]. 28TH EUROPEAN SYMPOSIUM ON COMPUTER AIDED PROCESS ENGINEERING, 2018, 43 : 421 - 426
  • [29] A Review on Dimensionality Reduction Techniques
    Huang, Xuan
    Wu, Lei
    Ye, Yinsong
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2019, 33 (10)
  • [30] Dimensionality Reduction Techniques for Visualizing Morphometric Data: Comparing Principal Component Analysis to Nonlinear Methods
    Trina Y. Du
    [J]. Evolutionary Biology, 2019, 46 : 106 - 121