Analysis of Feature Selection and Extraction Algorithm for Loan Data: A Big Data Approach

Authors
Attigeri, Girija [1]
Pai, Manohara M. M. [1]
Pai, Radhika M. [1]
Affiliation
[1] Manipal Univ, Dept Informat & Commun Technol, Manipal Inst Technol, Manipal, Karnataka, India
Keywords
Classification; Financial big data; Feature selection and extraction; Support Vector Machine; Logistic regression;
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Fraudulent activities in financial institutions can damage a country's economic system. These activities can be identified using clustering and classification algorithms, whose effectiveness depends on the quality of the input data. Financial data comes from various sources and in various forms, such as financial statements and stakeholder activities. Taken together, this data is vast, unstructured big data, so parallel, distributed pre-processing is important for improving its quality. The objective of this work is dimensionality reduction, using feature selection and extraction algorithms, for large volumes of financial data. The paper attempts to understand the implications of a feature extraction and transformation algorithm, Principal Feature Analysis, for financial data. The effect of the reduced dimension is studied on various classification algorithms for financial loan data. A parallel and distributed implementation is carried out on the IBM Bluemix cloud platform with a Spark notebook. The results show that reducing the number of features significantly improves execution time without compromising accuracy.
Pages: 2147-2151
Number of pages: 5
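
The abstract's core idea, dropping redundant features before training a classifier, can be illustrated with a small sketch. This is not the paper's method: it uses a simple correlation filter as a stand-in for Principal Feature Analysis, runs on a single machine rather than on Spark, and the loan columns (income, amount, and their near-duplicates) are synthetic and hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def select_features(columns, threshold=0.95):
    """Greedy filter: keep a column only if it is not strongly
    correlated (|r| >= threshold) with any column already kept."""
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) < threshold for k in kept):
            kept.append(j)
    return kept


# Synthetic, hypothetical loan data: two informative columns and
# two redundant ones derived from them.
rng = random.Random(0)
n_rows = 200
income = [rng.gauss(50.0, 10.0) for _ in range(n_rows)]
amount = [rng.gauss(20.0, 5.0) for _ in range(n_rows)]
income_cents = [v * 100.0 for v in income]                # same data, other unit
amount_noisy = [v + rng.gauss(0.0, 0.1) for v in amount]  # near-duplicate

columns = [income, amount, income_cents, amount_noisy]
kept = select_features(columns)
print("kept column indices:", kept)
```

In this sketch the two derived columns are filtered out because each is almost perfectly correlated with a column that was already kept. A classifier would then be trained on the kept columns only, which is where the reduction in execution time reported in the abstract would come from.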