Clustering High-Dimensional Stock Data using Data Mining Approach

Cited by: 0
Authors
Indriyanti, Dhea [1]
Dhini, Arian [1]
Affiliation
[1] Univ Indonesia, Fac Engn, Dept Ind Engn, Depok, Indonesia
Keywords
stock; high-dimensional data; clustering; EM; PCA;
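DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract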
In recent years, the number of stock investors in Indonesia has increased rapidly, so stock analysis is needed to help investors with their investment plans. Clustering helps investors select appropriate stocks. Unfortunately, stock prices vary over time, so selecting stocks for investment is not easy. In addition, stock price time series are high-dimensional data influenced by many factors. In this study, the high-dimensional data are obtained from the time frame of each factor. It is therefore important to use a technique suited to clustering high-dimensional data. This paper presents High Dimensional Data Clustering (HDDC), a model-based clustering method built on the Gaussian Mixture Model and fitted with the Expectation-Maximization (EM) algorithm. HDDC via the EM algorithm gives a more robust result and makes it possible to add further assumptions. Moreover, this paper combines the high-dimensional clustering technique HDDC via the EM algorithm with the most popular feature extraction technique, Principal Component Analysis (PCA). The paper compares HDDC alone with the combination of HDDC and PCA to determine which method gives the better result in clustering high-dimensional time series data. The 155 data features are reduced to 7 principal components using PCA. Although PCA improves the time efficiency of building the model, HDDC via the EM algorithm handles the high-dimensional data better than the combination with PCA.
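The comparison described in the abstract (clustering the full feature set versus clustering after PCA) can be illustrated with a minimal NumPy sketch. Everything in it is an assumption for illustration only: the data are synthetic (60 rows in two shifted groups, not stock data), the helper names pca_reduce, farthest_first, em_diag_gmm and agreement are hypothetical, and the mixture model is a plain diagonal-covariance Gaussian mixture fitted by EM, used as a simple stand-in rather than HDDC itself. Only the sizes 155 and 7 come from the abstract.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def farthest_first(X, k):
    """Deterministic initial means: start at row 0, then repeatedly take
    the row farthest from all rows chosen so far."""
    idx = [0]
    d2 = ((X - X[0]) ** 2).sum(axis=1)
    for _ in range(k - 1):
        nxt = int(d2.argmax())
        idx.append(nxt)
        d2 = np.minimum(d2, ((X - X[nxt]) ** 2).sum(axis=1))
    return X[idx].copy()

def em_diag_gmm(X, k, n_iter=50):
    """EM for a Gaussian mixture with diagonal covariances.
    A stand-in for illustration, not HDDC. Returns hard cluster labels."""
    n = X.shape[0]
    means = farthest_first(X, k)
    variances = np.tile(X.var(axis=0) + 1e-6, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for stability.
        diff2 = (X[:, None, :] - means) ** 2
        log_p = -0.5 * (diff2 / variances
                        + np.log(2.0 * np.pi * variances)).sum(axis=2)
        log_p += np.log(weights)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update weights, means and per-feature variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum(
            (resp.T @ X ** 2) / nk[:, None] - means ** 2, 1e-6)
    return resp.argmax(axis=1)

def agreement(labels, truth):
    """Share of matching labels for two clusters, ignoring label order."""
    same = float(np.mean(labels == truth))
    return max(same, 1.0 - same)

# Synthetic data (assumption): two groups of 30 rows, 155 features each.
rng = np.random.default_rng(0)
truth = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 155)) + 3.0 * truth[:, None]

labels_raw = em_diag_gmm(X, k=2)        # cluster on all 155 features
reduced = pca_reduce(X, n_components=7)  # 155 features -> 7 components
labels_pca = em_diag_gmm(reduced, k=2)  # cluster on the reduced data

print("agreement, all features:", agreement(labels_raw, truth))
print("agreement, after PCA:   ", agreement(labels_pca, truth))
```

The sketch shows only the shape of the two pipelines. It does not reproduce the paper's data, its HDDC model, or its reported findings, and on real stock features the two routes may behave quite differently from this toy case.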
Pages: 5