Analyzing Data Distribution for Dynamic Data Sets

Authors
Shi, Yong [1]
Kim, Sunpil [1]
Affiliations
[1] Kennesaw State Univ, Dept Comp Sci, Kennesaw, GA 30144 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
F [Economics];
Discipline classification code
02;
Abstract
In this paper, we discuss the data distribution of data sets that change constantly. In our previous work [1], we analyzed how the distribution changes in multi-dimensional data space and proposed an approach to processing multi-dimensional data sets. Similarity search defines a distance between each data point and a given query point Q, and aims to select the data points closest to Q efficiently and effectively. Clusters are subgroups of a data set whose data points are similar to one another. In [1], we proposed an approach that reconstructs clusters based on K nearest neighbor search results for dynamic data sets. However, in high-dimensional spaces, not all dimensions may be relevant to a given cluster, and natural clusters might not exist in the full data space. In this paper, we extend our work to subspaces and design an algorithm to detect subclusters that are readjusted continuously as the data set changes and new query requests arrive. The reconstructed subclusters can help improve the performance of future K nearest neighbor searches.
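The similarity search described in the abstract can be illustrated with a minimal sketch. It is a brute-force K nearest neighbor search that optionally restricts the distance to a subset of dimensions (a subspace). It is not the authors' algorithm: the function name `knn`, the `dims` parameter, the use of Euclidean distance, and the sample points are all assumptions made for illustration.

```python
import heapq
import math


def knn(points, query, k, dims=None):
    """Return the k points closest to `query` (hypothetical helper).

    `dims` is an optional list of dimension indices; when given, the
    Euclidean distance is computed only over those dimensions, i.e. in
    a subspace. Euclidean distance is an assumption of this sketch.
    """
    dims = list(range(len(query))) if dims is None else list(dims)

    def dist(p):
        return math.sqrt(sum((p[d] - query[d]) ** 2 for d in dims))

    # nsmallest returns the k items with the smallest distance, nearest first.
    return heapq.nsmallest(k, points, key=dist)


# Illustrative data: two points are close to Q only in dimensions 0 and 1.
points = [(0.0, 0.0, 9.0), (1.0, 1.0, 0.0), (5.0, 5.0, 0.1), (0.2, 0.1, 7.0)]
Q = (0.0, 0.0, 0.0)

full = knn(points, Q, 2)               # all three dimensions
sub = knn(points, Q, 2, dims=[0, 1])   # subspace spanned by dimensions 0 and 1
```

In this toy data the neighbors found in the full space differ from those found in the subspace, which mirrors the abstract's remark that not all dimensions may be relevant to a given cluster.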
Pages: 1046-1052
Number of pages: 7