Efficient SQL-querying method for data mining in large data bases

Author
Son, NH [1]
Affiliation
[1] Univ Warsaw, Inst Math, PL-02095 Warsaw, Poland
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data mining can be understood as a process of extracting knowledge hidden in very large data sets. Data mining techniques (e.g., discretization or decision tree induction) are often based on searching for an optimal partition of the data with respect to some optimization criterion. In this paper, we investigate the problem of finding an optimal binary partition of a continuous attribute domain for large data sets stored in relational databases (RDB). The critical factor for the time complexity of algorithms solving this problem is the number of simple SQL queries, such as SELECT COUNT FROM ... WHERE attribute BETWEEN ... (related to some interval of attribute values), needed to construct such partitions. We assume that the response time of such queries does not depend on the interval length. With a straightforward approach to optimal partition selection (with respect to a given measure), the number of necessary queries is of order O(N), where N is the number of preassumed partitions of the search space. We show some properties of the considered optimization measures that allow the size of the search space to be reduced. Moreover, we prove that, using only O(log N) simple queries, one can construct a partition very close to the optimal one.
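The query-counting cost model described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's algorithm. The in-memory `CountingTable` class is a hypothetical stand-in for an RDB table, and each call to its hypothetical `count_between` method plays the role of one `SELECT decision, COUNT(*) ... GROUP BY decision` query over a half-open interval. The optimization criterion is assumed to be the discernibility measure, i.e. the number of object pairs with different decisions that a cut separates. `exhaustive_best_cut` shows the straightforward O(N) approach: one query per elementary interval, followed by prefix sums. `ternary_best_cut` is a swapped-in, generic ternary search that issues O(log N) queries but is only guaranteed to find the best cut when the measure is unimodal over the sorted candidate cuts.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Row = Tuple[float, str]  # (attribute value, decision class)


class CountingTable:
    """Hypothetical in-memory stand-in for an RDB table that counts queries."""

    def __init__(self, rows: Sequence[Row]) -> None:
        self.rows = list(rows)
        self.queries = 0  # number of simulated SQL queries issued so far

    def count_between(self, lo: float, hi: float) -> Counter:
        # Simulates one query such as
        #   SELECT decision, COUNT(*) FROM t
        #   WHERE attribute >= lo AND attribute < hi GROUP BY decision
        # (half-open interval; SQL BETWEEN itself is inclusive on both ends).
        self.queries += 1
        return Counter(d for v, d in self.rows if lo <= v < hi)


def discernibility(left: Counter, total: Counter) -> int:
    """Assumed measure: pairs (x, y) split by the cut that have different decisions."""
    right = total - left  # per-class counts to the right of the cut
    n_left, n_right = sum(left.values()), sum(right.values())
    same_class_pairs = sum(left[d] * right[d] for d in total)
    return n_left * n_right - same_class_pairs


def exhaustive_best_cut(table: CountingTable, cuts: List[float],
                        lo: float, hi: float) -> Tuple[float, int]:
    """Straightforward approach: one query per elementary interval, O(N) queries."""
    bounds = [lo] + cuts + [hi]
    pieces = [table.count_between(a, b) for a, b in zip(bounds, bounds[1:])]
    total = sum(pieces, Counter())
    best_cut, best_score, left = cuts[0], -1, Counter()
    for cut, piece in zip(cuts, pieces):
        left = left + piece  # prefix sum: objects with value < cut
        score = discernibility(left, total)
        if score > best_score:
            best_cut, best_score = cut, score
    return best_cut, best_score


def ternary_best_cut(table: CountingTable, cuts: List[float],
                     lo: float, hi: float) -> Tuple[float, int]:
    """Generic ternary search, O(log N) queries; exact only if the measure
    is unimodal over the sorted cuts (an assumption, not a general guarantee)."""
    total = table.count_between(lo, hi)
    cache = {}

    def score(i: int) -> int:
        if i not in cache:  # each newly evaluated cut costs one query
            cache[i] = discernibility(table.count_between(lo, cuts[i]), total)
        return cache[i]

    a, b = 0, len(cuts) - 1
    while b - a > 2:
        m1 = a + (b - a) // 3
        m2 = b - (b - a) // 3
        if score(m1) < score(m2):
            a = m1 + 1  # under unimodality the peak lies right of m1
        else:
            b = m2  # the peak lies at or left of m2
    best = max(range(a, b + 1), key=score)
    return cuts[best], score(best)


if __name__ == "__main__":
    # Toy data: class "0" below 50, class "1" from 50 upward.
    rows = [(float(i), "0" if i < 50 else "1") for i in range(100)]
    cuts = [i + 0.5 for i in range(99)]
    for search in (exhaustive_best_cut, ternary_best_cut):
        table = CountingTable(rows)
        print(search.__name__, search(table, cuts, 0.0, 100.0), table.queries)
```

On the toy data, both functions select the cut 49.5 with a discernibility of 2500. The exhaustive version issues 100 simulated queries (one per elementary interval), while the ternary version issues far fewer. On real data the discernibility measure need not be unimodal, so the ternary variant can miss the optimum. It should therefore be read only as an illustration of an O(log N) query budget, not as the near-optimal construction proved in the paper.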
Pages: 806-811
Page count: 6