Efficient SQL-querying method for data mining in large data bases

Citations: 0
Authors
Son, NH [1]
Affiliation
[1] Univ Warsaw, Inst Math, PL-02095 Warsaw, Poland
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Data mining can be understood as the process of extracting knowledge hidden in very large data sets. Data mining techniques (e.g. discretization or decision trees) are often based on searching for an optimal partition of the data with respect to some optimization criterion. In this paper, we investigate the problem of finding an optimal binary partition of a continuous attribute domain for large data sets stored in relational databases (RDB). The critical factor for the time complexity of algorithms solving this problem is the number of simple SQL queries of the form SELECT COUNT FROM... WHERE attribute BETWEEN (each related to some interval of attribute values) needed to construct such partitions. We assume that the answer time for such queries does not depend on the interval length. With a straightforward approach to optimal partition selection (with respect to a given measure), the number of necessary queries is of order O(N), where N is the number of preassumed partitions of the search space. We show some properties of the considered optimization measures that allow the size of the search space to be reduced. Moreover, we prove that, using only O(log N) simple queries, one can construct a partition very close to optimal.
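The straightforward O(N) approach mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the table name `objects`, the columns `a` (continuous attribute) and `d` (class label), the helper names, and the choice of a discernibility-style measure (the number of pairs of rows from different classes separated by the cut) are all assumptions made for illustration. Each candidate cut costs one COUNT ... BETWEEN query per class, so the number of queries grows linearly with the number of candidate cuts.

```python
import sqlite3


def count_in_interval(conn, lo, hi, label):
    """One 'simple query': rows of class `label` with attribute a in [lo, hi]."""
    row = conn.execute(
        "SELECT COUNT(*) FROM objects WHERE a BETWEEN ? AND ? AND d = ?",
        (lo, hi, label),
    ).fetchone()
    return row[0]


def best_cut_straightforward(conn, cuts, lo, hi, labels):
    """Score every candidate cut; the query count is linear in len(cuts)."""
    # Class totals over the whole domain [lo, hi], fetched once.
    totals = {k: count_in_interval(conn, lo, hi, k) for k in labels}
    queries = len(labels)
    best_cut, best_score = None, -1
    for c in cuts:
        left = {k: count_in_interval(conn, lo, c, k) for k in labels}
        queries += len(labels)
        right = {k: totals[k] - left[k] for k in labels}
        n_left, n_right = sum(left.values()), sum(right.values())
        # Assumed measure: pairs of rows from different classes split by c.
        score = n_left * n_right - sum(left[k] * right[k] for k in labels)
        if score > best_score:
            best_cut, best_score = c, score
    return best_cut, best_score, queries


def demo():
    """Build a tiny in-memory table (hypothetical data) and search it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE objects (a REAL, d INTEGER)")
    rows = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 1), (5.0, 1), (6.0, 1)]
    conn.executemany("INSERT INTO objects VALUES (?, ?)", rows)
    cuts = [1.5, 2.5, 3.5, 4.5, 5.5]
    return best_cut_straightforward(conn, cuts, 1.0, 6.0, [0, 1])
```

On the toy table in `demo()`, the cut 3.5 separates the two classes completely, and the search issues 12 queries for 5 candidate cuts and 2 classes. An approach needing only O(log N) queries, as claimed in the abstract, would have to avoid scoring most of these cuts individually.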
Pages: 806-811
Number of pages: 6