Efficient SQL-querying method for data mining in large data bases

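Cited by: 0
Authors
Son, NH [1]
Affiliation
[1] Univ Warsaw, Inst Math, PL-02095 Warsaw, Poland
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract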
Data mining can be understood as the process of extracting knowledge hidden in very large data sets. Data mining techniques (e.g. discretization or decision trees) are often based on searching for an optimal partition of the data with respect to some optimization criterion. In this paper, we investigate the problem of finding an optimal binary partition of a continuous attribute domain for large data sets stored in relational databases (RDB). The critical factor for the time complexity of algorithms solving this problem is the number of simple SQL queries of the form SELECT COUNT FROM ... WHERE attribute BETWEEN ... (each related to some interval of attribute values) needed to construct such partitions. We assume that the answer time for such queries does not depend on the interval length. With a straightforward approach to optimal partition selection (with respect to a given measure), the number of necessary queries is of order O(N), where N is the number of pre-assumed partitions of the search space. We show some properties of the considered optimization measures that allow the size of the search space to be reduced. Moreover, we prove that, using only O(log N) simple queries, one can construct a partition very close to optimal.
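The straightforward O(N) baseline mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm, and in particular not its O(log N) method. The table name `samples`, the columns `a` and `label`, the helper names, and the quality measure (the number of pairs of objects from different classes separated by the cut) are all assumptions made for this example.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory table (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (a INTEGER, label TEXT)")
    rows = [(v, "x") for v in (1, 2, 3)] + [(v, "y") for v in (4, 5, 6, 7, 8)]
    conn.executemany("INSERT INTO samples VALUES (?, ?)", rows)
    return conn


def count_in_interval(conn, lo, hi, label):
    """One 'simple query': count rows of one class with attribute in [lo, hi]."""
    row = conn.execute(
        "SELECT COUNT(*) FROM samples WHERE a BETWEEN ? AND ? AND label = ?",
        (lo, hi, label),
    ).fetchone()
    return row[0]


def best_cut_linear(conn, cuts, lo, hi, labels):
    """Evaluate every candidate cut; the number of queries grows linearly with len(cuts)."""
    queries = 0
    totals = []
    for d in labels:
        totals.append(count_in_interval(conn, lo, hi, d))
        queries += 1

    best_cut, best_score = None, -1
    for c in cuts:
        left = []
        for d in labels:
            left.append(count_in_interval(conn, lo, c, d))
            queries += 1
        # Right-hand counts follow from the totals, so no extra queries are needed.
        right = [t - l for t, l in zip(totals, left)]
        # Assumed measure: pairs from different classes that the cut separates.
        score = sum(left) * sum(right) - sum(l * r for l, r in zip(left, right))
        if score > best_score:
            best_cut, best_score = c, score
    return best_cut, best_score, queries


if __name__ == "__main__":
    conn = build_demo_db()
    cuts = [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5]
    print(best_cut_linear(conn, cuts, 1, 8, ["x", "y"]))
```

With two classes and seven candidate cuts, this sketch issues 2 * (7 + 1) = 16 COUNT queries. That linear growth in the number of candidate cuts is the cost the abstract aims to reduce.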
Pages: 806-811
Number of pages: 6