Efficient SQL-querying method for data mining in large data bases

Cited by: 0
Authors
Son, NH [1]
Affiliations
[1] Univ Warsaw, Inst Math, PL-02095 Warsaw, Poland
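Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes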
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Data mining can be understood as the process of extracting knowledge hidden in very large data sets. Data mining techniques (e.g. discretization or decision tree induction) are often based on searching for an optimal partition of the data with respect to some optimization criterion. In this paper, we investigate the problem of finding an optimal binary partition of a continuous attribute domain for large data sets stored in relational databases (RDB). The critical factor for the time complexity of algorithms solving this problem is the number of simple SQL queries of the form SELECT COUNT FROM... WHERE attribute BETWEEN... (each related to some interval of attribute values) needed to construct such partitions. We assume that the answer time for such queries does not depend on the interval length. Using a straightforward approach to optimal partition selection (with respect to a given measure), the number of necessary queries is of order O(N), where N is the number of pre-assumed partitions of the search space. We show some properties of the considered optimization measures that allow the size of the search space to be reduced. Moreover, we prove that, using only O(log N) simple queries, one can construct a partition very close to the optimal one.
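The query-counting argument in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the table `t` with columns `a` (continuous attribute) and `d` (decision class), the helper names, and the choice of a discernibility-style measure (the number of object pairs from different classes separated by a cut). `best_cut_linear` is the straightforward approach, issuing a constant number of interval-count queries per candidate cut. `best_cut_narrowing` is a generic interval-narrowing heuristic, not the paper's algorithm; it issues a logarithmic number of queries and is only guaranteed to find the optimum when the measure is unimodal over the candidate cuts.

```python
import sqlite3


def make_db(rows):
    """Build an in-memory table t(a, d) from (attribute value, class) pairs."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (a REAL, d INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    return con


class IntervalCounter:
    """Issues interval-count queries and records how many were sent."""

    def __init__(self, con, classes):
        self.con = con
        self.classes = list(classes)
        self.queries = 0

    def class_counts(self, lo, hi):
        """Per-class row counts for a in [lo, hi]; one query per class."""
        counts = []
        for cls in self.classes:
            self.queries += 1
            row = self.con.execute(
                "SELECT COUNT(*) FROM t WHERE a BETWEEN ? AND ? AND d = ?",
                (lo, hi, cls),
            ).fetchone()
            counts.append(row[0])
        return counts


def discernibility(left, right):
    """Pairs of objects from different classes lying on opposite sides of a cut."""
    same_class = sum(x * y for x, y in zip(left, right))
    return sum(left) * sum(right) - same_class


def evaluate(counter, cut, lo, hi):
    # Assumes no attribute value equals the cut, because BETWEEN is inclusive.
    return discernibility(counter.class_counts(lo, cut),
                          counter.class_counts(cut, hi))


def best_cut_linear(counter, cuts, lo, hi):
    """Straightforward approach: evaluate every candidate cut, O(N) queries."""
    return max(cuts, key=lambda c: evaluate(counter, c, lo, hi))


def best_cut_narrowing(counter, cuts, lo, hi, k=4):
    """Heuristic (assumption, not the paper's method): probe k + 1 evenly
    spaced cuts, keep the two sub-ranges around the best probe, repeat.
    Requires k >= 3 so that the index range shrinks on every pass."""
    cache = {}

    def score(i):
        if i not in cache:
            cache[i] = evaluate(counter, cuts[i], lo, hi)
        return cache[i]

    left, right = 0, len(cuts) - 1
    while right - left + 1 > k:
        probes = [left + (right - left) * i // k for i in range(k + 1)]
        j = max(range(len(probes)), key=lambda p: score(probes[p]))
        left, right = probes[max(j - 1, 0)], probes[min(j + 1, k)]
    best = max(range(left, right + 1), key=score)
    return cuts[best]
```

To compare the two strategies, one can create a separate `IntervalCounter` for each call and read its `queries` attribute afterwards. On measures that are not unimodal the narrowing heuristic may return a suboptimal cut, which is why a result of the kind stated in the abstract (a partition provably close to optimal) needs the properties of the measure that the paper establishes.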
Pages: 806 - 811
Number of pages: 6
Related papers
50 in total
  • [21] An efficient clustering method of data mining for high-dimensional data
    Chang, JW
    Kang, HM
    8TH WORLD MULTI-CONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL II, PROCEEDINGS: COMPUTING TECHNIQUES, 2004, : 273 - 278
  • [22] Sampling for Information and Structure Preservation When Mining Large Data Bases
    Kuri-Morales, Angel
    Lozano, Alexis
    ADVANCES IN ARTIFICIAL INTELLIGENCE - IBERAMIA 2010, 2010, 6433 : 174 - 183
  • [23] EFFICIENT MANAGEMENT OF TRANSITIVE RELATIONSHIPS IN LARGE DATA AND KNOWLEDGE BASES
    AGRAWAL, R
    BORGIDA, A
    JAGADISH, HV
    PROCEEDINGS OF THE 1989 ACM SIGMOD INTERNATIONAL CONFERENCE ON THE MANAGEMENT OF DATA, 1989, 18 : 253 - 262
  • [24] EFFICIENT PROCESSING METHODS FOR LARGE CTR DATA-BASES
    CUMMINS, WF
    PARRISH, CP
    BULLETIN OF THE AMERICAN PHYSICAL SOCIETY, 1977, 22 (09): : 1197 - 1197
  • [25] Distributed Data Mining by Means of SQL Enhancement
    Gorawski, Marcin
    Pluciennik, Ewa
    ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS: OTM 2008 WORKSHOPS, 2008, 5333 : 34 - 35
  • [26] ATLaS: A native extension of SQL for data mining
    Wang, HX
    Zaniolo, C
    PROCEEDINGS OF THE THIRD SIAM INTERNATIONAL CONFERENCE ON DATA MINING, 2003, : 130 - 141
  • [27] Clusterwise data mining within a fuzzy querying interface
    Kacprzyk, J
    Owsinski, JW
    Zadrozny, S
    10TH IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOLS 1-3: MEETING THE GRAND CHALLENGE: MACHINES THAT SERVE PEOPLE, 2001, : 1239 - 1242
  • [28] Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
    Ordonez, Carlos
    Chen, Zhibo
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2012, 24 (04) : 678 - 691
  • [29] Geometric querying of time-dependent data for data mining in molecular dynamics
    Sourina, O
    Korolev, N
    2004 INTERNATIONAL CONFERENCE ON CYBERWORLDS, PROCEEDINGS, 2004, : 351 - 355
  • [30] DATA COMPRESSION OF LARGE DOCUMENT DATA BASES
    HEAPS, HS
    JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 1975, 15 (01): : 32 - 39