Efficient SQL-querying method for data mining in large data bases

Citations: 0

Authors:

Son, NH [1]

Affiliations:

[1] Univ Warsaw, Inst Math, PL-02095 Warsaw, Poland

Source:

IJCAI-99: PROCEEDINGS OF THE SIXTEENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOLS 1 & 2 | 1999

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Data mining can be understood as the process of extracting knowledge hidden in very large data sets. Data mining techniques (e.g. discretization or decision tree induction) are often based on searching for an optimal partition of the data with respect to some optimization criterion. In this paper, we investigate the problem of finding an optimal binary partition of a continuous attribute domain for large data sets stored in relational data bases (RDB). The critical factor for the time complexity of algorithms solving this problem is the number of simple SQL queries of the form SELECT COUNT FROM ... WHERE attribute BETWEEN ... (each related to some interval of attribute values) necessary to construct such partitions. We assume that the answer time for such queries does not depend on the interval length. Using a straightforward approach to optimal partition selection (with respect to a given measure), the number of necessary queries is of order O(N), where N is the number of preassumed partitions of the search space. We show some properties of the considered optimization measures that allow the size of the search space to be reduced. Moreover, we prove that using only O(log N) simple queries, one can construct a partition very close to optimal.
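As a rough illustration of the straightforward O(N) baseline mentioned in the abstract, the following Python sketch issues one counting query per class for every candidate cut and keeps the best-scoring cut. It is not the paper's algorithm, and it does not attempt the O(log N) construction. The table name `data`, the columns `a` (continuous attribute) and `d` (decision class), the helper names, and the pair-counting score are all assumptions made for this example.

```python
import sqlite3

def count_in_interval(conn, lo, hi, decision):
    # One "simple query": rows of one class whose attribute lies in [lo, hi].
    # Table "data" and columns "a", "d" are hypothetical names for this sketch.
    cur = conn.execute(
        "SELECT COUNT(*) FROM data WHERE a BETWEEN ? AND ? AND d = ?",
        (lo, hi, decision),
    )
    return cur.fetchone()[0]

def best_cut_naive(conn, cuts, classes, lo, hi):
    # Straightforward search: evaluate every candidate cut, so the number of
    # queries grows linearly with the number of cuts.
    totals = {d: count_in_interval(conn, lo, hi, d) for d in classes}
    queries = len(classes)
    best = None
    for c in cuts:
        left = {d: count_in_interval(conn, lo, c, d) for d in classes}
        queries += len(classes)
        right = {d: totals[d] - left[d] for d in classes}
        # Assumed measure: number of row pairs from different classes
        # that end up on opposite sides of the cut.
        score = sum(
            left[d1] * right[d2]
            for d1 in classes for d2 in classes if d1 != d2
        )
        if best is None or score > best[1]:
            best = (c, score)
    return best, queries

# Toy data: attribute values 1..8, class 0 for values <= 4, class 1 otherwise.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (a REAL, d INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [(float(v), 0 if v <= 4 else 1) for v in range(1, 9)],
)
cuts = [v + 0.5 for v in range(1, 8)]  # candidate cuts 1.5, 2.5, ..., 7.5
best, queries = best_cut_naive(conn, cuts, classes=[0, 1], lo=1, hi=8)
print(best, queries)
```

On this toy table the cut at 4.5 separates the two classes completely. The query counter makes the linear cost visible: two queries for the class totals plus two per candidate cut.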


Pages: 806 - 811

Number of pages: 6

50 items in total

[21] An efficient clustering method of data mining for high-dimensional data
Chang, JW
Kang, HM
8TH WORLD MULTI-CONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL II, PROCEEDINGS: COMPUTING TECHNIQUES, 2004 : 273 - 278
[22] Sampling for Information and Structure Preservation When Mining Large Data Bases
Kuri-Morales, Angel
Lozano, Alexis
ADVANCES IN ARTIFICIAL INTELLIGENCE - IBERAMIA 2010, 2010, 6433 : 174 - 183
[23] EFFICIENT MANAGEMENT OF TRANSITIVE RELATIONSHIPS IN LARGE DATA AND KNOWLEDGE BASES
AGRAWAL, R
BORGIDA, A
JAGADISH, HV
PROCEEDINGS OF THE 1989 ACM SIGMOD INTERNATIONAL CONFERENCE ON THE MANAGEMENT OF DATA, 1989, 18 : 253 - 262
[24] EFFICIENT PROCESSING METHODS FOR LARGE CTR DATA-BASES
CUMMINS, WF
PARRISH, CP
BULLETIN OF THE AMERICAN PHYSICAL SOCIETY, 1977, 22 (09) : 1197 - 1197
[25] Distributed Data Mining by Means of SQL Enhancement
Gorawski, Marcin
Pluciennik, Ewa
ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS: OTM 2008 WORKSHOPS, 2008, 5333 : 34 - 35
[26] ATLaS: A native extension of SQL for data mining
Wang, HX
Zaniolo, C
PROCEEDINGS OF THE THIRD SIAM INTERNATIONAL CONFERENCE ON DATA MINING, 2003 : 130 - 141
[27] Clusterwise data mining within a fuzzy querying interface
Kacprzyk, J
Owsinski, JW
Zadrozny, S
10TH IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOLS 1-3: MEETING THE GRAND CHALLENGE: MACHINES THAT SERVE PEOPLE, 2001 : 1239 - 1242
[28] Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
Ordonez, Carlos
Chen, Zhibo
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2012, 24 (04) : 678 - 691
[29] Geometric querying of time-dependent data for data mining in molecular dynamics
Sourina, O
Korolev, N
2004 INTERNATIONAL CONFERENCE ON CYBERWORLDS, PROCEEDINGS, 2004 : 351 - 355
[30] DATA COMPRESSION OF LARGE DOCUMENT DATA BASES
HEAPS, HS
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 1975, 15 (01) : 32 - 39
