Performance controlled data reduction for knowledge discovery in distributed databases

Citations: 0
Authors
Vucetic, S [1]
Obradovic, Z [1]
Affiliations
[1] Washington State Univ, Sch Elect Engn & Comp Sci, Pullman, WA 99164 USA
Keywords
data reduction; data compression; sensitivity analysis; distributed databases; neural networks; learning curve
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The objective of data reduction is to obtain a compact representation of a large data set to facilitate repeated use of non-redundant information with complex and slow learning algorithms and to allow efficient data transfer and storage. For a user-controllable allowed accuracy loss we propose an effective data reduction procedure based on guided sampling for identifying a minimal size representative subset, followed by a model-sensitivity analysis for determining an appropriate compression level for each attribute. Experiments were performed on 3 large data sets and, depending on an allowed accuracy loss margin ranging from 1% to 5% of the ideal generalization, the achieved compression rates ranged between 95 and 12,500 times. These results indicate that transferring reduced data sets from multiple locations to a centralized site for an efficient and accurate knowledge discovery might often be possible in practice.
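The two quantities the abstract turns on, a sample size chosen against an allowed accuracy loss and the resulting compression rate, can be illustrated with a minimal sketch. This is not the authors' procedure: the record does not describe how the guided sampling or the sensitivity analysis work, so the code below uses a plain scan over candidate sample sizes. The function names, the reading of "accuracy loss" as a relative fraction of the accuracy at the largest candidate size, and the bits-per-row size model are all assumptions made for illustration.

```python
from typing import Callable, Sequence


def smallest_adequate_sample(
    sizes: Sequence[int],
    accuracy_at: Callable[[int], float],
    allowed_loss: float,
) -> int:
    """Return the smallest candidate size whose accuracy stays within
    `allowed_loss` (a relative fraction, e.g. 0.05) of the reference.

    Assumption: the accuracy at the largest candidate size stands in
    for the "ideal generalization" mentioned in the abstract.
    """
    ordered = sorted(sizes)
    reference = accuracy_at(ordered[-1])
    threshold = reference * (1.0 - allowed_loss)
    for n in ordered:
        if accuracy_at(n) >= threshold:
            return n
    return ordered[-1]


def compression_rate(
    n_rows: int, n_sampled: int, bits_per_row: int, reduced_bits_per_row: int
) -> float:
    """Ratio of original size to reduced size, in bits.

    Assumption: size is rows times bits per row, so the rate is the
    product of the sampling gain and the per-attribute coding gain.
    """
    return (n_rows * bits_per_row) / (n_sampled * reduced_bits_per_row)


if __name__ == "__main__":
    # Synthetic learning curve, not data from the paper.
    def curve(n: int) -> float:
        return 0.9 - 9.0 / n

    candidates = [100, 1_000, 10_000, 100_000]
    n = smallest_adequate_sample(candidates, curve, allowed_loss=0.05)
    print("sample size:", n)
    print("compression rate:", compression_rate(1_000_000, n, 320, 40))
```

In this sketch a looser loss margin selects a smaller sample, and any reduction in bits per attribute multiplies the gain from sampling, which is the general reason row sampling and attribute compression can be combined.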
Pages: 29-39
Page count: 11