Performance controlled data reduction for knowledge discovery in distributed databases

Authors
Vucetic, S [1]
Obradovic, Z [1]
Affiliations
[1] Washington State Univ, Sch Elect Engn & Comp Sci, Pullman, WA 99164 USA
Keywords
data reduction; data compression; sensitivity analysis; distributed databases; neural networks; learning curve
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The objective of data reduction is to obtain a compact representation of a large data set, both to facilitate repeated use of non-redundant information with complex and slow learning algorithms and to allow efficient data transfer and storage. For a user-controllable allowed accuracy loss, we propose an effective data reduction procedure based on guided sampling for identifying a minimal-size representative subset, followed by a model-sensitivity analysis for determining an appropriate compression level for each attribute. Experiments were performed on three large data sets and, depending on an allowed accuracy loss margin ranging from 1% to 5% of the ideal generalization, the achieved compression rates ranged between 95 and 12,500 times. These results indicate that transferring reduced data sets from multiple locations to a centralized site for efficient and accurate knowledge discovery might often be possible in practice.
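The idea of sampling under an allowed accuracy loss can be illustrated with a minimal Python sketch. This is not the authors' procedure: the abstract does not specify the sampling rule or the model, so the example assumes a plain geometric growth schedule, a one-dimensional least-squares line in place of a neural network, and a relative tolerance on holdout mean squared error. All function names, parameters, and the synthetic data are hypothetical, and the per-attribute compression step is not covered.

```python
import random


def fit_line(points):
    """Ordinary least-squares fit y = a*x + b on a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx if sxx else 0.0
    return a, mean_y - a * mean_x


def mse(model, points):
    """Mean squared error of a fitted line (a, b) on (x, y) pairs."""
    a, b = model
    return sum((y - (a * x + b)) ** 2 for x, y in points) / len(points)


def smallest_adequate_sample(train, holdout, allowed_loss,
                             start=8, growth=2, seed=0):
    """Assumed stopping rule: grow a random sample geometrically until its
    holdout error is within (1 + allowed_loss) of the error obtained by
    training on all of `train`. Falls back to the full set otherwise."""
    rng = random.Random(seed)
    shuffled = train[:]
    rng.shuffle(shuffled)
    reference = mse(fit_line(train), holdout)
    size = start
    while size < len(shuffled):
        sample = shuffled[:size]
        if mse(fit_line(sample), holdout) <= (1 + allowed_loss) * reference:
            return sample
        size *= growth
    return shuffled


# Synthetic data (an assumption for illustration only): a noisy line.
rng = random.Random(1)
data = []
for _ in range(5000):
    x = rng.uniform(-1.0, 1.0)
    data.append((x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.5)))
train, holdout = data[:4000], data[4000:]

allowed_loss = 0.05  # 5% relative loss, the upper end of the range in the abstract
sample = smallest_adequate_sample(train, holdout, allowed_loss)
print("sample size:", len(sample))
print("reduction factor:", len(train) / len(sample))
```

A smaller `allowed_loss` generally forces a larger sample, which mirrors the trade-off between accuracy margin and compression rate reported in the abstract.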
Pages: 29-39
Number of pages: 11