Distributed generative data mining

Citations: 0
Authors
Ramos, Ruy [1]
Camacho, Rui [2]
Affiliations
[1] LIACC, Rua Ceuta 118-6, P-4050190 Oporto, Portugal
[2] EEUP, P-4200465 Oporto, Portugal
Keywords
data mining; parallel and distributed computing; inductive logic programming
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A Knowledge Discovery in Databases (KDD) process involving large amounts of data requires considerable computational power. The process may be run on dedicated and expensive machinery or, for some tasks, distributed computing techniques can be used on a network of affordable machines. In either approach, the user usually specifies the workflow of the sub-tasks composing the whole KDD process before execution starts. In this paper we propose a technique that we call Distributed Generative Data Mining. The technique is generative in that it can create new sub-tasks of the Data Mining (DM) analysis process at execution time. The workflow of DM sub-tasks is, therefore, dynamic. To deploy the proposed technique we extended the Distributed Data Mining system HARVARD and adapted an Inductive Logic Programming system (IndLog) used in a Relational Data Mining task. As a proof of concept, the extended system was used to analyse an artificial dataset of a credit scoring problem with eighty million records.
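The idea of a workflow whose sub-tasks are created at execution time can be illustrated with a minimal sketch. It is not the HARVARD or IndLog implementation, whose interfaces the abstract does not describe. The names `Task`, `analyse`, `run_dynamic_workflow`, and `max_chunk` are hypothetical, and the splitting rule (halve any record range larger than `max_chunk`) is an assumption chosen only to show a task queue that grows while it is being processed.

```python
from collections import deque
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    """Hypothetical unit of work over a half-open record range [lo, hi)."""
    name: str
    lo: int
    hi: int


def analyse(task: Task, max_chunk: int) -> List[Task]:
    """Assumed worker logic: return new sub-tasks if the range is too
    large to analyse in one go, or an empty list once it is small enough."""
    size = task.hi - task.lo
    if size > max_chunk:
        mid = task.lo + size // 2
        return [
            Task(task.name + ".0", task.lo, mid),
            Task(task.name + ".1", mid, task.hi),
        ]
    return []


def run_dynamic_workflow(root: Task, max_chunk: int) -> List[str]:
    """Process tasks from a queue; the workflow is not fixed up front
    because each task may enqueue further sub-tasks while running."""
    queue = deque([root])
    completed: List[str] = []
    while queue:
        task = queue.popleft()
        new_tasks = analyse(task, max_chunk)
        if new_tasks:
            queue.extend(new_tasks)  # sub-tasks generated at execution time
        else:
            completed.append(task.name)
    return completed


if __name__ == "__main__":
    print(run_dynamic_workflow(Task("t", 0, 8), max_chunk=2))
```

In a distributed setting the queue would be served by a coordinator to worker machines rather than drained by a single loop; the sequential version here only keeps the sketch deterministic.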
Pages: 307 / +
Page count: 3