High-Performance Commercial Data Mining: A Multistrategy Machine Learning Application

Authors
William H. Hsu
Michael Welge
Tom Redman
David Clutter
Affiliations
[1] Kansas State University, Department of Computing and Information Sciences
[2] National Center for Supercomputing Applications (NCSA), Automated Learning Group
Keywords
constructive induction; scalable high-performance computing; real-world decision support applications; relevance determination; genetic algorithms; software development environments for knowledge discovery in databases (KDD)
Abstract
We present an application of inductive concept learning and interactive visualization techniques to a large-scale commercial data mining project. This paper focuses on design and configuration of high-level optimization systems (wrappers) for relevance determination and constructive induction, and on integrating these wrappers with elicited knowledge on attribute relevance and synthesis. In particular, we discuss decision support issues for the application (cost prediction for automobile insurance markets in several states) and report experiments using D2K, a Java-based visual programming system for data mining and information visualization, and several commercial and research tools. We describe exploratory clustering, descriptive statistics, and supervised decision tree learning in this application, focusing on a parallel genetic algorithm (GA) system, Jenesis, which is used to implement relevance determination (attribute subset selection). Deployed on several high-performance network-of-workstation systems (Beowulf clusters), Jenesis achieves a linear speedup, due to a high degree of task parallelism. Its test set accuracy is significantly higher than that of decision tree inducers alone and is comparable to that of the best extant search-space based wrappers.
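The wrapper idea named in the abstract (a genetic algorithm searching over attribute subsets, scored by the accuracy of an inducer trained on each subset) can be illustrated with a minimal sketch. This is not Jenesis or D2K: the synthetic data, the nearest-centroid classifier standing in for a decision tree inducer, and all function names and parameter values below are assumptions made for illustration only.

```python
import random

def make_data(n, d, rng):
    """Synthetic data (assumption): the label depends only on attributes 0 and 1."""
    X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    y = [1 if row[0] + row[1] > 0 else 0 for row in X]
    return X, y

def centroid_accuracy(mask, train, valid):
    """Wrapper fitness: validation accuracy of a nearest-centroid classifier
    (a stand-in for a decision tree inducer) using only the attributes in mask."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty attribute subset cannot classify anything
    (Xt, yt), (Xv, yv) = train, valid
    centroids = {}
    for label in set(yt):
        rows = [x for x, t in zip(Xt, yt) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]

    def predict(x):
        return min(centroids, key=lambda c: sum(
            (x[j] - m) ** 2 for j, m in zip(cols, centroids[c])))

    return sum(predict(x) == t for x, t in zip(Xv, yv)) / len(yv)

def ga_select(train, valid, d, rng, pop_size=20, generations=15, p_mut=0.05):
    """Search bit masks over d attributes with a simple generational GA."""
    def fitness(mask):
        return centroid_accuracy(mask, train, valid)

    # Seed the population with the all-attributes mask plus random masks.
    pop = [[1] * d] + [[rng.randint(0, 1) for _ in range(d)]
                       for _ in range(pop_size - 1)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Each evaluation is independent of the others, which is the kind of
        # task parallelism that a cluster could exploit.
        scored = [(fitness(m), m) for m in pop]

        def tournament():
            a, b = rng.sample(scored, 2)
            return (a if a[0] >= b[0] else b)[1]

        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, d)            # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best, fitness(best)

rng = random.Random(0)
X, y = make_data(300, 8, rng)
train, valid = (X[:200], y[:200]), (X[200:], y[200:])
mask, acc = ga_select(train, valid, 8, rng)
baseline = centroid_accuracy([1] * 8, train, valid)
print("selected mask:", mask, "accuracy:", acc, "all-attribute baseline:", baseline)
```

Because the all-attributes mask is in the initial population and the best mask is carried forward each generation, the selected subset never scores below the all-attribute baseline on the validation split. A real evaluation would still need a separate test set, since the GA optimizes directly against the validation data.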
Pages: 361-391
Page count: 30
Source: Data Mining and Knowledge Discovery, 2002, 6 (4)