High-Performance Commercial Data Mining: A Multistrategy Machine Learning Application

Authors
William H. Hsu
Michael Welge
Tom Redman
David Clutter
Affiliations
[1] Kansas State University, Department of Computing and Information Sciences
[2] National Center for Supercomputing Applications (NCSA), Automated Learning Group
Source
Keywords
constructive induction; scalable high-performance computing; real-world decision support applications; relevance determination; genetic algorithms; software development environments for knowledge discovery in databases (KDD)
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
We present an application of inductive concept learning and interactive visualization techniques to a large-scale commercial data mining project. This paper focuses on design and configuration of high-level optimization systems (wrappers) for relevance determination and constructive induction, and on integrating these wrappers with elicited knowledge on attribute relevance and synthesis. In particular, we discuss decision support issues for the application (cost prediction for automobile insurance markets in several states) and report experiments using D2K, a Java-based visual programming system for data mining and information visualization, and several commercial and research tools. We describe exploratory clustering, descriptive statistics, and supervised decision tree learning in this application, focusing on a parallel genetic algorithm (GA) system, Jenesis, which is used to implement relevance determination (attribute subset selection). Deployed on several high-performance network-of-workstation systems (Beowulf clusters), Jenesis achieves a linear speedup, due to a high degree of task parallelism. Its test set accuracy is significantly higher than that of decision tree inducers alone and is comparable to that of the best extant search-space based wrappers.
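The wrapper idea in the abstract (a genetic algorithm searching over attribute subsets, with each candidate scored by the accuracy of a learner trained on it) can be illustrated with a minimal sketch. This is not Jenesis or D2K: the synthetic data, the 1-nearest-neighbour learner standing in for a decision tree inducer, the size penalty, and all names (`make_data`, `fitness`, `ga_select`) are assumptions made for illustration only.

```python
import random

N_RELEVANT, N_NOISE = 3, 7  # hypothetical synthetic setup, not the paper's data


def make_data(n, rng):
    """Binary attributes; the label depends only on the first N_RELEVANT."""
    rows = []
    for _ in range(n):
        x = tuple(rng.randint(0, 1) for _ in range(N_RELEVANT + N_NOISE))
        y = int(sum(x[:N_RELEVANT]) >= 2)  # majority vote of relevant attributes
        rows.append((x, y))
    return rows


def fitness(mask, train, test, penalty=0.01):
    """Wrapper fitness: holdout accuracy of 1-NN (Hamming distance) on the
    selected attributes, minus a small assumed penalty per attribute."""
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0
    correct = 0
    for x, y in test:
        nearest = min(train, key=lambda r: sum(r[0][i] != x[i] for i in idx))
        correct += nearest[1] == y
    return correct / len(test) - penalty * len(idx)


def ga_select(train, test, pop_size=16, generations=10, p_mut=0.1, seed=0):
    """Simple generational GA over bit masks with elitism."""
    rng = random.Random(seed)
    n_attrs = len(train[0][0])
    cache = {}

    def score(mask):
        # Memoize: each mask is evaluated at most once.
        if mask not in cache:
            cache[mask] = fitness(mask, train, test)
        return cache[mask]

    pop = [tuple(rng.randint(0, 1) for _ in range(n_attrs))
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        elite = max(pop, key=score)
        history.append(score(elite))
        children = [elite]  # elitism: carry the best mask over unchanged
        while len(children) < pop_size:
            # Binary tournament selection for each parent.
            a = max(rng.sample(pop, 2), key=score)
            b = max(rng.sample(pop, 2), key=score)
            # Uniform crossover followed by bit-flip mutation.
            child = tuple(
                (g if rng.random() < 0.5 else h) ^ int(rng.random() < p_mut)
                for g, h in zip(a, b)
            )
            children.append(child)
        pop = children
    best = max(pop, key=score)
    history.append(score(best))
    return best, score(best), history


rng = random.Random(42)
data = make_data(120, rng)
train, test = data[:80], data[80:]
best_mask, best_fit, history = ga_select(train, test)
print(best_mask, round(best_fit, 3))
```

Each call to `fitness` depends only on its own mask and the fixed data split, so the evaluations within a generation are independent tasks. That independence is the kind of task parallelism the abstract credits for the near-linear speedup on clusters, although this sketch evaluates them sequentially.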
Pages: 361-391
Page count: 30