Warmr: a data mining tool for chemical data

Authors
Ross D. King
Ashwin Srinivasan
Luc Dehaspe
Affiliations
[1] University of Wales, Department of Computer Science
[2] University of Oxford, Computing Laboratory
[3] PharmaDM
Keywords
carcinogenesis; chemical structure; inductive logic programming; machine learning; predictive toxicology
DOI
Not available
Abstract
Data mining techniques are becoming increasingly important in chemistry as databases become too large to examine manually. Data mining methods from the field of Inductive Logic Programming (ILP) have potential advantages for structural chemical data. In this paper we present Warmr, the first ILP data mining algorithm to be applied to chemoinformatic data. We illustrate the value of Warmr by applying it to a well-studied database of chemical compounds tested for carcinogenicity in rodents. Data mining was used to find all frequent substructures in the database, and knowledge of these frequent substructures is shown to add value to the database. One use of the frequent substructures was to convert them into probabilistic prediction rules relating compound description to carcinogenesis. These rules were found to be accurate on test data, and to give some insight into the relationship between structure and activity in carcinogenesis. The substructures were also used to prove that there existed no accurate rule, based purely on atom-bond substructure with fewer than seven conditions, that could predict carcinogenicity. This result puts a lower bound on the complexity of the relationship between chemical structure and carcinogenicity. Only by using a data mining algorithm, and by doing a complete search, is it possible to prove such a result. Finally, the frequent substructures were shown to add value by increasing the accuracy of statistical and machine learning programs that were trained to predict chemical carcinogenicity. We conclude that Warmr, and ILP data mining methods generally, are an important new tool for analysing chemical databases.
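The complete, levelwise search for frequent patterns mentioned in the abstract can be illustrated with a minimal Python sketch. This is not Warmr itself: Warmr searches first-order queries over relational atom-bond data, whereas the sketch below assumes each compound has been flattened into a set of named features, which is a propositional simplification. The function names, feature names, and activity labels are all invented for illustration and do not come from the paper.

```python
from itertools import combinations


def frequent_itemsets(records, min_support, max_size):
    """Levelwise (Apriori-style) search; returns {frozenset: support count}."""
    n_min = min_support * len(records)
    # Level 1 candidates: every single feature seen in the data.
    items = sorted({f for r in records for f in r})
    level = [frozenset([f]) for f in items]
    frequent = {}
    size = 1
    while level and size <= max_size:
        # Count how many records contain each candidate pattern.
        counts = {c: sum(1 for r in records if c <= r) for c in level}
        survivors = {c: n for c, n in counts.items() if n >= n_min}
        frequent.update(survivors)
        # Build next-level candidates, keeping only those whose every
        # sub-pattern of the current size is itself frequent.
        size += 1
        candidates = set()
        for a, b in combinations(survivors, 2):
            u = a | b
            if len(u) == size and all(
                frozenset(s) in survivors for s in combinations(u, size - 1)
            ):
                candidates.add(u)
        level = sorted(candidates, key=sorted)
    return frequent


def rule_confidence(records, labels, pattern):
    """Fraction of records matching `pattern` that carry a positive label."""
    matched = [lab for r, lab in zip(records, labels) if pattern <= r]
    return sum(matched) / len(matched) if matched else None


# Hypothetical toy data: feature names and labels are made up.
compounds = [
    {"aromatic_ring", "nitro_group", "amine"},
    {"aromatic_ring", "nitro_group"},
    {"aromatic_ring", "amine"},
    {"halogen", "amine"},
]
labels = [True, True, False, False]

result = frequent_itemsets(compounds, min_support=0.5, max_size=3)
for pattern, support in result.items():
    print(sorted(pattern), support, rule_confidence(compounds, labels, pattern))
```

Because every candidate up to `max_size` is either counted or pruned only when one of its sub-patterns is already known to be infrequent, the search is exhaustive within that bound. That exhaustiveness is the property the abstract relies on when it argues that no accurate rule below a given size exists.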
Pages: 173-181
Number of pages: 8
Source
King, RD; Srinivasan, A; Dehaspe, L. Warmr: a data mining tool for chemical data. Journal of Computer-Aided Molecular Design, 2001, 15 (02): 173-181