Feasibility of Active Machine Learning for Multiclass Compound Classification

Cited by: 30
Authors
Lang, Tobias [1, 2]
Flachsenberg, Florian [1]
von Luxburg, Ulrike [3]
Rarey, Matthias [1]
Affiliations
[1] Univ Hamburg, Ctr Bioinformat, D-20146 Hamburg, Germany
[2] Univ Hamburg, Dept Comp Sci, Schluterstr 70, D-20146 Hamburg, Germany
[3] Univ Tubingen, Dept Comp Sci, D-72076 Tubingen, Germany
Keywords
DISCOVERY; TOOL;
DOI
10.1021/acs.jcim.5b00332
Chinese Library Classification (CLC) number
R914 [Medicinal Chemistry];
Discipline classification code
100701;
Abstract
A common task in the hit-to-lead process is classifying sets of compounds into multiple, usually structural, classes, which form the groundwork for subsequent SAR studies. Machine learning techniques can automate this process by learning classification models from training compounds of each class. Gathering class information for compounds can be cost-intensive because the required data must be provided by human experts or experiments. This paper studies whether active machine learning can reduce the required number of training compounds. Active learning is a machine learning method that processes class label data iteratively, and it has gained much attention in a broad range of application areas. In this paper, an active learning method for multiclass compound classification is proposed. The method selects informative training compounds so as to optimally support the learning progress. Combined with human feedback, it yields a semiautomated, interactive multiclass classification procedure. The method was investigated empirically on 15 compound classification tasks containing 86-2870 compounds in 3-38 classes. The empirical results show that active learning can solve these classification tasks using 10-80% of the data that would be necessary for standard learning techniques.
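The iterative selection loop described in the abstract can be illustrated with a minimal sketch. The abstract does not state which classifier or query strategy the authors use, so everything below is an assumption made for illustration: a nearest-centroid classifier, margin-based uncertainty sampling as the selection rule, and synthetic 2-D points standing in for compound descriptors. All function names (`make_pool`, `fit_centroids`, `margin`, `active_learn`) are hypothetical, and the `oracle` callback plays the role of the human expert or experiment that supplies a class label on request.

```python
import math
import random


def make_pool(n_per_class=30, seed=0):
    """Synthetic 2-D points in three classes (a stand-in for compound descriptors)."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.5)]
    features, labels = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_class):
            features.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            labels.append(label)
    return features, labels


def fit_centroids(points, classes):
    """Nearest-centroid model: the mean feature vector of each labeled class."""
    sums = {}
    for (x, y), c in zip(points, classes):
        sx, sy, n = sums.get(c, (0.0, 0.0, 0))
        sums[c] = (sx + x, sy + y, n + 1)
    return {c: (sx / n, sy / n) for c, (sx, sy, n) in sums.items()}


def predict(model, point):
    """Assign the class whose centroid is closest to the point."""
    return min(model, key=lambda c: math.dist(model[c], point))


def margin(model, point):
    """Gap between the two smallest centroid distances; a small gap means uncertain."""
    dists = sorted(math.dist(center, point) for center in model.values())
    return dists[1] - dists[0] if len(dists) > 1 else 0.0


def active_learn(features, oracle, budget, n_initial=3, seed=0):
    """Request labels one at a time, always for the most ambiguous unlabeled point."""
    rng = random.Random(seed)
    unlabeled = list(range(len(features)))
    rng.shuffle(unlabeled)
    queried = [unlabeled.pop() for _ in range(n_initial)]  # random starting set
    known = {i: oracle(i) for i in queried}
    while len(queried) < budget and unlabeled:
        model = fit_centroids([features[i] for i in queried],
                              [known[i] for i in queried])
        pick = min(unlabeled, key=lambda i: margin(model, features[i]))
        unlabeled.remove(pick)
        known[pick] = oracle(pick)  # the expert supplies the label
        queried.append(pick)
    model = fit_centroids([features[i] for i in queried],
                          [known[i] for i in queried])
    return model, queried


if __name__ == "__main__":
    features, labels = make_pool()
    model, queried = active_learn(features, lambda i: labels[i], budget=15)
    hits = sum(predict(model, f) == y for f, y in zip(features, labels))
    print(f"labels requested: {len(queried)}, pool accuracy: {hits / len(labels):.2f}")
```

In this sketch the labeling budget is fixed in advance. Comparing the resulting accuracy against a model trained on the same number of randomly chosen labels is one simple way to check whether selective querying helps on a given data set.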
Pages: 12-20
Page count: 9
Related papers
50 in total
  • [1] Active learning with extreme learning machine for online imbalanced multiclass classification
    Qin, Jiongming
    Wang, Cong
    Zou, Qinhong
    Sun, Yubin
    Chen, Bin
    KNOWLEDGE-BASED SYSTEMS, 2021, 231
  • [2] A machine learning software tool for multiclass classification
    Wang, Shangzhou
    Lu, Haohui
    Khan, Arif
    Hajati, Farshid
    Khushi, Matloob
    Uddin, Shahadat
    SOFTWARE IMPACTS, 2022, 13
  • [3] Extreme Learning Machine for Regression and Multiclass Classification
    Huang, Guang-Bin
    Zhou, Hongming
    Ding, Xiaojian
    Zhang, Rui
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2012, 42 (02) : 513 - 529
  • [4] Active learning Rotation Forest for multiclass classification
    Kazllarof, Vangjel
    Karlos, Stamatis
    Kotsiantis, Sotiris
    COMPUTATIONAL INTELLIGENCE, 2019, 35 (04) : 891 - 918
  • [5] Efficient Multiclass Boosting Classification with Active Learning
    Huang, Jian
    Ertekin, Seyda
    Song, Yang
    Zha, Hongyuan
    Giles, C. Lee
    PROCEEDINGS OF THE SEVENTH SIAM INTERNATIONAL CONFERENCE ON DATA MINING, 2007 : 297 - 308
  • [6] Scalable Active Learning for Multiclass Image Classification
    Joshi, Ajay J.
    Porikli, Fatih
    Papanikolopoulos, Nikolaos P.
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2012, 34 (11) : 2259 - 2273
  • [7] Active Learning with Spatial Distribution based Semi-Supervised Extreme Learning Machine for Multiclass Classification
    Xu, Yuefan
    Ma, Li
    Xiao, Wendong
    2019 28TH WIRELESS AND OPTICAL COMMUNICATIONS CONFERENCE (WOCC), 2019 : 43 - 47
  • [8] Multiclass Classification of Brain Cancer with Machine Learning Algorithms
    Erkal, Begum
    Basak, Selen
    Ciloglu, Alper
    Sener, Duygu Dede
    2020 MEDICAL TECHNOLOGIES CONGRESS (TIPTEKNO), 2020
  • [9] Multiclass Classification Machine Learning Identification of Common Poisonings
    Nogee, Daniel
    Haimovich, Adrian
    Hart, Katherine
    Tomassoni, Anthony
    CLINICAL TOXICOLOGY, 2020, 58 (11) : 1083 - 1084
  • [10] Application of Machine Learning on Brain Cancer Multiclass Classification
    Panca, V.
    Rustam, Z.
    INTERNATIONAL SYMPOSIUM ON CURRENT PROGRESS IN MATHEMATICS AND SCIENCES 2016 (ISCPMS 2016), 2017, 1862