Fast-RCM: Fast Tree-Based Unsupervised Rare-Class Mining

Authors
Weng, Haiqin [1]
Ji, Shouling [1, 2, 3]
Liu, Changchang [4]
Wang, Ting [5]
He, Qinming [1]
Chen, Jianhai [1]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Peoples R China
[2] Zhejiang Univ, Inst Cyberspace Res, Hangzhou 310027, Peoples R China
[3] Zhejiang Univ, Alibaba Zhejiang Univ Joint Inst Frontier Technol, Hangzhou 310027, Peoples R China
[4] IBM Thomas J Watson Res Ctr, Dept Distributed AI, Yorktown Hts, NY 10598 USA
[5] Lehigh Univ, Dept Comp Sci, Bethlehem, PA 18015 USA
Keywords
Anomaly detection; Diseases; Vegetation; Approximation algorithms; Time complexity; Computer science; Clustering methods; Data mining; Tree data structures; Category detection
DOI
10.1109/TCYB.2019.2924804
CLC number (Chinese Library Classification)
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Rare classes are usually hidden in imbalanced datasets in which most data examples belong to major classes. Rare-class mining (RCM) aims to extract all the data examples that belong to rare classes. Most existing RCM approaches require a certain amount of labeled data as input, which makes them impractical, because obtaining label information from domain experts is time-consuming and labor-intensive. We therefore investigate the unsupervised RCM problem; to the best of our knowledge, this is the first such attempt. To this end, we propose Fast-RCM, an efficient algorithm for unsupervised RCM whose time complexity is approximately linear in both data size and data dimensionality. Given an unlabeled dataset, Fast-RCM first builds a rare tree for the input data and then extracts the data examples of the rare classes based on this tree. Whereas existing approaches have quadratic or even cubic time complexity, Fast-RCM is much faster and scales to large datasets. Experimental evaluations on both synthetic and real-world datasets demonstrate that our algorithm can effectively and efficiently extract rare classes from unlabeled data in the unsupervised setting, and that it is approximately five times faster than state-of-the-art methods.
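The abstract describes Fast-RCM only at a high level (build a rare tree, then extract rare-class examples from it); the paper's actual tree construction and extraction rules are not given in this record. As a purely illustrative stand-in, the Python sketch below shows the same two-phase shape with a generic divisive tree, under the assumption that rare classes form small, compact groups. It splits each node on its widest dimension at the largest gap and reports compact leaves that hold at least `min_count` points but no more than `max_fraction` of the data. `Node`, `build_tree`, `mine_rare`, `eps`, `min_count`, and `max_fraction` are hypothetical names and parameters, not the authors' algorithm.

```python
# Illustrative stand-in only: NOT the Fast-RCM rare tree from the paper.
# Assumption: rare classes appear as small, compact groups of points.
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    """One node of the hypothetical divisive tree."""
    indices: List[int]                                    # row indices held by this node
    children: List["Node"] = field(default_factory=list)  # empty for leaves


def build_tree(points: Sequence[Point], idx: List[int], eps: float) -> Node:
    """Split on the widest dimension at its largest gap until every leaf is compact.

    A node becomes a leaf once its widest dimension spans less than `eps`
    (an assumed compactness threshold; must be > 0).
    """
    node = Node(indices=idx)
    dims = len(points[idx[0]])
    spreads = [
        max(points[i][d] for i in idx) - min(points[i][d] for i in idx)
        for d in range(dims)
    ]
    d = max(range(dims), key=lambda k: spreads[k])  # widest dimension (first one on ties)
    if spreads[d] < eps:
        return node  # compact leaf: candidate rare-class group
    order = sorted(idx, key=lambda i: points[i][d])
    gaps = [points[order[j + 1]][d] - points[order[j]][d] for j in range(len(order) - 1)]
    cut = max(range(len(gaps)), key=lambda j: gaps[j])  # largest gap (first one on ties)
    node.children = [
        build_tree(points, order[: cut + 1], eps),
        build_tree(points, order[cut + 1 :], eps),
    ]
    return node


def mine_rare(
    points: Sequence[Point],
    eps: float = 0.2,
    min_count: int = 3,
    max_fraction: float = 0.2,
) -> List[int]:
    """Return indices of points in compact leaves that are non-trivial but small."""
    if not points:
        return []
    root = build_tree(points, list(range(len(points))), eps)
    rare: List[int] = []
    stack = [root]
    while stack:  # iterative traversal collecting qualifying leaves
        node = stack.pop()
        if node.children:
            stack.extend(node.children)
        elif min_count <= len(node.indices) <= max_fraction * len(points):
            rare.extend(node.indices)
    return sorted(rare)


if __name__ == "__main__":
    # 100 background points on a unit grid plus 10 tightly packed points near (2.5, 2.5)
    background = [(float(x), float(y)) for x in range(10) for y in range(10)]
    cluster = [(2.5 + 0.01 * k, 2.5 + 0.005 * k) for k in range(10)]
    print(mine_rare(background + cluster))  # indices 100..109 (the packed group)
```

Because this naive sketch re-sorts the points at every node, it can be far slower than linear when splits peel off one point at a time, so it does not reproduce the approximately linear complexity reported for Fast-RCM; it only makes the "build a tree, then read rare groups off its leaves" structure concrete.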
Pages: 5198-5211
Page count: 14
Related papers (50 in total)
  • [41] Differentially private tree-based redescription mining
    Mihelcic, Matej
    Miettinen, Pauli
    DATA MINING AND KNOWLEDGE DISCOVERY, 2023, 37 (04) : 1548 - 1590
  • [42] Study on A Fast Algorithm for Mining Disorder Tree
    Guo, Xin
    PROCEEDINGS OF THE 2015 4TH NATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS AND COMPUTER ENGINEERING (NCEECE 2015), 2016, 47 : 194 - 197
  • [43] Fast mining maximal frequent itemsets based on FP-tree
    Yan, YJ
    Li, ZJ
    Chen, HW
    CONCEPTUAL MODELING - ER 2004, PROCEEDINGS, 2004, 3288 : 348 - 361
  • [44] An optimal pruned traversal tree-based fast minimum cut solver in dense graph
    Wei, Wei
    Liu, Yuting
    Zhang, Qinghui
    INFORMATION SCIENCES, 2024, 652
  • [45] TRQED: Secure and Fast Tree-Based Private Range Queries over Encrypted Cloud
    Yang, Wei
    Xu, Yang
    Nie, Yiwen
    Shen, Yao
    Huang, Liusheng
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2018), PT II, 2018, 10828 : 130 - 146
  • [46] Designing Distributed Tree-based Index Structures for Fast RDMA-capable Networks
    Ziegler, Tobias
    Vani, Sumukha Tumkur
    Binnig, Carsten
    Fonseca, Rodrigo
    Kraska, Tim
    SIGMOD '19: PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2019, : 741 - 758
  • [47] STREAMRHF: Tree-Based Unsupervised Anomaly Detection for Data Streams
    Nesic, Stefan
    Putina, Andrian
    Bahri, Maroua
    Huet, Alexis
    Navarro, Jose Manuel
    Rossi, Dario
    Sozio, Mauro
    2022 IEEE/ACS 19TH INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS (AICCSA), 2022,
  • [48] A fast algorithm to identify coevolutionary patterns from protein sequences based on tree-based data structure
    Hu, Lun
    Yang, Shicheng
    2019 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC), 2019, : 2273 - 2278
  • [49] An Efficient Tree-based Fuzzy Data Mining Approach
    Lin, Chun-Wei
    Hong, Tzung-Pei
    Lu, Wen-Hsiang
    INTERNATIONAL JOURNAL OF FUZZY SYSTEMS, 2010, 12 (02) : 150 - 157
  • [50] Performance Analysis of Tree-Based Approaches for Pattern Mining
    Borah, Anindita
    Nath, Bhabesh
    COMPUTATIONAL INTELLIGENCE IN DATA MINING, 2019, 711 : 435 - 448