Fast-RCM: Fast Tree-Based Unsupervised Rare-Class Mining

Cited by: 0
Authors
Weng, Haiqin [1]
Ji, Shouling [1, 2, 3]
Liu, Changchang [4]
Wang, Ting [5]
He, Qinming [1]
Chen, Jianhai [1]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Peoples R China
[2] Zhejiang Univ, Inst Cyberspace Res, Hangzhou 310027, Peoples R China
[3] Zhejiang Univ, Alibaba Zhejiang Univ Joint Inst Frontier Technol, Hangzhou 310027, Peoples R China
[4] IBM Thomas J Watson Res Ctr, Dept Distributed AI, Yorktown Hts, NY 10598 USA
[5] Lehigh Univ, Dept Comp Sci, Bethlehem, PA 18015 USA
Keywords
Anomaly detection; Diseases; Vegetation; Approximation algorithms; Time complexity; Computer science; Clustering methods; data mining; tree data structures; CATEGORY DETECTION
DOI
10.1109/TCYB.2019.2924804
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Rare classes are usually hidden in an imbalanced dataset in which the majority of the data examples come from major classes. Rare-class mining (RCM) aims at extracting all the data examples belonging to rare classes. Most of the existing approaches for RCM require a certain amount of labeled data examples as input. However, they are ineffective in practice since requesting label information from domain experts is time-consuming and labor-intensive. Thus, we investigate the unsupervised RCM problem, which, to the best of our knowledge, is the first such attempt. To this end, we propose an efficient algorithm called Fast-RCM for unsupervised RCM, which has an approximately linear time complexity with respect to data size and data dimensionality. Given an unlabeled dataset, Fast-RCM mines the rare classes by first building a rare tree for the input dataset and then extracting data examples of the rare classes based on this rare tree. Compared with the existing approaches, which have quadratic or even cubic time complexity, Fast-RCM is much faster and can be extended to large-scale datasets. The experimental evaluation on both synthetic and real-world datasets demonstrates that our algorithm can effectively and efficiently extract the rare classes from an unlabeled dataset under the unsupervised setting, and is approximately five times faster than the state-of-the-art methods.
Pages: 5198-5211
Number of pages: 14
Related papers
50 items in total
  • [21] Improving Rare-Class Recognition of Marine Plankton with Hard Negative Mining
    Walker, Joseph L.
    Orenstein, Eric C.
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 3665 - 3675
  • [22] Fast tree-based wavelet image coding with efficient use of memory
    Oliver, J
    Malumbres, MP
    VISUAL COMMUNICATIONS AND IMAGE PROCESSING 2005, PTS 1-4, 2005, 5960 : 1774 - 1783
  • [23] Spanning tree-based fast community detection methods in social networks
    Basuchowdhuri P.
    Roy R.
    Anand S.
    Srivastava D.R.
    Majumder S.
    Saha S.K.
    Innovations in Systems and Software Engineering, 2015, 11 (03) : 177 - 186
  • [24] Fast and Accurate Tree-Based Clustering for Japanese/Chinese Character Recognition
    Abe, Yuichi
    Sasaki, Takahiro
    Goto, Hideaki
    IMAGE ANALYSIS AND PROCESSING (ICIAP 2013), PT II, 2013, 8157 : 459 - 468
  • [25] A Tree-Based Method for Fast Repeated Sampling of Determinantal Point Processes
    Gillenwater, Jennifer
    Kulesza, Alex
    Mariet, Zelda
    Vassilvitskii, Sergei
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [26] A tree-based inverted file for fast ranked-document retrieval
    Shieh, WY
    Chen, TF
    Chung, CP
    IKE'03: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE ENGINEERING, VOLS 1 AND 2, 2003, : 64 - 69
  • [27] A fast tree-based barrier synchronization on switch-based irregular networks
    Moh, S
    Yu, C
    Youn, HY
    Han, D
    Lee, B
    Lee, D
    HIGH PERFORMANCE COMPUTING - HIPC 2000, PROCEEDINGS, 2001, 1970 : 273 - 282
  • [28] Ethernet Ultra Fast Switching: A Tree-based Local Recovery Scheme
    Su, Li
    Chen, Wentao
    Su, Haibo
    Xiao, Zhenyu
    Jin, Depeng
    Zeng, Lieguang
    2008 11TH IEEE SINGAPORE INTERNATIONAL CONFERENCE ON COMMUNICATION SYSTEMS (ICCS), VOLS 1-3, 2008, : 314 - 318
  • [29] Single Link Switching Mechanism for Fast Recovery in Tree-based Recovery Schemes
    Jin, Depeng
    Chen, Wentao
    Xiao, Zhenyu
    Zeng, Lieguang
    2008 INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS, VOLS 1 AND 2, 2008, : 196 - 200
  • [30] Expression and lighting invariant face recognition using fast tree-based matching
    Liu, Rui
    Feng, WeiGuo
    Zhu, Ming
    ELECTRONICS LETTERS, 2013, 49 (22) : 1379 - 1381