Fast-RCM: Fast Tree-Based Unsupervised Rare-Class Mining

Authors
Weng, Haiqin [1]
Ji, Shouling [1, 2, 3]
Liu, Changchang [4]
Wang, Ting [5]
He, Qinming [1]
Chen, Jianhai [1]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Peoples R China
[2] Zhejiang Univ, Inst Cyberspace Res, Hangzhou 310027, Peoples R China
[3] Zhejiang Univ, Alibaba Zhejiang Univ Joint Inst Frontier Technol, Hangzhou 310027, Peoples R China
[4] IBM Thomas J Watson Res Ctr, Dept Distributed AI, Yorktown Hts, NY 10598 USA
[5] Lehigh Univ, Dept Comp Sci, Bethlehem, PA 18015 USA
Keywords
Anomaly detection; Diseases; Vegetation; Approximation algorithms; Time complexity; Computer science; Clustering methods; Data mining; Tree data structures; Category detection
DOI
10.1109/TCYB.2019.2924804
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Rare classes are usually hidden in an imbalanced dataset in which most data examples come from the major classes. Rare-class mining (RCM) aims at extracting all the data examples that belong to rare classes. Most existing approaches to RCM require a certain amount of labeled data examples as input. However, they are impractical, since requesting label information from domain experts is time-consuming and labor-intensive. Thus, we investigate the unsupervised RCM problem; to the best of our knowledge, this is the first such attempt. To this end, we propose an efficient algorithm for unsupervised RCM called Fast-RCM, which has approximately linear time complexity with respect to data size and data dimensionality. Given an unlabeled dataset, Fast-RCM mines the rare classes by first building a rare tree for the input dataset and then extracting the data examples of the rare classes based on this rare tree. Compared with existing approaches, which have quadratic or even cubic time complexity, Fast-RCM is much faster and can scale to large datasets. The experimental evaluation on both synthetic and real-world datasets demonstrates that our algorithm can effectively and efficiently extract the rare classes from an unlabeled dataset in the unsupervised setting, and that it is approximately five times faster than the state-of-the-art methods.
Pages: 5198-5211
Page count: 14