Improving large-scale hierarchical classification by rewiring: a data-driven filter based approach

Cited by: 0
Authors
Azad Naik
Huzefa Rangwala
Affiliations
[1] Microsoft Corporation
[2] George Mason University
Keywords
Top-down hierarchical classification; Inconsistency; Error propagation; Flattening; Clustering; Rewiring
DOI
Not available
Abstract
Hierarchical Classification (HC) is a supervised learning problem in which unlabeled instances are classified into a taxonomy of classes. Several methods that exploit the hierarchical structure have been developed to improve HC performance. However, in most cases the hierarchical structure defined a priori by domain experts is inconsistent; as a consequence, the performance improvement over flat classification methods is not noticeable. We propose a scalable, data-driven, filter-based rewiring approach to modify an expert-defined hierarchy. Experimental comparisons of top-down hierarchical classification with our modified hierarchy, on a wide range of datasets, show improved classification performance over the baseline hierarchy (i.e., the one defined by experts), the clustered hierarchy, and flattening-based hierarchy modification approaches. In comparison to existing rewiring approaches, our method (rewHier) is computationally efficient, enabling it to scale to datasets with large numbers of classes, instances, and features. We also show that our modified hierarchy leads to improved classification performance for classes with few training samples in comparison to flat and state-of-the-art hierarchical classification approaches. Source Code: https://cs.gmu.edu/~mlbio/TaxMod/
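The error propagation and rewiring ideas named in the abstract can be illustrated with a small sketch. This is not the authors' rewHier method: it assumes a toy taxonomy, hypothetical two-dimensional class centroids, and a nearest-centroid decision (by cosine similarity) standing in for the per-node classifiers. The names `hierarchy`, `leaf_centroid`, `predict_top_down`, and `rewire` are all invented for illustration.

```python
import math

# Hypothetical expert-defined taxonomy: "dolphin" is filed under "fish".
hierarchy = {
    "root": ["mammal", "fish"],
    "mammal": ["dog", "cat"],
    "fish": ["shark", "dolphin"],
}

# Hypothetical per-class mean feature vectors (assumed, not from the paper).
leaf_centroid = {
    "dog": [1.0, 0.1], "cat": [0.9, 0.2],
    "shark": [0.1, 1.0], "dolphin": [0.8, 0.3],
}

def leaves(h, node):
    """Return all leaf classes below a node."""
    if node not in h:
        return [node]
    return [leaf for child in h[node] for leaf in leaves(h, child)]

def centroid(h, node):
    """Average the centroids of all leaves below a node."""
    vecs = [leaf_centroid[leaf] for leaf in leaves(h, node)]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def predict_top_down(h, x):
    """Descend from the root, choosing the most similar child at each level.
    A wrong choice at a higher level cannot be corrected further down."""
    node = "root"
    while node in h:
        node = max(h[node], key=lambda child: cosine(x, centroid(h, child)))
    return node

def rewire(h, leaf):
    """Return a copy of h with `leaf` moved under the non-root parent whose
    other leaves are, on average, most similar to it (a filter-style score
    computed from the data alone, without training any classifier)."""
    parents = [p for p in h if p != "root"]
    current = next(p for p in parents if leaf in h[p])

    def score(parent):
        others = [l for l in h[parent] if l != leaf]
        if not others:
            return 0.0
        sims = [cosine(leaf_centroid[leaf], leaf_centroid[l]) for l in others]
        return sum(sims) / len(sims)

    best = max(parents, key=score)
    new_h = {k: list(v) for k, v in h.items()}
    if best != current:
        new_h[current].remove(leaf)
        new_h[best].append(leaf)
    return new_h

x = [0.8, 0.3]                       # a dolphin-like test instance
print(predict_top_down(hierarchy, x))  # cat: misrouted at the root
rewired = rewire(hierarchy, "dolphin")
print(predict_top_down(rewired, x))    # dolphin
```

In the original taxonomy the instance is sent to the "mammal" branch at the root, so the correct leaf under "fish" is unreachable. After the leaf is moved next to the classes it resembles, the same top-down procedure reaches it.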
Pages: 141 - 164
Page count: 23
Related papers
  • [1] Improving large-scale hierarchical classification by rewiring: a data-driven filter based approach
    Naik, Azad
    Rangwala, Huzefa
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2019, 52 (01) : 141 - 164
  • [2] Integrated Framework for Improving Large-scale Hierarchical Classification
    Naik, Azad
    Rangwala, Huzefa
    2017 16TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2017 : 281 - 288
  • [3] A Data-driven Mechanism for Large-scale Data Distribution
    Shi Peichang
    Li Yiying
    Ding Bo
    Jiang Longquan
    Liu Hui
    Zhang Jie
    2016 WORLD AUTOMATION CONGRESS (WAC), 2016
  • [4] Data-driven Authoring of Large-scale Ecosystems
    Kapp, Konrad
    Gain, James
    Guerin, Eric
    Galin, Eric
    Peytavie, Adrien
    ACM TRANSACTIONS ON GRAPHICS, 2020, 39 (06)
  • [5] Large-scale Data-driven Segmentation of Banking Customers
    Hossain, Md Monir
    Sebestyen, Mark
    Mayank, Dhruv
    Ardakanian, Omid
    Khazaei, Hamzeh
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020 : 4392 - 4401
  • [6] Data-driven realistic animation of large-scale forest
    School of Computer Science, Wuhan University, Wuhan 430079, China
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2008, 20 (08) : 1015 - 1022
  • [7] Large-scale mode identification and data-driven sciences
    Mukhopadhyay, Subhadeep
    ELECTRONIC JOURNAL OF STATISTICS, 2017, 11 (01) : 215 - 240
  • [8] Hierarchical power control of a large-scale wind farm by using a data-driven optimization method
    Di, Pengyu
    Xiao, Xiaoqing
    Pan, Feng
    Yang, Yuyao
    Zhang, Xiaoshun
    PLOS ONE, 2023, 18 (09)
  • [9] A data-driven fault detection approach for unknown large-scale systems based on GA-SVM
    Ma, Zhenlei
    Li, Xiaojian
    Sun, Jie
    INFORMATION SCIENCES, 2024, 658
  • [10] Toward Large-Scale Graph-Based Traffic Forecasting: A Data-Driven Network Partitioning Approach
    Zhang, Chenhan
    Zhang, Shuyu
    Zou, Xiexin
    Yu, Shui
    Yu, James J. Q.
    IEEE INTERNET OF THINGS JOURNAL, 2023, 10 (05) : 4506 - 4519