Improving large-scale hierarchical classification by rewiring: a data-driven filter based approach

Cited by: 0
Authors
Azad Naik
Huzefa Rangwala
Affiliations
[1] Microsoft Corporation
[2] George Mason University
Keywords
Top-down hierarchical classification; Inconsistency; Error propagation; Flattening; Clustering; Rewiring
DOI
Not available
Chinese Library Classification number
Subject classification code
Abstract
Hierarchical Classification (HC) is a supervised learning problem where unlabeled instances are classified into a taxonomy of classes. Several methods that utilize the hierarchical structure have been developed to improve HC performance. However, in most cases the hierarchical structure defined a priori by domain experts is inconsistent; as a consequence, the performance improvement over flat classification methods is not noticeable. We propose a scalable, data-driven, filter-based rewiring approach to modify an expert-defined hierarchy. Experimental comparisons of top-down hierarchical classification with our modified hierarchy on a wide range of datasets show improved classification performance over the baseline hierarchy (i.e., the one defined by experts) and over clustered-hierarchy and flattening-based hierarchy modification approaches. In comparison to existing rewiring approaches, our method (rewHier) is computationally efficient, enabling it to scale to datasets with large numbers of classes, instances, and features. We also show that our modified hierarchy leads to improved classification performance for classes with few training samples in comparison to flat and state-of-the-art hierarchical classification approaches. Source code: https://cs.gmu.edu/~mlbio/TaxMod/
Pages: 141 - 164
Page count: 23