Hierarchical classification of data streams: a systematic literature review

Cited by: 9
Authors
Tieppo, Eduardo [1, 2]
dos Santos, Roger Robson [2]
Barddal, Jean Paul [2]
Nievola, Julio Cesar [2]
Affiliations
[1] Inst Fed Parana IFPR, Campus Pinhais, Pinhais, Brazil
[2] Pontificia Univ Catolica Parana PUCPR, Posgrad Informat PPGIa, Curitiba, Parana, Brazil
Keywords
Data stream mining; Hierarchical classification; Systematic literature review; Machine learning; ACTIVITY RECOGNITION; OBJECT RECOGNITION; CLASSIFIERS; MACHINE; REPRESENTATION; PERFORMANCE; ALGORITHM; AGREEMENT; QUALITY; DRIFT;
DOI
10.1007/s10462-021-10087-z
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The classification task usually works with flat and batch learners, assuming problems as stationary and without relations between class labels. Nevertheless, several real-world problems do not assume these premises, i.e., data have labels organized hierarchically and are made available in streaming fashion, meaning that their behavior can drift over time. Existing studies on hierarchical classification do not consider data streams as input of their process, and thus, data is assumed as stationary and handled through batch learners. The same can be said about works on streaming data, as the hierarchical classification is overlooked. Studies concerning each area individually are promising, yet, do not tackle their intersection. This study analyzes the main characteristics of the state-of-the-art works on hierarchical classification for streaming data concerning five aspects: (i) problems tackled, (ii) datasets, (iii) algorithms, (iv) evaluation metrics, and (v) research gaps in the area. We performed a systematic literature review of primary studies and retrieved 3,722 papers, of which 42 were identified as relevant and used to answer the aforementioned research questions. We found that the problems handled by hierarchical classification of data streams include mainly classification of images, human activities, texts, and audio; the datasets are mostly created or synthetic data; the algorithms and evaluation metrics are well-known techniques or based on those; and research gaps are related to dynamic context, data complexity, and computational resources constraints. We also provide implications for future research and experiments to consider common characteristics shared amongst hierarchical classification and data stream classification.
Pages: 3243-3282
Number of pages: 40