Hierarchical classification of data streams: a systematic literature review

Times cited: 9
Authors
Tieppo, Eduardo [1, 2]
dos Santos, Roger Robson [2]
Barddal, Jean Paul [2]
Nievola, Julio Cesar [2]
Affiliations
[1] Inst Fed Parana IFPR, Campus Pinhais, Pinhais, Brazil
[2] Pontificia Univ Catolica Parana PUCPR, Posgrad Informat PPGIa, Curitiba, Parana, Brazil
Keywords
Author keywords: Data stream mining; Hierarchical classification; Systematic literature review; Machine learning
KeyWords Plus: Activity recognition; Object recognition; Classifiers; Machine; Representation; Performance; Algorithm; Agreement; Quality; Drift
DOI
10.1007/s10462-021-10087-z
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Classification is usually performed with flat, batch learners that assume the problem is stationary and that class labels are unrelated. Many real-world problems violate these premises: their labels are organized hierarchically, and their data arrive as streams whose behavior can drift over time. Existing studies on hierarchical classification do not take data streams as input; data are therefore assumed to be stationary and are handled by batch learners. Conversely, works on streaming data overlook hierarchical classification. Studies in each area are promising on their own, yet they do not tackle the intersection of the two. This study analyzes the main characteristics of state-of-the-art work on hierarchical classification of streaming data with respect to five aspects: (i) problems tackled, (ii) datasets, (iii) algorithms, (iv) evaluation metrics, and (v) research gaps in the area. We performed a systematic literature review of primary studies, retrieving 3,722 papers, of which 42 were identified as relevant and used to answer these research questions. We found that the problems handled by hierarchical classification of data streams are mainly the classification of images, human activities, texts, and audio; the datasets are mostly purpose-built or synthetic; the algorithms and evaluation metrics are well-known techniques or are based on them; and the research gaps relate to dynamic contexts, data complexity, and constraints on computational resources. We also provide implications for future research and experiments that account for the characteristics shared by hierarchical classification and data stream classification.
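To make the setting concrete, here is a minimal sketch of a hierarchical classifier that learns from a stream. It is not a method taken from the review or from any surveyed study. It assumes labels are written as slash-separated paths (e.g. `animal/dog`) and uses a top-down local classifier per parent node. The base learner is a simple incremental nearest-centroid model with a fading factor `alpha`, which gradually forgets old concepts under drift. Evaluation is prequential (test-then-train), using set-based hierarchical precision, recall, and F-measure computed over the ancestor sets of the predicted and true labels. All names (`path_of`, `LocalCentroidPerParent`, `prequential`, `alpha`, `toy_stream`) are hypothetical.

```python
from collections import defaultdict
from math import dist  # Euclidean distance (Python 3.8+)

ROOT = ""  # hypothetical identifier for the root of the label hierarchy


def path_of(label):
    """Nodes on the path of a slash-separated label: 'a/b/c' -> ['a', 'a/b', 'a/b/c']."""
    parts = label.split("/")
    return ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]


class LocalCentroidPerParent:
    """Top-down hierarchical classifier with one incremental nearest-centroid
    model per parent node. Centroids are exponentially faded with `alpha`,
    so recent examples weigh more (a simple, assumed response to drift)."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha
        self.centroids = defaultdict(dict)  # parent -> {child: centroid}

    def predict(self, x):
        # Descend from the root, picking the nearest child centroid at each level,
        # until reaching a node that has no known children.
        node = ROOT
        while self.centroids.get(node):
            children = self.centroids[node]
            node = min(children, key=lambda child: dist(x, children[child]))
        return node  # ROOT means "no prediction yet"

    def learn(self, x, label):
        # Update the centroid of every (parent, child) pair along the true path.
        parent = ROOT
        for node in path_of(label):
            children = self.centroids[parent]
            if node not in children:
                children[node] = list(x)  # first example initialises the centroid
            else:
                a = self.alpha
                children[node] = [(1 - a) * c + a * v for c, v in zip(children[node], x)]
            parent = node


def prequential(stream, model):
    """Test-then-train loop returning micro-averaged hierarchical (hP, hR, hF)."""
    overlap = n_pred = n_true = 0
    for x, y in stream:
        y_hat = model.predict(x)  # test first ...
        predicted = set(path_of(y_hat)) if y_hat != ROOT else set()
        actual = set(path_of(y))
        overlap += len(predicted & actual)
        n_pred += len(predicted)
        n_true += len(actual)
        model.learn(x, y)  # ... then train on the same example
    hp = overlap / n_pred if n_pred else 0.0
    hr = overlap / n_true if n_true else 0.0
    hf = 2 * hp * hr / (hp + hr) if hp + hr else 0.0
    return hp, hr, hf


# Hypothetical toy stream of (features, hierarchical label) pairs.
toy_stream = [
    ((0.0, 0.0), "animal/dog"),
    ((0.1, 0.0), "animal/dog"),
    ((0.0, 1.0), "animal/cat"),
    ((5.0, 5.0), "vehicle/car"),
    ((0.05, 0.0), "animal/dog"),
    ((5.1, 5.0), "vehicle/car"),
    ((0.0, 0.9), "animal/cat"),
]

if __name__ == "__main__":
    print(prequential(toy_stream, LocalCentroidPerParent(alpha=0.1)))
```

On this toy stream the sketch yields hP = 0.75, hR = 9/14 (about 0.643), and hF = 9/13 (about 0.692). Predicting `animal/dog` when the truth is `animal/cat` still earns credit for the shared `animal` node. This partial credit is what separates the hierarchical measures from flat accuracy.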
Pages: 3243-3282
Page count: 40
Published in: Artificial Intelligence Review, 2022, vol. 55