A Combined Approach for Multi-Label Text Data Classification

Cited by: 1
Authors
Strimaitis, Rokas [1]
Stefanovic, Pavel [1]
Ramanauskaite, Simona [2]
Slotkiene, Asta [1]
Affiliations
[1] Vilnius Gediminas Tech Univ, Dept Informat Syst, Sauletekio Al 11, LT-10223 Vilnius, Lithuania
[2] Vilnius Gediminas Tech Univ, Dept Informat Technol, Sauletekio Al 11, LT-10223 Vilnius, Lithuania
Keywords
Analysis solution - Automated data analysis - Data classification - Data items - Multi-labels - Multilabel - Multinomial naive Bayes - Similarity measure - Text analysis - Text data
DOI
10.1155/2022/3369703
Chinese Library Classification code
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Automated data analysis solutions depend heavily on the data and its quality. One characteristic that must be taken into account is the possibility of assigning more than one class to the same data item. No existing solutions are dedicated to classifying Lithuanian text data with more than one class per data item. This paper proposes a new combined approach to multilabel text data classification for text analysis. The main aim of the proposed approach is to improve the accuracy of traditional classification algorithms by incorporating results obtained using similarity measures. The experimental investigation was performed on multilabel financial news text data in the Lithuanian language. The data were collected from four public websites and manually classified by experts into ten classes, with each data item assigned no more than two classes. Five commonly used classification algorithms were compared on the dataset: the support vector machine, multinomial naive Bayes, k-nearest neighbours, decision trees, and linear discriminant analysis. In addition, two similarity measures were compared: the cosine similarity and the Dice coefficient. The best results were obtained using the cosine similarity and the multinomial naive Bayes classifier, and the proposed approach combines the results of these two methods. Experiments with different cases of the proposed approach revealed the peculiarities of its application. Overall, the combined approach yielded a statistically significant increase in global accuracy.
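The abstract names the two similarity measures and says their results are combined with a classifier's output, but it does not state the combination rule. The following is a minimal sketch under stated assumptions, not the authors' method: the two measures are computed on bag-of-words term counts, and a hypothetical `combine_scores` helper blends per-class classifier probabilities with per-class similarity scores by a weighted average, keeping at most two labels to mirror the dataset constraint. The `weight` and `threshold` parameters and the class names in the usage note are illustrative only.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    # Empty documents have no direction; treat them as dissimilar.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dice_coefficient(a: Counter, b: Counter) -> float:
    """Dice coefficient computed on the sets of distinct terms."""
    terms_a, terms_b = set(a), set(b)
    total = len(terms_a) + len(terms_b)
    return 2 * len(terms_a & terms_b) / total if total else 0.0


def combine_scores(clf_probs, sim_scores, weight=0.5, max_labels=2, threshold=0.3):
    """Hypothetical blend of classifier and similarity scores per class.

    ASSUMPTION: a weighted average with a cut-off; the paper's actual
    combination rule is not given in the abstract.
    """
    classes = set(clf_probs) | set(sim_scores)
    blended = {
        c: weight * clf_probs.get(c, 0.0) + (1 - weight) * sim_scores.get(c, 0.0)
        for c in classes
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(blended.items(), key=lambda kv: (-kv[1], kv[0]))
    if not ranked:
        return []
    picked = [c for c, score in ranked[:max_labels] if score >= threshold]
    # Always return at least the top-ranked class.
    return picked or [ranked[0][0]]
```

For example, with `doc_a = Counter("bank raises interest rates".split())` and `doc_b = Counter("bank cuts interest rates".split())`, both measures score the pair as highly similar because three of the four terms are shared. Calling `combine_scores({"banking": 0.6, "markets": 0.3, "taxes": 0.1}, {"banking": 0.5, "markets": 0.5, "taxes": 0.0})` keeps the two highest-scoring classes above the threshold.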
Pages: 13
Related papers
50 in total
  • [21] Research on Multi-Classification and Multi-Label in Text Categorization
    Hua, Liu
    2009 INTERNATIONAL CONFERENCE ON INTELLIGENT HUMAN-MACHINE SYSTEMS AND CYBERNETICS, VOL 2, PROCEEDINGS, 2009, : 86 - 89
  • [22] A lightweight filter based feature selection approach for multi-label text classification
    Dhal P.
    Azad C.
    Journal of Ambient Intelligence and Humanized Computing, 2023, 14 (09) : 12345 - 12357
  • [23] Multi-Label Emotion Classification on Code-Mixed Text: Data and Methods
    Ameer, Iqra
    Sidorov, Grigori
    Gomez-Adorno, Helena
    Nawab, Rao Muhammad Adeel
    IEEE ACCESS, 2022, 10 : 8779 - 8789
  • [24] EnvBERT: Multi-label Text Classification for Imbalanced, Noisy Environmental News Data
    Kim, Dohyung
    Koo, Jahwan
    Kim, Ung-Mo
    PROCEEDINGS OF THE 2021 15TH INTERNATIONAL CONFERENCE ON UBIQUITOUS INFORMATION MANAGEMENT AND COMMUNICATION (IMCOM 2021), 2021,
  • [25] A COPRAS-based Approach to Multi-Label Feature Selection for Text Classification
    Mohanrasu, S. S.
    Janani, K.
    Rakkiyappan, R.
    MATHEMATICS AND COMPUTERS IN SIMULATION, 2024, 222 : 3 - 23
  • [26] Multi-label Text Classification Approach for Sentence Level News Emotion Analysis
    Bhowmick, Plaban Kr.
    Basu, Anupam
    Mitra, Pabitra
    Prasad, Abhishek
    PATTERN RECOGNITION AND MACHINE INTELLIGENCE, PROCEEDINGS, 2009, 5909 : 261 - 266
  • [27] ADAM: An Attentional Data Augmentation Method for Extreme Multi-label Text Classification
    Zhang, Jiaxin
    Liu, Jie
    Chen, Shaowei
    Lin, Shaoxin
    Wang, Bingquan
    Wang, Shanpeng
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2022, PT I, 2022, 13280 : 131 - 142
  • [28] Using Reddit Data for Multi-label Text Classification of Twitter Users Interests
    Fiallos, Angel
    Jimenes, Karina
    2019 SIXTH INTERNATIONAL CONFERENCE ON EDEMOCRACY & EGOVERNMENT (ICEDEG), 2019, : 324 - 327
  • [29] Using Correlation Based Subspace Clustering For Multi-label Text Data Classification
    Ahmed, Mohammad Salim
    Khan, Latifur
    Rajeswari, Mandava
    22ND INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2010), PROCEEDINGS, VOL 2, 2010, : 296 - 303
  • [30] Multi-label text classification with an ensemble feature space
    Tandon, Kushagri
    Chatterjee, Niladri
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2022, 42 (05) : 4425 - 4436