A Combined Approach for Multi-Label Text Data Classification

Cited by: 1
Authors
Strimaitis, Rokas [1]
Stefanovic, Pavel [1]
Ramanauskaite, Simona [2]
Slotkiene, Asta [1]
Affiliations
[1] Vilnius Gediminas Tech Univ, Dept Informat Syst, Sauletekio Al 11, LT-10223 Vilnius, Lithuania
[2] Vilnius Gediminas Tech Univ, Dept Informat Technol, Sauletekio Al 11, LT-10223 Vilnius, Lithuania
Keywords
Analysis solution - Automated data analysis - Data classification - Data items - Multi-labels - Multilabel - Multinomial naive bayes - Similarity measure - Text analysis - Text data;
DOI
10.1155/2022/3369703
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Automated data analysis solutions depend heavily on the data and its quality. One characteristic that must be taken into account is the possibility of assigning more than one class to the same data item. No existing solution dedicated to Lithuanian text data classification supports assigning more than one class to a data item. This paper proposes a new combined approach to multilabel text data classification for text analysis. The main aim of the proposed approach is to improve the accuracy of traditional classification algorithms by incorporating the results obtained using similarity measures. The experimental investigation was performed on multilabel financial news text data in the Lithuanian language. The data were collected from four public websites and manually classified by experts into ten classes, with each data item assigned no more than two classes. The results of five commonly used classification algorithms were compared: the support vector machine, multinomial naive Bayes, k-nearest neighbours, decision trees, and linear discriminant analysis. In addition, two similarity measures were compared: the cosine similarity and the Dice coefficient. The best results were obtained using the cosine similarity measure and the multinomial naive Bayes classifier, and the proposed approach combines the results of these two methods. Experiments with different cases of the proposed approach revealed the peculiarities of its application, and the combined approach yielded a statistically significant increase in global accuracy.
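The abstract names two similarity measures and a combination of classifier output with similarity output, but does not state the combination rule. The sketch below is therefore only illustrative. `cosine_similarity` and `dice_coefficient` follow the standard textbook definitions over tokenized text. `combine_scores`, its `weight`, `threshold`, and `max_labels` parameters, and the weighted-average rule are hypothetical and are not the authors' method. The two-label cap mirrors the dataset description.

```python
import math
from collections import Counter


def cosine_similarity(a_tokens, b_tokens):
    """Cosine similarity between the term-frequency vectors of two token lists."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def dice_coefficient(a_tokens, b_tokens):
    """Dice coefficient between the sets of distinct tokens of two documents."""
    a, b = set(a_tokens), set(b_tokens)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def combine_scores(classifier_probs, similarity_scores,
                   weight=0.5, threshold=0.3, max_labels=2):
    """Hypothetical combination rule (an assumption, not from the paper):
    take a weighted average of per-class classifier probabilities and
    per-class similarity scores, then keep at most `max_labels` classes
    whose combined score reaches `threshold`."""
    classes = set(classifier_probs) | set(similarity_scores)
    if not classes:
        return []
    combined = {
        c: weight * classifier_probs.get(c, 0.0)
        + (1 - weight) * similarity_scores.get(c, 0.0)
        for c in classes
    }
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    chosen = [c for c, score in ranked[:max_labels] if score >= threshold]
    # Always return at least the single best class.
    return chosen or [ranked[0][0]]
```

In this sketch the per-class similarity score could be, for example, the highest cosine similarity between a new document and the labelled documents of that class. That choice is also an assumption.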
Pages: 13