Multi-class Document Classification Using Improved Word Embeddings

Cited by: 2
Authors
Rabut, Benedict A. [1]
Fajardo, Arnel C. [2]
Medina, Ruji P. [1]
Affiliations
[1] Technol Inst Philippines, Coll Informat Technol Educ, Quezon City, Philippines
[2] Manuel L Quezon Univ, Sch Grad Studies, Quezon City, Philippines
Keywords
Natural Language Processing; Document Classification; Word Embeddings
DOI
10.1145/3366650.3366661
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we conducted an experiment to build a classification model that combines several techniques used in most Natural Language Processing tasks. We used a word embedding method to transform every word in the dataset and obtain custom-built word embedding vectors, in contrast to approaches in the previous literature that rely on pre-trained word embedding vectors. We enriched the custom-built vectors by incorporating Part-of-Speech (POS) tag vectors, which provide additional semantic information about each word for training our proposed classification model. The proposed model was built using a neural network approach, which is considered more efficient and reliable for solving real-world document classification problems. We fine-tuned the parameters during training of the neural network classification model with the aim of increasing classification accuracy. The experimental results demonstrate that our model performs remarkably well, increasing accuracy by up to 1.7% compared with the results obtained by previous baseline word embedding methods on the same dataset. We also observed that our model outperforms some traditional classification models implemented using different techniques and machine learning algorithms.
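The enrichment step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the POS tag vectors are incorporated, so this example assumes simple concatenation of a word vector with a one-hot POS vector, followed by averaging into a document vector. The vector values, the three-tag set, and the function names are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical toy word vectors. In the paper these would be trained on the
# dataset itself rather than loaded from a pre-trained model.
WORD_VECTORS: Dict[str, List[float]] = {
    "stocks": [0.2, 0.7, 0.1],
    "rise": [0.6, 0.1, 0.3],
}
WORD_DIM = 3

# Hypothetical, deliberately tiny POS tag set.
POS_TAGS: List[str] = ["NOUN", "VERB", "ADJ"]


def pos_one_hot(tag: str) -> List[float]:
    """Encode a POS tag as a one-hot vector (all zeros for an unknown tag)."""
    return [1.0 if tag == known else 0.0 for known in POS_TAGS]


def enrich(token: str, tag: str) -> List[float]:
    """Concatenate a word vector with its POS vector (assumed combination rule)."""
    word_vec = WORD_VECTORS.get(token, [0.0] * WORD_DIM)
    return word_vec + pos_one_hot(tag)


def document_vector(tagged_tokens: List[Tuple[str, str]]) -> List[float]:
    """Average the enriched token vectors into one fixed-length document vector."""
    rows = [enrich(token, tag) for token, tag in tagged_tokens]
    if not rows:
        return [0.0] * (WORD_DIM + len(POS_TAGS))
    return [sum(column) / len(rows) for column in zip(*rows)]
```

A document vector built this way could then be passed to any classifier; the neural network architecture and the parameters tuned in the paper are not specified in this record, so they are left out of the sketch.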
Pages: 42-46
Number of pages: 5