Multi-Class Document Classification Using Lexical Ontology-Based Deep Learning †

Cited by: 2
Authors
Yelmen, Ilkay [1, 2]
Gunes, Ali [1 ]
Zontul, Metin [3 ]
Affiliations
[1] Istanbul Aydin Univ, Fac Engn, Dept Comp Engn, TR-34295 Istanbul, Turkiye
[2] Turkcell Grp Co Digital Educ Technol Inc, TR-06800 Ankara, Turkiye
[3] Sivas Sci & Technol Univ, Fac Engn & Nat Sci, Dept Comp Engn, TR-58100 Sivas, Turkiye
Source
APPLIED SCIENCES-BASEL | 2023 / Volume 13 / Issue 10
Keywords
document classification; multi-class classification; word embeddings; WordNet; BERT; text classification
DOI
10.3390/app13106139
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
With the recent growth of the Internet, the volume of data has also increased. In particular, the growing amount of unstructured data makes data management difficult, and the data must be classified before it can be used for various purposes. Because manually classifying an ever-increasing volume of data for analysis and evaluation is impractical, automatic classification methods are needed. Imbalanced, multi-class classification is especially challenging: as the number of classes increases, so does the number of decision boundaries a learning algorithm has to resolve. This paper therefore proposes an improved model that uses the WordNet lexical ontology and BERT to learn deeper features of the text and thereby improve classification performance. Classification success increased when using 11 general WordNet lexicographer files, which are based on synonym sets (synsets), syntactic categories, and logical groupings. WordNet was used for feature dimension reduction. In the experiments, word embedding methods were first applied without dimension reduction, and Random Forest (RF), Support Vector Machine (SVM), and Multi-Layer Perceptron (MLP) algorithms were then used for classification. These experiments were repeated with WordNet-based dimension reduction. In addition to the machine learning models, experiments were conducted with the pretrained BERT model, with and without WordNet. On an unstructured, seven-class, imbalanced dataset, the proposed model achieved the highest accuracy, 93.77%.
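The idea of using WordNet lexicographer files for feature dimension reduction can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which 11 lexicographer files were used or how words were mapped to them. The lookup table `LEXNAME_LOOKUP` and the function `reduce_to_categories` are hypothetical names, and the table is a small hand-written stand-in for a real WordNet lookup. The sketch only shows how a document's vocabulary can be collapsed into a short, fixed-length vector of category counts.

```python
from collections import Counter

# Hypothetical, hand-written excerpt of a word -> lexicographer-file lookup.
# A real pipeline would obtain these labels from WordNet itself; this table
# is an assumption made so the example is self-contained.
LEXNAME_LOOKUP = {
    "dog": "noun.animal",
    "cat": "noun.animal",
    "doctor": "noun.person",
    "teacher": "noun.person",
    "hospital": "noun.artifact",
    "run": "verb.motion",
    "walk": "verb.motion",
}

# Fixed, sorted category order so every document yields a vector of the same shape.
CATEGORIES = sorted(set(LEXNAME_LOOKUP.values()))


def reduce_to_categories(tokens):
    """Map tokens to category counts; tokens missing from the lookup are skipped."""
    counts = Counter(LEXNAME_LOOKUP[t] for t in tokens if t in LEXNAME_LOOKUP)
    return [counts.get(category, 0) for category in CATEGORIES]


doc = "the doctor and the teacher walk the dog".split()
features = reduce_to_categories(doc)
print(dict(zip(CATEGORIES, features)))
```

However large the vocabulary, the feature vector has one entry per category, which is the sense in which such a mapping reduces dimensionality before a classifier such as RF, SVM, or MLP is trained.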
Pages: 22
Related papers
50 in total
  • [31] Deep Learning–Based Skin Lesion Multi-class Classification with Global Average Pooling Improvement
    Paravatham V. S. P. Raghavendra
    C. Charitha
    K. Ghousiya Begum
    V. B. S. Prasath
    Journal of Digital Imaging, 2023, 36 (5) : 2227 - 2248
  • [32] Large-Scale Multi-Class Image-Based Cell Classification With Deep Learning
    Meng, Nan
    Lam, Edmund Y.
    Tsia, Kevin K.
    So, Hayden Kwok-Hay
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2019, 23 (05) : 2091 - 2098
  • [33] Multi-Class Confidence Detection Using Deep Learning Approach
    Mujahid, Amna
    Aslam, Muhammad
    Khan, Muhammad Usman Ghani
    Martinez-Enriquez, Ana Maria
    Ul Haq, Nazeef
    APPLIED SCIENCES-BASEL, 2023, 13 (09)
  • [34] Enabling Ontology-based Document Classification and Management in ebXML Registries
    Bechini, Alessio
    Tomasi, Andrea
    Viotto, Jacopo
    APPLIED COMPUTING 2008, VOLS 1-3, 2008 : 1145 - 1150
  • [35] Ontology-based enriched concept graphs for medical document classification
    Shanavas, Niloofer
    Wang, Hui
    Lin, Zhiwei
    Hawe, Glenn
    INFORMATION SCIENCES, 2020, 525 : 172 - 181
  • [36] Multi-Class Prediction of Mineral Resources Based on Deep Learning
    Ding, Liang
    Zhu, Yuelong
    Zhang, Pengcheng
    Dong, Hai
    Chen, Hao
    IEEE ACCESS, 2022, 10 : 111463 - 111476
  • [37] A Deep Transfer Learning Framework for the Multi-Class Classification of Vector Mosquito Species
    Pise, Reshma
    Patil, Kailas
    JOURNAL OF ECOLOGICAL ENGINEERING, 2023, 24 (09) : 183 - 191
  • [38] An active learning algorithm for multi-class classification
    Liu, Dongjiang
    Liu, Yanbi
    PATTERN ANALYSIS AND APPLICATIONS, 2019, 22 (03) : 1051 - 1063
  • [39] Multi-class classification in nonparametric active learning
    Njike, Boris Ndjia
    Siebert, Xavier
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
  • [40] Deep Learning Framework for Multi-class Breast Cancer Histology Image Classification
    Vang, Yeeleng S.
    Chen, Zhen
    Xie, Xiaohui
    IMAGE ANALYSIS AND RECOGNITION (ICIAR 2018), 2018, 10882 : 914 - 922