DeepCKID: A Multi-Head Attention-Based Deep Neural Network Model Leveraging Classwise Knowledge to Handle Imbalanced Textual Data

Cited: 0
Authors
Sah, Amit Kumar [1 ]
Abulaish, Muhammad [1 ]
Affiliations
[1] South Asian Univ, Dept Comp Sci, New Delhi, India
Source
Machine Learning with Applications
Keywords
Class imbalance; Text classification; Transformers; Deep learning; Multi-Head Attention; Pre-trained Language Models
DOI
10.1016/j.mlwa.2024.100575
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification
081104; 0812; 0835; 1405
Abstract
This paper presents DeepCKID, a Multi-Head Attention (MHA)-based deep learning model that exploits statistical and semantic knowledge about documents across the different classes of a dataset to improve the detection of minority-class instances in imbalanced text classification. For each document, DeepCKID extracts (i) word-level statistical and semantic knowledge, namely the class correlation and class similarity of each word based on its association with the different classes in the dataset, and (ii) class-level knowledge from the document using n-grams and relation triplets corresponding to the classwise keywords present, identified via cosine similarity using Transformer-based Pre-trained Language Models (PLMs). DeepCKID encodes the word-level and class-level features with deep convolutional networks, which learn meaningful patterns from them. It first combines the semantically meaningful Sentence-BERT document embedding with the word-level feature matrix to form the final document representation, which it then fuses with the different classwise encoded representations to strengthen feature propagation. DeepCKID passes the encoded document representation and its classwise representations through an MHA layer to identify important features at different positions of the feature subspaces, yielding a latent dense vector that accentuates the document's association with a particular class. Finally, it passes the latent vector to a softmax layer to predict the class label. We evaluate DeepCKID on six publicly available Amazon reviews datasets using four Transformer-based PLMs, and compare it against three existing approaches and four ablation-like baselines. In most cases, DeepCKID outperforms all comparison approaches, including the baselines.
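The abstract describes the architecture only at a high level. The following PyTorch snippet is a minimal, illustrative reading of that fusion-and-attention pipeline, assuming the fused document representation acts as the attention query over the classwise representations; the model name DeepCKIDSketch, every layer size, the shared convolutional encoders, and all tensor shapes are hypothetical stand-ins, not the authors' reported configuration.

```python
# Illustrative sketch only: CNN-encoded word-level and classwise feature
# matrices, fusion with a precomputed Sentence-BERT document embedding,
# a Multi-Head Attention layer, and a softmax classifier, loosely following
# the pipeline described in the abstract. All dimensions are assumptions.
import torch
import torch.nn as nn


class DeepCKIDSketch(nn.Module):
    def __init__(self, sbert_dim=384, word_feat_dim=4,
                 num_classes=2, hidden_dim=256, num_heads=4):
        super().__init__()
        # Convolutional encoder for the word-level knowledge matrix
        # (e.g., class-correlation / class-similarity scores per word).
        self.word_encoder = nn.Sequential(
            nn.Conv1d(word_feat_dim, hidden_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        # A single convolutional encoder shared across the classwise
        # knowledge matrices (n-gram / relation-triplet features per class).
        self.class_encoder = nn.Sequential(
            nn.Conv1d(word_feat_dim, hidden_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        # Project the fused document vector (SBERT embedding + encoded
        # word-level features) into the shared hidden space.
        self.doc_proj = nn.Linear(sbert_dim + hidden_dim, hidden_dim)
        # MHA: the document representation attends over the classwise ones.
        self.mha = nn.MultiheadAttention(hidden_dim, num_heads,
                                         batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, sbert_emb, word_feats, class_feats):
        # sbert_emb:   (B, sbert_dim)                precomputed embedding
        # word_feats:  (B, word_feat_dim, seq_len)   per-word knowledge
        # class_feats: (B, C, word_feat_dim, seq_len) one matrix per class
        B, C = class_feats.shape[:2]
        w = self.word_encoder(word_feats).squeeze(-1)           # (B, H)
        doc = self.doc_proj(torch.cat([sbert_emb, w], dim=-1))  # (B, H)

        # Encode each classwise matrix, then fuse the document vector into
        # it, mirroring the "strengthen feature propagation" step.
        c = self.class_encoder(class_feats.flatten(0, 1))       # (B*C, H, 1)
        c = c.squeeze(-1).view(B, C, -1) + doc.unsqueeze(1)     # (B, C, H)

        # Document query attends over the classwise representations.
        attn_out, _ = self.mha(doc.unsqueeze(1), c, c)          # (B, 1, H)
        return self.classifier(attn_out.squeeze(1))             # logits


if __name__ == "__main__":
    model = DeepCKIDSketch()
    logits = model(torch.randn(8, 384),          # stand-in SBERT embeddings
                   torch.randn(8, 4, 128),       # stand-in word features
                   torch.randn(8, 2, 4, 128))    # stand-in classwise features
    probs = logits.softmax(dim=-1)               # class probabilities
    print(probs.shape)                           # torch.Size([8, 2])
```

The adaptive pooling makes the convolutional encoders length-agnostic, so the sketch accepts any sequence length; the random tensors merely stand in for the class-correlation, class-similarity, and classwise keyword features the paper actually computes.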
Pages: 18
Related Papers (50 total)
  • [1] Multi-Head Attention-Based Hybrid Deep Neural Network for Aeroengine Risk Assessment
    Li, Jian-Hang
    Gao, Xin-Yue
    Lu, Xiang
    Liu, Guo-Dong
    [J]. IEEE ACCESS, 2023, 11 : 113376 - 113389
  • [2] An Improved Model for Analyzing Textual Sentiment Based on a Deep Neural Network Using Multi-Head Attention Mechanism
    Sharaf Al-deen, Hashem Saleh
    Zeng, Zhiwen
    Al-sabri, Raeed
    Hekmat, Arash
    [J]. APPLIED SYSTEM INNOVATION, 2021, 4 (04)
  • [3] Data-driven fiber model based on the deep neural network with multi-head attention mechanism
    Zang, Yubin
    Yu, Zhenming
    Xu, Kun
    Chen, Minghua
    Yang, Sigang
    Chen, Hongwei
    [J]. OPTICS EXPRESS, 2022, 30 (26) : 46626 - 46648
  • [4] Multi-head attention-based model for reconstructing continuous missing time series data
    Wu, Huafeng
    Zhang, Yuxuan
    Liang, Linian
    Mei, Xiaojun
    Han, Dezhi
    Han, Bing
    Weng, Tien-Hsiung
    Li, Kuan-Ching
    [J]. JOURNAL OF SUPERCOMPUTING, 2023, 79 (18): 20684 - 20711
  • [5] A Reverse Positional Encoding Multi-Head Attention-Based Neural Machine Translation Model for Arabic Dialects
    Baniata, Laith H.
    Kang, Sangwoo
    Ampomah, Isaac K. E.
    [J]. MATHEMATICS, 2022, 10 (19)
  • [6] Enhancing Recommendation Capabilities Using Multi-Head Attention-Based Federated Knowledge Distillation
    Wu, Aming
    Kwon, Young-Woo
    [J]. IEEE ACCESS, 2023, 11 : 45850 - 45861
  • [7] Self Multi-Head Attention-based Convolutional Neural Networks for fake news detection
    Fang, Yong
    Gao, Jian
    Huang, Cheng
    Peng, Hua
    Wu, Runpu
    [J]. PLOS ONE, 2019, 14 (09)
  • [8] A Novel Knowledge Tracing Model Based on Collaborative Multi-Head Attention
    Zhang Wei
    Qu Kaiyuan
    Han Yahui
    Tan Longan
    [J]. 6TH INTERNATIONAL CONFERENCE ON INNOVATION IN ARTIFICIAL INTELLIGENCE, ICIAI2022, 2022, : 210 - 215
  • [9] Multiscaled Multi-Head Attention-Based Video Transformer Network for Hand Gesture Recognition
    Garg, Mallika
    Ghosh, Debashis
    Pradhan, Pyari Mohan
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 80 - 84