Text classification framework for short text based on TFIDF-FastText

Cited by: 6
Authors
Chawla, Shrutika [1]
Kaur, Ravreet [1]
Aggarwal, Preeti [1]
Affiliations
[1] Panjab Univ, Univ Inst Engn & Technol UIET, CSE Dept, Chandigarh, India
Keywords
Text classification; TFIDF; FastText; LGBM; Short text similarity; Paraphrasing;
DOI
10.1007/s11042-023-15211-5
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Text classification is a high-priority problem in text mining and information retrieval, and it requires capturing the semantic information of the text. Although several approaches are used to detect similarity between short sentences, most of them miss this semantic information. This paper introduces a hybrid framework to classify semantically similar short texts from a given set of documents. A real-life dataset, Quora Question Pairs, is used for this purpose. In the proposed framework, the question pairs of short texts are pre-processed to eliminate junk information, and 25 token and string-equivalence features are engineered from the dataset; these features play a major role in classification. Redundant and overlapping features are removed, and word vectors are created using a TF-IDF weighted average FastText approach. Combining all the obtained features yields a 623-dimensional data model, which is then fed to the Light Gradient Boosting Machine (LGBM) for classification. Finally, the hyperparameters are tuned to attain an optimized log_loss. The experimental results show that the proposed framework achieves 81.47% accuracy, which is on par with other state-of-the-art models.
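The abstract does not spell out how the TF-IDF weighted average or the log_loss is computed, so the following is only a minimal sketch under stated assumptions. It uses the common smoothed IDF formula, a hypothetical table of toy 3-dimensional vectors (`TOY_VECTORS`) in place of pretrained FastText embeddings, and the standard binary cross-entropy definition of log loss. The function names are illustrative and are not taken from the paper.

```python
import math
from collections import Counter

# Toy 3-dimensional vectors standing in for pretrained FastText embeddings.
# The values are hypothetical and chosen only for illustration.
TOY_VECTORS = {
    "how": [0.1, 0.3, 0.0],
    "do": [0.0, 0.2, 0.1],
    "i": [0.2, 0.0, 0.1],
    "learn": [0.9, 0.1, 0.4],
    "python": [0.7, 0.8, 0.2],
    "study": [0.8, 0.2, 0.5],
}


def idf_table(corpus):
    """Smoothed inverse document frequency for every token in a tokenized corpus."""
    n_docs = len(corpus)
    doc_freq = Counter(tok for doc in corpus for tok in set(doc))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
            for tok, df in doc_freq.items()}


def tfidf_weighted_vector(tokens, idf, vectors, dim=3):
    """Average the word vectors of `tokens`, weighting each by its TF-IDF score."""
    counts = Counter(t for t in tokens if t in vectors and t in idf)
    total = [0.0] * dim
    weight_sum = 0.0
    for tok, count in counts.items():
        weight = (count / len(tokens)) * idf[tok]  # term frequency * IDF
        weight_sum += weight
        for k in range(dim):
            total[k] += weight * vectors[tok][k]
    if weight_sum == 0.0:
        return total  # no known tokens: fall back to the zero vector
    return [x / weight_sum for x in total]


def binary_log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    loss = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return loss / len(y_true)


# Usage: embed both questions of a pair with the same IDF table.
corpus = [
    ["how", "do", "i", "learn", "python"],
    ["how", "do", "i", "study", "python"],
]
idf = idf_table(corpus)
q1_vec = tfidf_weighted_vector(corpus[0], idf, TOY_VECTORS)
q2_vec = tfidf_weighted_vector(corpus[1], idf, TOY_VECTORS)
pair_features = q1_vec + q2_vec  # concatenated vectors for one question pair
```

In a pipeline like the one described, the two question vectors would be concatenated with the hand-engineered features before being passed to a gradient-boosting classifier. That step is omitted here because the abstract gives no model configuration.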
Pages: 40167-40180
Number of pages: 14
Related papers
50 in total
  • [41] A novel feature selection algorithm for text classification based on TFIDF-weight and KL-divergence
    Wang, BY
    Zhang, SM
    Proceedings of the 11th Joint International Computer Conference, 2005, : 438 - 441
  • [42] OBOE: an Explainable Text Classification Framework
    Escobar, Raul A. del Aguila
    Suarez-Figueroa, Mari Carmen
    Fernandez-Lopez, Mariano
    INTERNATIONAL JOURNAL OF INTERACTIVE MULTIMEDIA AND ARTIFICIAL INTELLIGENCE, 2022,
  • [43] OBOE: an Explainable Text Classification Framework
    Escobar, Raul A. del Aguila
    Suarez-Figueroa, Mari Carmen
    Fernandez-Lopez, Mariano
    INTERNATIONAL JOURNAL OF INTERACTIVE MULTIMEDIA AND ARTIFICIAL INTELLIGENCE, 2024, 8 (06):
  • [44] Introducing Semantics in Short Text Classification
    Bouaziz, Ameni
    Pereira, Celia da Costa
    Dartigues-Pallez, Christel
    Precioso, Frederic
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, (CICLING 2016), PT II, 2018, 9624 : 433 - 445
  • [45] Review of short-text classification
    Alsmadi, Issa
    Gan, Keng Hoon
    INTERNATIONAL JOURNAL OF WEB INFORMATION SYSTEMS, 2019, 15 (02) : 155 - 182
  • [46] Review of Chinese Short Text Classification
    Wu, Fenlin
    Gou, Jin
    Wang, Cheng
    INDUSTRIAL INSTRUMENTATION AND CONTROL SYSTEMS II, PTS 1-3, 2013, 336-338 : 2171 - +
  • [47] Experiments on Malay Short Text Classification
    Tiun, Sabrina
    PROCEEDINGS OF THE 2017 6TH INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING AND INFORMATICS (ICEEI'17), 2017,
  • [48] Text Classification of Flu-related Tweets Using FastText with Sentiment and Keyword Features
    Alessa, Ali
    Faezipour, Miad
    Alhassan, Zakhriya
    2018 IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI), 2018, : 366 - 367
  • [49] A Methodological Framework for Dictionary and Rule-based Text Classification
    Abel, Jennifer
    Lantow, Birger
    KDIR: PROCEEDINGS OF THE 11TH INTERNATIONAL JOINT CONFERENCE ON KNOWLEDGE DISCOVERY, KNOWLEDGE ENGINEERING AND KNOWLEDGE MANAGEMENT - VOL 1: KDIR, 2019, : 330 - 337
  • [50] A Proposed Deep Learning based Framework for Arabic Text Classification
    Sayed, Mostafa
    Abdelkader, Hatem
    Khedr, Ayman E.
    Salem, Rashed
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (08) : 305 - 313