An Experimental Analysis of Optimal Hybrid Word Embedding Methods for Text Classification Using a Movie Review Dataset

Cited by: 3
Authors
Alagarsamy, Sandhya [1]
James, Visumathi [2]
Raja, Raja Soosaimarian Peter [3]
Affiliations
[1] Sathyabama Inst Sci & Technol, Dept Comp Sci & Engn, Chennai, Tamil Nadu, India
[2] Veltech Rangarajan Dr Sagunthala R&D Inst Sci & T, Dept Comp Sci & Engn, Chennai, Tamil Nadu, India
[3] Vellore Inst Technol, Sch Comp Sci & Engn, Vellore, Tamil Nadu, India
Keywords
Hybrid Word Embedding; Natural Language Processing; Deep Neural Network; Text Classification; CNN
DOI
10.1590/1678-4324-2022210830
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Today, a wealth of data is produced over the internet from multiple sources, giving rise to the term big data, and much of it takes the form of text. This work focuses on text classification of a movie review dataset using Hybrid Word Embedding (HWE) models and on deriving the optimal text classification model. In text processing, efficient handling of the words and sentences in a document plays a vital role. Traditional methods such as Bag of Words (BoW) do not capture semantic correlation among words. Further, the words in a document are not always processed in order, so certain words are not processed at all, which creates data sparsity problems. To overcome the data sparsity problem, the proposed work applies hybrid word embedding using the WordNet repository. The hybrid model is built with three word embedding methods, namely an embedding layer, Word2Vec and GloVe, in combination with a deep learning Convolutional Neural Network (CNN). The results obtained for the movie review dataset are compared and the optimal classification model is identified. The evaluation metrics include Log Loss, Area Under the Curve (AUC), Mean Reciprocal Rank (MRR), Normalized Discounted Cumulative Gain (NDCG), Mean Absolute Error (MAE), Error Rate (ERR), Matthews Correlation Coefficient (MCC), Training Accuracy, Test Accuracy, Precision, Recall and F1 score. Finally, the experimental results indicate that Word2Vec is the optimal hybrid word embedding model for classifying the chosen movie review dataset.
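As an illustrative aside that is not taken from the paper, several of the metrics named in the abstract can be computed for a binary classifier as in the minimal sketch below. The function names `binary_metrics` and `log_loss` and the toy labels are assumptions made for illustration; the paper's own evaluation code is not shown in this record.

```python
import math


def binary_metrics(y_true, y_pred):
    """Precision, recall, F1, error rate and MCC for 0/1 labels (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    error_rate = (fp + fn) / len(pairs)

    # Matthews Correlation Coefficient; defined as 0 when a marginal count is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"precision": precision, "recall": recall, "f1": f1,
            "error_rate": error_rate, "mcc": mcc}


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


if __name__ == "__main__":
    # Toy labels: 1 = positive review, 0 = negative review (made-up data).
    truth = [1, 1, 0, 0]
    predicted = [1, 0, 0, 0]
    print(binary_metrics(truth, predicted))
    print(log_loss(truth, [0.9, 0.4, 0.2, 0.1]))
```

MCC uses all four confusion-matrix cells, so it stays informative when the positive and negative classes are imbalanced, whereas accuracy alone can look deceptively high in that case.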
Pages: 15
Related papers
47 in total
  • [1] A text sentiment classification model using double word embedding methods
    Mingqiang Zhou
    Dan Liu
    Yanhui Zheng
    Qingsheng Zhu
    Ping Guo
    Multimedia Tools and Applications, 2022, 81 : 18993 - 19012
  • [2] A text sentiment classification model using double word embedding methods
    Zhou, Mingqiang
    Liu, Dan
    Zheng, Yanhui
    Zhu, Qingsheng
    Guo, Ping
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (14) : 18993 - 19012
  • [3] Word embedding and text classification based on deep learning methods
    Li, Saihan
    Gong, Bing
    2020 2ND INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE COMMUNICATION AND NETWORK SECURITY (CSCNS2020), 2021, 336
  • [4] A Review of Techniques to Determine the Optimal Word Score in Text Classification
    Agnihotri, Deepak
    Verma, Kesari
    Tripathi, Priyanka
    Choudhary, Nilam
    AMBIENT COMMUNICATIONS AND COMPUTER SYSTEMS, RACCCS 2017, 2018, 696 : 497 - 507
  • [5] Sentiment Analysis using Novel Distributed Word Embedding for Movie Reviews
    Dhanani, Jenish
    Mehta, Rupa
    Rana, Dipti
    Tidke, Bharat
    2018 10TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING (ICOAC), 2018, : 138 - 145
  • [6] Correlation analysis and text classification of chemical accident cases based on word embedding
    Jing, Sifeng
    Liu, Xiwei
    Gong, Xiaoyan
    Tang, Ying
    Xiong, Gang
    Liu, Sheng
    Xiang, Shuguang
    Bi, Rongshan
    PROCESS SAFETY AND ENVIRONMENTAL PROTECTION, 2022, 158 : 698 - 710
  • [7] An analysis of hierarchical text classification using word embeddings
    Stein, Roger Alan
    Jaques, Patricia A.
    Valiati, Joao Francisco
    INFORMATION SCIENCES, 2019, 471 : 216 - 232
  • [8] Text Classification Using Word Embedding in Rule-Based Methodologies: A Systematic Mapping
    Aubaid, Asmaa M.
    Mishra, Alok
    TEM JOURNAL-TECHNOLOGY EDUCATION MANAGEMENT INFORMATICS, 2018, 7 (04) : 902 - 914
  • [9] Analysis of Sentiment on Movie Reviews Using Word Embedding Self-Attentive LSTM
    Sivakumar, Soubraylu
    Rajalakshmi, Ratnavel
    INTERNATIONAL JOURNAL OF AMBIENT COMPUTING AND INTELLIGENCE, 2021, 12 (02) : 33 - 52
  • [10] Hierarchical Convolutional Attention Networks Using Joint Chinese Word Embedding for Text Classification
    Zhang, Kaiqiang
    Wang, Shupeng
    Li, Binbin
    Mei, Feng
    Zhang, Jianyu
    PRICAI 2019: TRENDS IN ARTIFICIAL INTELLIGENCE, PT III, 2019, 11672 : 234 - 246