Online Unstructured Data Analysis Models with KoBERT and Word2vec: A Study on Sentiment Analysis of Public Opinion in Korean

Cited by: 2
Authors
Baek, Changwon [1]
Kang, Jiho [2]
Choi, Sangsoo [1]
Affiliations
[1] Korea Inst Sci & Technol KIST, Technol Convergence Ctr, Seoul, South Korea
[2] Korea Univ, Inst Engn Res, Seoul, South Korea
Keywords
KoBERT; Word2vec; Public opinion analysis; Sentiment classification; Internet
DOI
10.5391/IJFIS.2023.23.3.244
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Online news articles and comments play a vital role in shaping public opinion, and numerous studies have used them as raw data for online opinion analysis. Sentiment analysis of public opinion based on bidirectional encoder representations from transformers (BERT) has recently attracted significant attention. However, applying BERT to Korean is challenging because of its limited linguistic versatility and its low accuracy in domains with insufficient training data. Conventional public opinion analysis focuses on term frequency; hence, low-frequency words are likely to be excluded because their importance is underestimated. This study aimed to address these issues and facilitate the analysis of public opinion regarding Korean news articles and comments. We propose a method for analyzing public opinion that uses word2vec to overcome the limits of word-frequency-centered analysis, in conjunction with KoBERT, a BERT variant optimized for the Korean language. Naver news articles and comments were analyzed using a sentiment classification model developed on the KoBERT framework. The experiment demonstrated a sentiment classification accuracy of over 90%. Thus, the proposed method yields faster and more precise results than conventional methods. In addition, word2vec can identify words that occur infrequently but are highly relevant.
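The contrast the abstract draws between frequency-based ranking and embedding-based relevance can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the tokenized comments are invented, the three-dimensional vectors are hand-made stand-ins for trained word2vec embeddings, and the helper names `cosine` and `rank_by_similarity` are hypothetical. It is not the authors' pipeline and does not involve KoBERT.

```python
import math
from collections import Counter

# Toy tokenized comments (hypothetical data, not from the study).
comments = [
    ["policy", "tax", "policy", "good"],
    ["policy", "tax", "bad"],
    ["policy", "subsidy", "good"],
    ["tax", "policy", "bad"],
]

# Hand-made 3-dimensional vectors standing in for trained word2vec
# embeddings (assumption: a real model would learn these from a corpus).
vectors = {
    "policy": (0.9, 0.1, 0.0),
    "tax": (0.5, 0.8, 0.1),
    "subsidy": (0.85, 0.2, 0.05),
    "good": (0.0, 0.1, 0.9),
    "bad": (0.1, 0.0, -0.9),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def rank_by_similarity(query, table):
    """Return (word, similarity) pairs for all other words, most similar first."""
    pairs = [(w, cosine(table[query], v)) for w, v in table.items() if w != query]
    return sorted(pairs, key=lambda p: p[1], reverse=True)


# Frequency view: rare words such as "subsidy" sit at the bottom of the list.
freq = Counter(token for comment in comments for token in comment)

# Embedding view: the same rare word can rank first by similarity to a query.
ranked = rank_by_similarity("policy", vectors)

for word, score in ranked:
    print(f"{word}: similarity={score:.3f}, frequency={freq[word]}")
```

In this toy setup, "subsidy" appears only once in the comments, so a frequency cutoff would likely drop it, yet it is the nearest neighbor of "policy" in the vector space. With a trained word2vec model, the hand-made table would be replaced by the model's learned vectors.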
Pages: 244-258
Number of pages: 15