Online Unstructured Data Analysis Models with KoBERT and Word2vec: A Study on Sentiment Analysis of Public Opinion in Korean

Times cited: 2
Authors
Baek, Changwon [1]
Kang, Jiho [2]
Choi, Sangsoo [1]
Affiliations
[1] Korea Inst Sci & Technol KIST, Technol Convergence Ctr, Seoul, South Korea
[2] Korea Univ, Inst Engn Res, Seoul, South Korea
Keywords
KoBERT; Word2vec; Public opinion analysis; Sentiment classification; INTERNET
DOI
10.5391/IJFIS.2023.23.3.244
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Online news articles and comments play a vital role in shaping public opinion, and numerous studies have used them as raw data for online opinion analysis. Sentiment analysis of public opinion based on Bidirectional Encoder Representations from Transformers (BERT) has recently attracted significant attention. However, applying BERT to Korean is challenging owing to BERT's limited linguistic versatility and low accuracy in domains with insufficient training data. Moreover, conventional public opinion analysis focuses on term frequency, so low-frequency words are likely to be excluded because their importance is underestimated. This study aimed to address these issues and facilitate the analysis of public opinion on Korean news articles and comments. We propose a public opinion analysis method that combines word2vec, used to overcome the limitations of word-frequency-centered analysis, with KoBERT, a version of BERT improved and optimized for the Korean language. Naver news articles and comments were analyzed using a sentiment classification model built on the KoBERT framework. In the experiment, the model achieved a sentiment classification accuracy of over 90%, and the method yielded faster and more precise results than conventional methods. In addition, word2vec can identify words that occur infrequently but are highly relevant.
Pages: 244–258
Number of pages: 15
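The abstract's point that frequency-centered analysis drops low-frequency but relevant words can be illustrated with a toy example. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: it contrasts a top-k term-frequency cutoff with a cosine-similarity lookup over word vectors. The corpus, the tokens, and the three-dimensional vectors are hypothetical and hard-coded; in a real setting the vectors would come from a trained word2vec model (for example, one trained with a library such as gensim). The helper names `top_by_frequency`, `cosine`, `most_similar`, and `hidden_relevant_terms` are illustrative only.

```python
import math
from collections import Counter

# Hypothetical toy corpus of tokenized news comments (illustrative only, NOT data from the study).
COMMENTS = [
    ["vaccine", "safe", "effective"],
    ["vaccine", "side_effect", "worry"],
    ["vaccine", "safe", "government"],
    ["government", "policy", "vaccine"],
    ["myocarditis", "worry", "side_effect"],
]

# Hypothetical 3-dimensional vectors standing in for embeddings from a trained word2vec model.
VECTORS = {
    "vaccine": [1.0, 0.2, 0.0],
    "safe": [0.8, 0.0, 0.1],
    "effective": [0.7, 0.1, 0.0],
    "government": [0.2, 1.0, 0.0],
    "policy": [0.1, 0.9, 0.1],
    "side_effect": [0.1, 0.0, 1.0],
    "worry": [0.0, 0.2, 0.8],
    "myocarditis": [0.1, 0.1, 0.9],
}


def top_by_frequency(docs, k):
    """Return the k most frequent tokens, breaking ties alphabetically (frequency-centered view)."""
    counts = Counter(token for doc in docs for token in doc)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [token for token, _ in ranked[:k]]


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(query, vectors, k):
    """Return the k words whose vectors are closest to `query` by cosine similarity."""
    q = vectors[query]
    scored = [(word, cosine(q, vec)) for word, vec in vectors.items() if word != query]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return [word for word, _ in scored[:k]]


def hidden_relevant_terms(query, docs, vectors, k_sim, k_freq):
    """Words semantically close to `query` that a top-k_freq frequency cutoff would discard."""
    frequent = set(top_by_frequency(docs, k_freq))
    return [word for word in most_similar(query, vectors, k_sim) if word not in frequent]


if __name__ == "__main__":
    print("Top-5 by frequency:", top_by_frequency(COMMENTS, 5))
    print("Nearest to 'side_effect':", most_similar("side_effect", VECTORS, 2))
    print("Relevant but infrequent:", hidden_relevant_terms("side_effect", COMMENTS, VECTORS, 2, 5))
```

In this toy data, `myocarditis` occurs only once and falls below the top-5 frequency cutoff, yet it is the nearest neighbor of `side_effect` in the hypothetical vector space. That is the kind of low-frequency, high-relevance term the abstract says word2vec can surface.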