Take full advantage of unlabeled data for sentiment classification

Cited by: 1
Authors
La, Lei [1]
Cao, Shuyan [2]
Qin, Liangjuan [1]
Affiliations
[1] Univ Int Business & Econ, Sch Informat Technol & Management, Beijing, Peoples R China
[2] Univ Int Business & Econ, Dept Informat Management, Beijing, Peoples R China
Keywords
Social networks; Boosting; Semi-supervised learning; Sentiment mining; Unlabeled data; FRAMEWORK;
DOI
10.1108/K-08-2016-0196
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Purpose: As a foundational issue in social mining, sentiment classification has suffered from a lack of labeled data. To improve classification accuracy when few labeled data are available, many semi-supervised algorithms have been proposed. These algorithms improve classification performance when labeled data are insufficient. However, many semi-supervised methods struggle to ensure precision and efficiency at the same time. This paper aims to present a novel method for using unlabeled data more accurately and more efficiently.
Design/methodology/approach: First, the authors designed a boosting-based method for unlabeled data selection. The improved boosting-based method selects unlabeled data that follow the same distribution as the labeled data. The authors then proposed a novel strategy that combines weak classifiers into strong classifiers in a more rational way. Finally, a semi-supervised sentiment classification algorithm is given.
Findings: Experimental results demonstrate that the novel algorithm can achieve high accuracy with low time consumption. It is helpful for building high-performance applications related to social networks.
Research limitations/implications: The novel method still needs a small labeled data set for semi-supervised learning. In future work, the authors may extend it to an unsupervised method.
Practical implications: The method can be used in text mining, image classification, audio processing and similar tasks, as well as in other fields related to unstructured data mining. It overcomes the problem of insufficient labeled data and achieves high precision with less computation time.
Social implications: Sentiment mining has wide applications in public opinion management, public security, market analysis, social networks and related fields. Sentiment classification is the basis of sentiment mining.
Originality/value: To the best of the authors' knowledge, this is the first time transfer learning has been introduced to AdaBoost for semi-supervised learning. Moreover, the improved AdaBoost uses a totally new weighting mechanism.
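Illustrative sketch (not from the paper): the abstract does not specify the authors' selection rule or their weighting mechanism, so the Python below only shows the general idea of semi-supervised boosting under stated assumptions. It uses plain AdaBoost over decision stumps and a generic self-training loop that pseudo-labels unlabeled points whose normalized ensemble margin is high. All function names, the `min_margin` threshold and the toy one-dimensional data are hypothetical and chosen for illustration only.

```python
import math

def stump_predict(stump, x):
    # A stump votes `polarity` when the chosen feature exceeds the threshold.
    feature, threshold, polarity = stump
    return polarity if x[feature] > threshold else -polarity

def fit_adaboost(X, y, rounds=10):
    """Plain AdaBoost over decision stumps; labels must be -1 or +1."""
    n = len(X)
    weights = [1.0 / n] * n
    candidates = [(f, x[f], p) for x in X for f in range(len(x)) for p in (1, -1)]
    model = []  # list of (alpha, stump) pairs
    for _ in range(rounds):
        best, best_err = None, float("inf")
        for stump in candidates:
            err = sum(w for w, xi, yi in zip(weights, X, y)
                      if stump_predict(stump, xi) != yi)
            if err < best_err:
                best, best_err = stump, err
        if best_err >= 0.5:
            break  # no stump beats random guessing
        err = max(best_err, 1e-10)  # avoid division by zero below
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, best))
        if best_err < 1e-10:
            break  # a single stump already separates the data
        # Standard AdaBoost reweighting: misclassified points gain weight.
        weights = [w * math.exp(-alpha * yi * stump_predict(best, xi))
                   for w, xi, yi in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def margin(model, x):
    # Normalized ensemble score in [-1, 1]; the sign is the predicted label.
    total = sum(a for a, _ in model)
    if total == 0:
        return 0.0
    return sum(a * stump_predict(s, x) for a, s in model) / total

def predict(model, x):
    return 1 if margin(model, x) > 0 else -1

def self_train(X_lab, y_lab, X_unl, min_margin=0.8, max_iters=5, rounds=10):
    """Generic self-training: pseudo-label confident points, then refit."""
    X_train, y_train, pool = list(X_lab), list(y_lab), list(X_unl)
    model = fit_adaboost(X_train, y_train, rounds)
    for _ in range(max_iters):
        confident = [x for x in pool if abs(margin(model, x)) >= min_margin]
        if not confident:
            break
        for x in confident:
            X_train.append(x)
            y_train.append(predict(model, x))
        pool = [x for x in pool if abs(margin(model, x)) < min_margin]
        model = fit_adaboost(X_train, y_train, rounds)
    return model, pool

# Toy data: one hypothetical "sentiment score" feature per document.
X_lab = [[0.1], [0.2], [0.8], [0.9]]
y_lab = [-1, -1, 1, 1]
X_unl = [[0.05], [0.95], [0.5]]
model, leftover = self_train(X_lab, y_lab, X_unl)
```

Here `leftover` holds the unlabeled points that never reached the confidence threshold. A method like the one the abstract describes would replace the margin test with a selection step that checks whether unlabeled data match the labeled distribution, and would replace the standard `alpha` weighting with the authors' own mechanism.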
Pages: 474-486
Page count: 13