Imbalanced text sentiment classification using universal and domain-specific knowledge

Cited by: 63
Authors
Li, Yijing [1, 2]
Guo, Haixiang [1, 2, 3]
Zhang, Qingpeng [4]
Gu, Mingyun [1, 2]
Yang, Jianying [5]
Affiliations
[1] China Univ Geosci, Coll Econ & Management, Wuhan 430074, Hubei, Peoples R China
[2] China Univ Geosci, Res Ctr Digital Business Management, Wuhan 430074, Hubei, Peoples R China
[3] China Univ Geosci, Mineral Resource Strategy & Policy Res Ctr, Wuhan 430074, Hubei, Peoples R China
[4] City Univ Hong Kong, Dept Syst Engn & Engn Management, Kowloon, Hong Kong, Peoples R China
[5] Wuhan Ctr China Geol Survey, Wuhan 430074, Hubei, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Sentiment analysis; Label propagation; Imbalanced data; Ensemble learning; LEXICON; MODEL;
DOI
10.1016/j.knosys.2018.06.019
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a sentiment classification model that addresses two predominant issues in sentiment classification: domain sensitivity and data imbalance. Because a word may carry different sentiment polarities in different contexts, sentiment classification is widely regarded as a domain-sensitive task. Accordingly, this paper uses label propagation to induce universal and domain-specific sentiment lexicons and builds a domain-adaptive sentiment classification model that incorporates universal and domain-specific knowledge into a unified learning framework. In addition, sentiment-related corpora usually have a skewed polarity distribution, because individuals tend to share similar assessment criteria for a given object and their sentiment polarities toward that object are therefore likely to be similar. We address this imbalanced data problem with a novel over-sampling technique. Unlike existing over-sampling approaches, which generate minority-class samples in a numerical feature space, the proposed sampling method creates synthetic texts directly in the word space. Several experiments are conducted to verify the effectiveness of the proposed lexicon generation method, learning framework, and over-sampling method. The results show that the induced sentiment lexicons are interpretable and that the proposed model is effective for imbalanced and domain-specific text sentiment classification.
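As a rough illustration of how label propagation can induce a sentiment lexicon from a few seed words, the sketch below spreads polarity scores over a small word-similarity graph. It is a generic textbook-style procedure, not the authors' algorithm: the function name `propagate_polarity`, the toy graph, the edge weights, and the seed words are all assumptions made for this example.

```python
def propagate_polarity(edges, seeds, iterations=50):
    """Generic label propagation on an undirected, weighted word graph.

    edges: dict mapping (word_a, word_b) -> similarity weight (toy values, assumed)
    seeds: dict mapping seed word -> polarity in [-1, 1]; seeds stay clamped
    Returns a dict mapping every word in the graph to a polarity score.
    """
    # Build a symmetric adjacency structure from the edge list.
    neighbors = {}
    for (a, b), weight in edges.items():
        neighbors.setdefault(a, {})[b] = weight
        neighbors.setdefault(b, {})[a] = weight

    # Non-seed words start neutral.
    scores = {word: seeds.get(word, 0.0) for word in neighbors}

    for _ in range(iterations):
        updated = {}
        for word, nbrs in neighbors.items():
            if word in seeds:
                # Seed polarities are known and never overwritten.
                updated[word] = seeds[word]
            else:
                # Each other word takes the weighted mean of its neighbors.
                total = sum(nbrs.values())
                updated[word] = sum(w * scores[n] for n, w in nbrs.items()) / total
        scores = updated
    return scores


# Hypothetical product-review vocabulary with assumed similarity weights.
edges = {
    ("good", "reliable"): 0.9,
    ("reliable", "sturdy"): 0.8,
    ("bad", "flimsy"): 0.9,
    ("flimsy", "cheap"): 0.7,
    ("sturdy", "cheap"): 0.1,
}
seeds = {"good": 1.0, "bad": -1.0}
lexicon = propagate_polarity(edges, seeds)
```

In this toy graph, "cheap" ends up with a negative score because it is tied much more strongly to "flimsy" than to "sturdy". Building the graph from a different domain's corpus could give the same word a different polarity, which is the kind of domain sensitivity the abstract describes.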
Pages: 1-15
Number of pages: 15