Application of Knowledge Gain on Multi-Type Feature Space in Microblog User Classification

Cited by: 0
Authors
Yan, Xu [1, 2]
Affiliations
[1] Beijing Language & Culture Univ, Beijing, Peoples R China
[2] Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
Keywords
knowledge gain; feature selection; text classification; user classification; microblog
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Feature selection plays an important role in text categorization. Classic feature selection methods such as document frequency (DF), information gain (IG), and mutual information (MI) are commonly applied in text categorization, but they usually take only plain text into account. Knowledge Gain (KG) is a new feature selection method proposed in my previous paper. It measures an attribute's importance based on rough set theory. Experiments show that it performs well in traditional text classification and has a clear advantage in recall on unbalanced corpora. Unlike traditional text, microblogs are characterized by short text and special network structures, including the user social network and the behavior network. As a result, microblog users offer less text information and more behavior and social information. The classic feature selection algorithms, which were designed for text features, are therefore not applicable. In this paper, we validate that KG, which is based on rough set knowledge, can consistently select optimal features in the multi-type feature space of microblog user classification. Experiments show that it performs better in multi-type feature selection than other classic feature selection methods.
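For readers unfamiliar with the baselines named in the abstract, the following is a minimal sketch of information gain (IG) used to rank features of different types (text, social, behavior). It is not the Knowledge Gain method, whose definition is not given in this record. The feature names, the class labels, and the toy data are all hypothetical, and non-text features are assumed to have been discretized beforehand.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG = H(labels) - sum over v of P(v) * H(labels | feature == v)."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: four microblog users, two classes.
labels = ["org", "org", "person", "person"]
features = {
    "has_word_sale": [1, 1, 0, 0],                      # text feature
    "follower_bucket": ["high", "low", "high", "low"],  # social feature
    "posts_daily": [1, 1, 1, 0],                        # behavior feature
}

# Rank features from most to least informative about the class label.
ranking = sorted(features,
                 key=lambda name: information_gain(features[name], labels),
                 reverse=True)
print(ranking)
```

In this toy data the text feature separates the two classes completely, the behavior feature only partly, and the social feature not at all, so the ranking follows that order. Real microblog data would need a discretization step for numeric social and behavior features before a measure of this kind could be applied.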
Pages: 340-345
Page count: 6