A Method of Text Categorization Based on Genetic Algorithm and LDA

Cited by: 0
Authors
Chen, Lei [1]
Li, Jun [1]
Zhang, Li [1]
Affiliation
[1] Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei 230026, Anhui, Peoples R China
Keywords
LDA; Genetic Algorithm; feature selection; text categorization
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Latent Dirichlet Allocation (LDA) does not perform feature selection on its input. LDA assigns a topic to each word in the original feature space, which contains many insignificant words that degrade topic quality. In this paper, we propose a feature selection method based on a Genetic Algorithm (GA), which reduces the dimensionality of the LDA input features and makes the generated topics more meaningful. Experimental results on the Fudan University corpus show that micro-average F1 and macro-average F1 improve by 0.76% and 0.72%, respectively, compared with the original LDA. The method also reduces model training time because many insignificant words are removed. According to the experimental results, GA feature selection is superior to statistical methods such as document frequency and information gain. In addition, it is adaptive: the proportion of features to select does not need to be specified in advance. Thus, an LDA model based on GA feature selection achieves better categorization performance.
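The abstract does not give the chromosome encoding, genetic operators, or fitness function, so the following is only a minimal sketch of GA-based feature selection under stated assumptions: each individual is a 0/1 mask over the vocabulary, selection is by binary tournament with one elite, and `toy_fitness` is a hypothetical stand-in for whatever score the paper actually optimizes (for example, categorization F1 of an LDA model trained on the kept words). The names `evolve_mask`, `toy_fitness`, and `INFORMATIVE` are illustrative and do not come from the paper.

```python
import random


def evolve_mask(n_features, fitness, pop_size=20, generations=30,
                crossover_rate=0.8, mutation_rate=0.02, seed=0):
    """Evolve a 0/1 mask over the vocabulary: 1 keeps a word, 0 drops it."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def tournament():
        # Binary tournament: pick two individuals, keep the fitter one.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: carry the current best forward
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best


# Hypothetical toy setup: words 0, 3, 5, 8 are "informative", the rest noise.
INFORMATIVE = {0, 3, 5, 8}


def toy_fitness(mask):
    """Assumed stand-in score: reward useful words, penalize kept noise."""
    kept_useful = sum(mask[i] for i in INFORMATIVE)
    kept_noise = sum(mask) - kept_useful
    return kept_useful - 0.5 * kept_noise


best_mask = evolve_mask(n_features=12, fitness=toy_fitness)
selected = [i for i, bit in enumerate(best_mask) if bit]
print("selected word indices:", selected)
```

Because the mask length is fixed but the number of 1-bits is free to vary, a search of this kind settles on how many features to keep on its own, which is one plausible reading of the abstract's claim that no selection proportion has to be chosen in advance.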
Pages: 10866-10870
Page count: 5
Related papers
50 in total
  • [1] Feature Weighting Method Based on Real-coded Genetic Algorithm in Text Categorization
    Li, Junwei
    Li, Xiangqian
    [J]. 2015 8TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID), VOL 2, 2015, : 91 - 94
  • [2] LDA-based Keyword Selection in Text Categorization
    Tasci, Serafettin
    Gungor, Tunga
    [J]. 2009 24TH INTERNATIONAL SYMPOSIUM ON COMPUTER AND INFORMATION SCIENCES, 2009, : 229 - 234
  • [3] Research on text categorization model based on LDA - KNN
    Chen, Weihua
    Zhang, Xian
    [J]. 2017 IEEE 2ND ADVANCED INFORMATION TECHNOLOGY, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (IAEAC), 2017, : 2719 - 2726
  • [4] Smoothing LDA model for text categorization
    Li, Wenbo
    Sun, Le
    Feng, Yuanyong
    Zhang, Dakun
    [J]. INFORMATION RETRIEVAL TECHNOLOGY, 2008, 4993 : 83 - +
  • [5] CLDA: Feature selection for text categorization based on constrained LDA
    Cui Zifeng
    Xu Baowen
    Zhang Weifeng
    Jiang Dawei
    Xu Junling
    [J]. ICSC 2007: INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING, PROCEEDINGS, 2007, : 702 - +
  • [6] Multi-class text categorization based on LDA and SVM
    Li, Kunlun
    Xie, Jing
    Sun, Xue
    Ma, Yinghui
    Bai, Hui
    [J]. CEIS 2011, 2011, 15
  • [7] Hybrid feature selection based on enhanced genetic algorithm for text categorization
    Ghareb, Abdullah Saeed
    Abu Bakar, Azuraliza
    Hamdan, Abdul Razak
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2016, 49 : 31 - 47
  • [8] An Efficient Document Categorization Algorithm Based on LDA and SFL
    Sun, Xia
    Wang, Ziqiang
    [J]. ISBIM: 2008 INTERNATIONAL SEMINAR ON BUSINESS AND INFORMATION MANAGEMENT, VOL 2, 2009, : 112 - 115
  • [9] A BP Neural Network Text Categorization Method Optimized by an Improved Genetic Algorithm
    Xia, Rongze
    Jia, Yati
    Li, Hu
    [J]. 2013 NINTH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION (ICNC), 2013, : 257 - 261
  • [10] A KNN BASED ALGORITHM FOR TEXT CATEGORIZATION
    Bucar, Joze
    Povh, Janez
    [J]. SOR'13 PROCEEDINGS: THE 12TH INTERNATIONAL SYMPOSIUM ON OPERATIONAL RESEARCH IN SLOVENIA, 2013, : 367 - 372