Feature Selection Algorithm for Hierarchical Text Classification Using Kullback-Leibler Divergence

Cited by: 0
Authors
Yao Lifang [1 ]
Qin Sijun [2 ]
Zhu Huan [2 ]
Affiliations
[1] CUEB, Sch Stat, Beijing, People's Republic of China
[2] CUC, New Media Inst, Beijing, People's Republic of China
Keywords
hierarchical text classification; KL divergence; text classification; hierarchical feature selection; category correlation;
DOI: Not available
CLC Number: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
Text classification, a simple and effective method, is considered a key technology for processing and organizing large amounts of text data. Flat text classification can no longer meet growing user demands, so hierarchical text classification has received extensive attention and has broad application prospects. Hierarchical feature selection is a key technology for automatic hierarchical text classification, but typical methods select features for each class in the class hierarchy independently, ignoring the correlation between parent and child classes. This paper proposes a feature selection method based on KL divergence: the correlation between a class and its subclasses is measured by KL divergence, the correlation between each feature and a subclass is calculated by mutual information, and the importance of subclass features is measured by term frequency probability, so as to select a more discriminative feature set for each parent class node. We applied the hierarchical feature selection method with SVM classifiers to the hierarchical text categorization task on two corpora. Experiments showed that the proposed algorithm was effective compared with the chi-square statistic (CHI), information gain (IG), and mutual information (MI) applied directly to hierarchical feature selection.
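The abstract describes the selector only at a high level. The following is a minimal Python sketch of how the three ingredients (KL divergence between parent and child term distributions, term-subclass mutual information, and term frequency probability) could be combined; the function names, the Laplace smoothing, and the multiplicative score combination are illustrative assumptions, not the authors' published formulas.

import math
from collections import Counter

def term_distribution(docs, vocab, alpha=1.0):
    # Laplace-smoothed term distribution P(t | class) over tokenized docs.
    counts = Counter(t for doc in docs for t in doc)
    total = sum(counts[t] for t in vocab) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / total for t in vocab}

def kl_divergence(p, q):
    # D_KL(P || Q) = sum_t P(t) * log(P(t) / Q(t)); smoothing keeps q[t] > 0.
    return sum(p[t] * math.log(p[t] / q[t]) for t in p)

def mutual_information(term, class_docs, all_docs):
    # Pointwise MI(t, c) = log(P(t | c) / P(t)), a common text feature-selection form.
    p_t_c = sum(1 for d in class_docs if term in d) / len(class_docs)
    p_t = sum(1 for d in all_docs if term in d) / len(all_docs)
    return math.log(p_t_c / p_t) if p_t_c > 0 and p_t > 0 else 0.0

def select_features(child_docs_by_class, vocab, k=100):
    # Select k features for a parent node from the pooled documents of its children.
    parent_docs = [d for docs in child_docs_by_class.values() for d in docs]
    p_parent = term_distribution(parent_docs, vocab)
    scores = Counter()
    for docs in child_docs_by_class.values():
        p_child = term_distribution(docs, vocab)
        weight = kl_divergence(p_child, p_parent)  # parent-subclass correlation
        for t in vocab:
            # Combine subclass weight, term-subclass MI, and term frequency
            # probability P(t | subclass); the product is an assumed combination.
            scores[t] += weight * mutual_information(t, docs, parent_docs) * p_child[t]
    return [t for t, _ in scores.most_common(k)]

For example, select_features({"sports": sports_docs, "politics": politics_docs}, vocab) would return the top-100 terms for the parent node covering those two subclasses, which could then feed an SVM classifier, matching the experimental setup described in the abstract.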
Pages: 421-424
Number of pages: 4