A Categorization Algorithm for Harmful Text Information Filtering

Authors
Du, Juan [1]
Yi, Zhi An [1]
Affiliations
[1] Northeast Petroleum University, Software College, Daqing, People's Republic of China
Keywords
Small sample pattern recognition; Virtual sample; Harmful information filtering; Network information security
DOI
10.1109/MINES.2012.13
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Harmful text information filtering is a typical small-sample pattern recognition problem: because samples containing harmful information are difficult to obtain, the classifier's predictions are biased towards the class with more samples. Constructing virtual samples is an effective way to address small-sample pattern recognition. Using an up-sampling method to construct virtual samples at the data level, the traditional KNN algorithm is improved: the small sample set is divided into clusters by K-means clustering, and virtual samples are generated and checked for validity within each cluster. Experimental results show that the method constructs virtual samples whose characteristics are similar to those of the real samples, and that expanding the small sample set in this way allows harmful text information to be identified effectively.
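The pipeline outlined in the abstract (cluster the small class with K-means, up-sample inside each cluster, check the virtual samples, then classify with KNN) might be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not specify how virtual samples are generated or validated, so the linear interpolation between two cluster members and the "nearest centroid is the source cluster" validity check are assumptions, as are all function names and parameters (`make_virtual_samples`, `per_cluster`, and so on). Documents are assumed to be already converted to numeric feature vectors.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd-style K-means; returns (centroids, clusters)."""
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


def make_virtual_samples(minority, k, per_cluster, rng):
    """Up-sample the small class inside each K-means cluster.

    Assumption: a virtual sample is a random point on the segment
    between two members of the same cluster (interpolation).
    """
    centroids, clusters = kmeans(minority, k, rng)
    virtual = []
    for idx, members in enumerate(clusters):
        if len(members) < 2:
            continue  # nothing to interpolate between
        for _ in range(per_cluster):
            a, b = rng.sample(members, 2)
            t = rng.random()
            candidate = [x + t * (y - x) for x, y in zip(a, b)]
            # Assumed validity check: keep the candidate only if its
            # nearest centroid is the cluster it was generated from.
            nearest = min(range(k), key=lambda i: dist(candidate, centroids[i]))
            if nearest == idx:
                virtual.append(candidate)
    return virtual


def knn_predict(train, query, k=3):
    """Majority vote among the k nearest (vector, label) training pairs."""
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = {}
    for _, label in neighbours:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)
```

In use, the accepted virtual samples would be labelled as harmful and appended to the training set before calling `knn_predict`, so that the neighbour vote is less dominated by the larger class.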
Pages: 31-34
Number of pages: 4
Related papers
50 in total
  • [1] A New KNN Categorization Algorithm for Harmful Information Filtering
    Du, Juan
    Yi, Zhi An
    [J]. 2012 FIFTH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID 2012), VOL 1, 2012, : 489 - 492
  • [2] Text categorization in an intelligent agent for filtering information on the Web
    Gentili, GL
    Marinilli, M
    Micarelli, A
    Sciarrone, F
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2001, 15 (03) : 527 - 549
  • [3] The Improvement Research of Mutual Information Algorithm for Text Categorization
    Kai, Lu
    Li, Chen
    [J]. KNOWLEDGE ENGINEERING AND MANAGEMENT , ISKE 2013, 2014, 278 : 225 - 232
  • [4] An Information Filtering Algorithm Based on Text and Complexion Detecting
    Xiong, JianYing
    Yao, LeiYue
    [J]. 2008 ISECS INTERNATIONAL COLLOQUIUM ON COMPUTING, COMMUNICATION, CONTROL, AND MANAGEMENT, VOL 1, PROCEEDINGS, 2008, : 308 - +
  • [5] An algorithm for text categorization with SVM
    Hu, J
    Huang, HK
    [J]. 2002 IEEE REGION 10 CONFERENCE ON COMPUTERS, COMMUNICATIONS, CONTROL AND POWER ENGINEERING, VOLS I-III, PROCEEDINGS, 2002, : 47 - 50
  • [6] Feasibility research of text information filtering based on genetic algorithm
    Zhu, Zhenfang
    Liu, Peiyu
    [J]. SCIENTIFIC RESEARCH AND ESSAYS, 2010, 5 (22) : 3405 - 3410
  • [7] An enhanced text categorization method based on improved text frequency approach and mutual information algorithm
    Pei Zhili
    Shi Xiaohu
    Marchese, Maurizio
    Liang Yanchun
    [J]. PROGRESS IN NATURAL SCIENCE-MATERIALS INTERNATIONAL, 2007, 17 (12) : 1494 - 1500
  • [8] An enhanced text categorization method based on improved text frequency approach and mutual information algorithm
    Maurizio Marchese
    [J]. Progress in Natural Science:Materials International, 2007, (12) : 1494 - 1500
  • [9] A simple KNN algorithm for text categorization
    Soucy, P
    Mineau, GW
    [J]. 2001 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2001, : 647 - 648
  • [10] A fast KNN algorithm for text categorization
    Wang, Yu
    Wang, Zheng-Ou
    [J]. PROCEEDINGS OF 2007 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2007, : 3436 - +