A Categorization Algorithm for Harmful Text Information Filtering

Cited by: 0
Authors
Du, Juan [1]
Yi, Zhi An [1]
Affiliation
[1] Northeast Petr Univ, Software Coll, Da Qing, Peoples R China
Keywords
Small sample pattern recognition; Virtual sample; Harmful information filtering; Network information security
DOI
10.1109/MINES.2012.13
Chinese Library Classification number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Harmful text information filtering is a typical small-sample pattern recognition problem: because samples containing harmful information are difficult to obtain, the classifier's predictions are biased towards the class with more samples. Constructing virtual samples is an effective means of addressing small-sample pattern recognition. Using an up-sampling method to construct virtual samples at the data level, the traditional KNN algorithm is improved: the small sample set is divided into clusters by K-means clustering, and virtual samples are generated within each cluster and checked for validity. The experimental results show that this method can construct virtual samples whose characteristics are similar to those of the real samples, and can expand the small sample set so that harmful text information is identified effectively.
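The abstract gives no implementation details, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It clusters the minority (harmful) samples with K-means, up-samples by interpolating between pairs of real samples within a cluster, and then classifies with plain KNN. The interpolation rule, the validity check (keep a virtual sample only if its nearest centroid is the centroid of the cluster that produced it), and all function and parameter names (`make_virtual_samples`, `knn_predict`, `per_cluster`) are hypothetical choices made for illustration.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[idx].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


def make_virtual_samples(minority, k=2, per_cluster=5, seed=0):
    """Interpolate between pairs of real samples inside each cluster.

    Validity check (an assumption, not taken from the paper): keep a
    virtual sample only if its nearest centroid belongs to the cluster
    that generated it.
    """
    rng = random.Random(seed)
    centroids, clusters = kmeans(minority, k, seed=seed)
    virtual = []
    for i, cluster in enumerate(clusters):
        if len(cluster) < 2:
            continue  # interpolation needs at least two real samples
        for _ in range(per_cluster):
            a, b = rng.sample(cluster, 2)
            t = rng.random()
            v = [x + t * (y - x) for x, y in zip(a, b)]
            nearest = min(range(k), key=lambda j: dist(v, centroids[j]))
            if nearest == i:
                virtual.append(v)
    return virtual


def knn_predict(train, query, k=3):
    """Majority vote among the k nearest (vector, label) pairs."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)
```

In use, the accepted virtual samples would be labelled as harmful and appended to the training set before calling `knn_predict`, which is one way to offset the bias towards the majority class that the abstract describes. Real text would first need to be turned into feature vectors, a step this sketch leaves out.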
Pages: 31 - 34
Number of pages: 4
Related papers
50 items in total
  • [31] An efficient text categorization algorithm based on category memberships
    Deng, ZH
    Tang, SW
    Zhang, M
    [J]. FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, PT 1, PROCEEDINGS, 2005, 3613 : 374 - 382
  • [32] Item Categorization Algorithm Based on Improved Text Representation
    Zhenchao, Tu
    Jing, Ma
    [J]. Data Analysis and Knowledge Discovery, 2022, 6 (05) : 34 - 43
  • [33] KEY PROBLEMS IN CATEGORIZATION OF CONTRACT TEXT BASED ON INFORMATION
    Yavorska, I. Y.
    [J]. ACTUAL PROBLEMS OF ECONOMICS, 2009, (100) : 283 - 288
  • [34] A Comprehensive Analysis of using Semantic Information in Text Categorization
    Celik, Kerem
    Gungor, Tunga
    [J]. 2013 IEEE INTERNATIONAL SYMPOSIUM ON INNOVATIONS IN INTELLIGENT SYSTEMS AND APPLICATIONS (IEEE INISTA), 2013,
  • [35] Towards automatic and optimal Filtering Levels for Feature Selection in Text Categorization
    Montañés, E
    Combarro, EF
    Díaz, I
    Ranilla, J
    [J]. ADVANCES IN INTELLIGENT DATA ANALYSIS VI, PROCEEDINGS, 2005, 3646 : 239 - 248
  • [36] Text Categorization and Information Retrieval Using WordNet Senses
    Rosso, Paolo
    Ferretti, Edgardo
    Jimenez, Daniel
    Vidal, Vicente
    [J]. GWC 2004: SECOND INTERNATIONAL WORDNET CONFERENCE, PROCEEDINGS, 2003, : 299 - 304
  • [37] Gaussian Process Based Text Categorization for Healthy Information
    Chen, Sih-Huei
    Lee, Yuan-Shan
    Tai, Tzu-Chiang
    Wang, Jia-Ching
    [J]. PROCEEDINGS OF 2015 INTERNATIONAL CONFERENCE ON ORANGE TECHNOLOGIES (ICOT), 2015, : 30 - 33
  • [38] Text categorization using the learning vector quantization algorithm
    Martín-Valdivia, MT
    García-Vega, M
    García-Cumbreras, MA
    López, LAU
    [J]. INTELLIGENT INFORMATION PROCESSING AND WEB MINING, 2004, : 341 - 348
  • [39] A Method of Text Categorization Based on Genetic Algorithm and LDA
    Chen, Lei
    Li, Jun
    Zhang, Li
    [J]. PROCEEDINGS OF THE 36TH CHINESE CONTROL CONFERENCE (CCC 2017), 2017, : 10866 - 10870
  • [40] Oscillating feature subset search algorithm for text categorization
    Novovicova, Jana
    Somol, Petr
    Pudil, Pavel
    [J]. PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS AND APPLICATIONS, PROCEEDINGS, 2006, 4225 : 578 - 587