A Categorization Algorithm for Harmful Text Information Filtering

Authors
Du, Juan [1]
Yi, Zhi An [1]
Affiliation
[1] Northeast Petr Univ, Software Coll, Da Qing, Peoples R China
Keywords
Small sample pattern recognition; Virtual sample; Harmful information filtering; Network information security
DOI
10.1109/MINES.2012.13
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Harmful text information filtering is a typical small-sample pattern recognition problem: because samples containing harmful information are difficult to obtain, the classifier's predictions are biased towards the class with more samples. Constructing virtual samples is an effective way to address small-sample pattern recognition. Using an up-sampling method to construct virtual samples at the data level, the traditional KNN algorithm is improved: the small sample set is divided into clusters by K-means clustering, and virtual samples are generated, and their validity verified, within each cluster. The experimental results show that this method can construct virtual samples whose characteristics are similar to those of the real samples, and that it expands the small sample set so that harmful text information can be identified effectively.
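The abstract gives only an outline of the method, so the following is a minimal sketch of one way the pipeline could look, not the authors' implementation. It assumes documents are already numeric feature vectors, that a virtual sample is a random linear interpolation between two real minority samples in the same K-means cluster, and that a virtual sample is "valid" when a KNN vote over the real samples labels it as the minority (harmful) class. The function names, the interpolation rule, and the parameters `n_clusters`, `k`, and `seed` are all hypothetical choices made for illustration.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns the non-empty clusters as lists of points."""
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: dist(p, centers[i]))
            clusters[idx].append(p)
        # An empty cluster keeps its previous center.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return [c for c in clusters if c]


def knn_label(x, labeled, k):
    """Majority vote among the k nearest real samples (1 = minority class)."""
    nearest = sorted(labeled, key=lambda item: dist(x, item[0]))[:k]
    votes = sum(1 for _, lab in nearest if lab == 1)
    return 1 if votes * 2 > len(nearest) else 0


def make_virtual_samples(minority, majority, n_new, n_clusters=2, k=3, seed=0):
    """Up-sample the minority class inside K-means clusters, keeping only
    candidates that KNN over the real samples assigns to the minority class.
    The interpolation and validation rules are assumptions, not the paper's."""
    rng = random.Random(seed)
    labeled = [(p, 1) for p in minority] + [(p, 0) for p in majority]
    # Clusters with a single point cannot supply a pair to interpolate.
    clusters = [c for c in kmeans(minority, n_clusters, rng) if len(c) >= 2]
    virtual = []
    attempts = 0
    while clusters and len(virtual) < n_new and attempts < 50 * n_new:
        attempts += 1
        a, b = rng.sample(rng.choice(clusters), 2)
        t = rng.random()
        candidate = [x + t * (y - x) for x, y in zip(a, b)]
        if knn_label(candidate, labeled, k) == 1:
            virtual.append(candidate)
    return virtual
```

Calling `make_virtual_samples(minority, majority, n_new=10)` with two lists of feature vectors returns up to ten accepted virtual vectors, which can then be appended to the minority set before training a classifier. Restricting interpolation to pairs within one cluster is meant to keep virtual samples close to the local structure of the real samples; whether this matches the paper's generation and verification steps cannot be determined from the abstract alone.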
Pages: 31-34
Page count: 4