A Study of the Chinese Spam Classification with Doc2vec and CNN

Cited by: 0
Authors
Gong, Hechen [1 ]
You, Fucheng [1 ]
Wang, Shaomei [1 ]
Affiliations
[1] Beijing Inst Graph Commun, Sch Informat Engn, Beijing 102600, Peoples R China
DOI
10.1088/1757-899X/563/4/042026
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The convolutional neural network (CNN) is a type of neural network that has proved very effective in image recognition and classification. In recent years, CNNs have increasingly been applied to natural language processing and have become a research hotspot. However, when a CNN is applied to text built from word vectors, only relationships at the word-granularity level are considered; relationships between words and semantic relationships are ignored, which degrades the classification results. This paper proposes a method based on Doc2vec and CNN to classify spam. First, the spam is preprocessed; then sentence vectors and word vectors of the Chinese text are trained with Doc2vec; finally, the trained text vectors are classified by a CNN.
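The abstract does not give the network architecture or how the sentence and word vectors are combined, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the vectors have already been trained (the toy values below are hand-written stand-ins for Doc2vec output), slides one convolution kernel over consecutive word vectors, applies max-over-time pooling, and concatenates the result with the sentence vector. The function names, the concatenation step, and all numbers are illustrative assumptions.

```python
from typing import List

Vector = List[float]


def conv1d_over_words(word_vecs: List[Vector], kernel: List[Vector],
                      bias: float = 0.0) -> List[float]:
    """Slide a kernel covering len(kernel) consecutive word vectors over the text.

    Returns one ReLU-activated value per window position.
    """
    width = len(kernel)
    feature_map = []
    for start in range(len(word_vecs) - width + 1):
        total = bias
        for offset in range(width):
            # Dot product between one word vector and the matching kernel row.
            total += sum(w * k for w, k in zip(word_vecs[start + offset],
                                               kernel[offset]))
        feature_map.append(max(0.0, total))  # ReLU
    return feature_map


def max_over_time(feature_map: List[float]) -> float:
    """Keep the strongest response of a kernel across all positions."""
    return max(feature_map) if feature_map else 0.0


def text_features(word_vecs: List[Vector], doc_vec: Vector,
                  kernels: List[List[Vector]]) -> List[float]:
    """Assumption: concatenate the sentence vector with the pooled CNN features."""
    pooled = [max_over_time(conv1d_over_words(word_vecs, k)) for k in kernels]
    return list(doc_vec) + pooled


# Toy stand-ins for trained vectors (hypothetical values, not real Doc2vec output).
word_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three words, dimension 2
doc_vec = [0.5, -0.5]                             # one sentence vector
kernel = [[1.0, 0.0], [0.0, 1.0]]                 # window of two words

feature_map = conv1d_over_words(word_vecs, kernel)
features = text_features(word_vecs, doc_vec, [kernel])
print(feature_map)
print(features)
```

In a full pipeline the feature vector would feed a trained classification layer that outputs spam or non-spam; that layer, the preprocessing (for Chinese text, typically word segmentation), and the Doc2vec training itself are omitted here.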
Pages: 8
Related papers
50 in total
  • [21] An Approach to Estimating Cited Sentences in Academic Papers Using Doc2vec
    Tanabe, Shunsuke
    Ohta, Manabu
    Takasu, Atsuhiro
    Adachi, Jun
    [J]. PROCEEDINGS OF THE 10TH INTERNATIONAL CONFERENCE ON MANAGEMENT OF DIGITAL ECOSYSTEMS (MEDES'18), 2018, : 118 - 125
  • [22] Specialists, Scientists, and Sentiments: Word2Vec and Doc2Vec in Analysis of Scientific and Medical Texts
    Chen Q.
    Sokolova M.
    [J]. SN Computer Science, 2021, 2 (5)
  • [23] Using Collaborative Filtering Algorithms Combined with Doc2Vec for Movie Recommendation
    Liu, Gaojun
    Wu, Xingyu
    [J]. PROCEEDINGS OF 2019 IEEE 3RD INFORMATION TECHNOLOGY, NETWORKING, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (ITNEC 2019), 2019, : 1461 - 1464
  • [24] Distance Metrics in Open-Set Classification of Text Documents by Local Outlier Factor and Doc2Vec
    Walkowiak, Tomasz
    Datko, Szymon
    Maciejewski, Henryk
    [J]. ADVANCES AND TRENDS IN ARTIFICIAL INTELLIGENCE: FROM THEORY TO PRACTICE, 2019, 11606 : 102 - 109
  • [25] Key word extraction for short text via word2vec, doc2vec, and textrank
    Li, Jun
    Huang, Guimin
    Fan, Chunli
    Sun, Zhenglin
    Zhu, Hongtao
    [J]. TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2019, 27 (03) : 1794 - 1805
  • [26] Sentiment analysis via Doc2Vec and Convolutional Neural Network hybrids
    Dhariyal, Bhaskar
    Ravi, Vadlamani
    Ravi, Kumar
    [J]. 2018 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI), 2018, : 666 - 671
  • [27] Sentiment Analysis on Twitter data with Semi-Supervised Doc2Vec
    Bilgin, Metin
    Senturk, Izzet Fatih
    [J]. 2017 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ENGINEERING (UBMK), 2017, : 661 - 666
  • [28] A doc2vec and local outlier factor approach to measuring the novelty of patents
    Jeon, Daeseong
    Ahn, Joon Mo
    Kim, Juram
    Lee, Changyong
    [J]. TECHNOLOGICAL FORECASTING AND SOCIAL CHANGE, 2022, 174
  • [29] Recommendation method for academic journal submission based on doc2vec and XGBoost
    Huang ZhengWei
    Min JinTao
    Yang YanNi
    Huang Jin
    Tian Ye
    [J]. Scientometrics, 2022, 127 : 2381 - 2394
  • [30] Filtering Malicious JavaScript Code with Doc2Vec on an Imbalanced Dataset
    Mimura, Mamoru
    Suga, Yuya
    [J]. 2019 14TH ASIA JOINT CONFERENCE ON INFORMATION SECURITY (ASIAJCIS 2019), 2019, : 24 - 31