Automatic Chinese Text Classification Using N-Gram Model

Times cited: 0
Authors
Yen, Show-Jane [1]
Lee, Yue-Shi [1]
Wu, Yu-Chieh [1]
Ying, Jia-Ching [2]
Tseng, Vincent S. [2]
Affiliations
[1] Ming Chuan Univ, Dept Comp Sci & Informat Engn, 5 Ming Rd, Tao Yuan 333, Taiwan
[2] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan 701, Taiwan
Keywords
Text Classification; N-gram; Feature Selection; Word Segmentation; Logistic Regression; CATEGORIZATION;
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Automatic Chinese text classification is an important and well-known research topic in information retrieval and natural language processing. However, previous research has often ignored the problem of word segmentation and the relationships between words. This paper proposes an N-gram-based language model for Chinese text classification that takes the relationships between words into account. To mitigate the out-of-vocabulary problem, a novel smoothing method based on logistic regression is also proposed to improve performance. Experimental results show that the approach outperforms the previous N-gram-based classification model by more than 11% in micro-averaged F-measure.
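The general idea of classifying text with per-class N-gram language models can be sketched as follows. This is an illustration only and not the authors' method: it assumes character-level bigrams (which avoids word segmentation altogether) and substitutes plain add-one smoothing for the logistic-regression-based smoothing the abstract describes. The class name `BigramClassifier`, the helper `bigrams`, and the start marker `^` are hypothetical names chosen for this sketch.

```python
import math
from collections import Counter, defaultdict


def bigrams(text):
    """Return character bigrams; a start marker lets the first character be scored."""
    padded = "^" + text
    return [(padded[i], padded[i + 1]) for i in range(len(padded) - 1)]


class BigramClassifier:
    """One bigram language model per class, with add-one smoothing (an assumption)."""

    def __init__(self):
        self.pair_counts = defaultdict(Counter)  # label -> counts of (prev, cur)
        self.prev_counts = defaultdict(Counter)  # label -> counts of prev
        self.doc_counts = Counter()              # label -> number of training docs
        self.vocab = set()                       # characters seen in training

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            self.doc_counts[label] += 1
            for prev, cur in bigrams(text):
                self.pair_counts[label][(prev, cur)] += 1
                self.prev_counts[label][prev] += 1
                self.vocab.add(cur)
        return self

    def log_score(self, text, label):
        # Log prior of the class plus the smoothed log-likelihood of each bigram.
        v = len(self.vocab) + 1  # one extra slot reserved for unseen characters
        total = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for prev, cur in bigrams(text):
            num = self.pair_counts[label][(prev, cur)] + 1
            den = self.prev_counts[label][prev] + v
            total += math.log(num / den)
        return total

    def predict(self, text):
        # Pick the class whose language model gives the text the highest score.
        return max(self.doc_counts, key=lambda label: self.log_score(text, label))


# Toy data, invented purely for illustration.
train_docs = ["股市上涨", "股票下跌", "球队获胜", "球员受伤"]
train_labels = ["finance", "finance", "sports", "sports"]
model = BigramClassifier().fit(train_docs, train_labels)
print(model.predict("股市下跌"))
```

Add-one smoothing is the simplest way to keep unseen bigrams from zeroing out a class score; the paper's contribution, per the abstract, is a different smoothing scheme for exactly this out-of-vocabulary case.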
Pages: 458-+
Page count: 3
Related papers
50 in total
  • [1] Chinese Text Categorization Using the Character N-gram
    Suzuki, Makoto
    Yamagishi, Naohide
    Tsai, Yi-Ching
    [J]. 2012 INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY AND ITS APPLICATIONS (ISITA 2012), 2012, : 722 - 726
  • [2] Short Text Classification Based on Feature Extension Using The N-Gram Model
    Zhang, Xinwei
    Wu, Bin
    [J]. 2015 12TH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (FSKD), 2015, : 710 - 716
  • [3] Are n-gram Categories Helpful in Text Classification?
    Kruczek, Jakub
    Kruczek, Paulina
    Kuta, Marcin
    [J]. COMPUTATIONAL SCIENCE - ICCS 2020, PT II, 2020, 12138 : 524 - 537
  • [4] A Neural N-Gram Network for Text Classification
    Yan, Zhenguo
    Wu, Yue
    [J]. JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS, 2018, 22 (03) : 380 - 386
  • [5] An ensemble text classification model combining strong rules and N-Gram
    Liu, Jinhong
    Lu, Yuliang
    [J]. ICNC 2007: THIRD INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, VOL 3, PROCEEDINGS, 2007, : 535 - +
  • [6] Classification of facemarks using N-gram
    Yamada, Thichi
    Tsuchiya, Seiji
    Kuroiwa, Shiongo
    Ren, Fuji
    [J]. PROCEEDINGS OF THE 2007 IEEE INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING (NLP-KE'07), 2007, : 322 - +
  • [7] Text Classification using Gated Fusion of n-gram Features and Semantic Features
    Nagar, Ajay
    Bhasin, Anmol
    Mathur, Gaurav
    [J]. COMPUTACION Y SISTEMAS, 2019, 23 (03) : 1015 - 1020
  • [8] Classification of Text Documents based on Naive Bayes using N-Gram Features
    Baygin, Mehmet
    [J]. 2018 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND DATA PROCESSING (IDAP), 2018,
  • [9] N-Gram Pattern Recognition using Multivariate-Bernoulli Model with Smoothing Methods for Text Classification
    Kilimci, Zeynep Hilal
    Akyokus, Selim
    [J]. 2016 24TH SIGNAL PROCESSING AND COMMUNICATION APPLICATION CONFERENCE (SIU), 2016, : 597 - 600
  • [10] n-BiLSTM: BiLSTM with n-gram Features for Text Classification
    Zhang, Yunxiang
    Rao, Zhuyi
    [J]. PROCEEDINGS OF 2020 IEEE 5TH INFORMATION TECHNOLOGY AND MECHATRONICS ENGINEERING CONFERENCE (ITOEC 2020), 2020, : 1056 - 1059