Tibetan Text Classification Based on the Feature of Position Weight

Cited by: 7
Authors
Cao, Hui [1 ]
Jia, Huiqiang [1 ]
Affiliations
[1] Northwest Univ Nationalities, Chinese Natl Inst Informat Technol, Lanzhou 730030, Gansu, Peoples R China
Keywords
Position weight; Tibetan; Text classification; Support Vector Machine; Feature words;
DOI
10.1109/IALP.2013.63
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Based on a study of Tibetan characters and grammar, this paper investigates term-weighting algorithms for Tibetan text categorization under the vector space model. Taking into account the position at which Tibetan words appear in a text, the paper proposes an improved TF-IDF weighting algorithm. The chi-square (χ², CHI) statistic is used to extract feature words from Tibetan documents, and the cosine method is used to compute the similarity between Tibetan texts and to distinguish similar documents. Tibetan texts are then classified with a linearly separable support vector machine, and the classification performance of the original TF-IDF algorithm is compared with that of the improved TF-IDF algorithm. The results show that the improved TF-IDF algorithm achieves better classification performance.
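The abstract does not give the improved weighting formula, so the following is only a minimal sketch of the general idea: term frequencies are scaled by where a word occurs before the usual IDF factor is applied, and documents are then compared by cosine similarity. The position names, the values in `POSITION_WEIGHTS`, the smoothed IDF form, and all function names are assumptions made for illustration and are not taken from the paper. CHI feature selection and the SVM classifier are omitted.

```python
import math
from collections import Counter

# Hypothetical position weights (assumption, not the paper's values):
# a word in the title counts three times as much as one in the body.
POSITION_WEIGHTS = {"title": 3.0, "body": 1.0}


def weighted_tf(doc):
    """Position-weighted term frequency.

    `doc` maps a position name to the list of segmented words found there.
    """
    tf = Counter()
    for position, tokens in doc.items():
        weight = POSITION_WEIGHTS.get(position, 1.0)
        for token in tokens:
            tf[token] += weight
    return tf


def tfidf_vectors(docs):
    """Return one sparse {word: weight} vector per document."""
    n_docs = len(docs)
    tfs = [weighted_tf(doc) for doc in docs]
    # Document frequency: number of documents containing each word.
    df = Counter()
    for tf in tfs:
        df.update(tf.keys())
    # Smoothed IDF (an assumed variant) so that no weight becomes zero.
    idf = {word: math.log(n_docs / count) + 1.0 for word, count in df.items()}
    return [{word: freq * idf[word] for word, freq in tf.items()} for tf in tfs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

In practice the input would be segmented Tibetan words grouped by position, and the resulting vectors (restricted to the features kept by CHI selection) would be passed to a linear SVM.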
Pages: 220-223
Number of pages: 4
Related papers
50 in total
  • [1] A Novel Feature Selection Based on Tibetan Grammar for Tibetan Text Classification
    Jiang, Tao
    Yu, Hongzhi
    [J]. PROCEEDINGS OF 2015 6TH IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING AND SERVICE SCIENCE, 2015, : 445 - 448
  • [2] A Feature Weight Algorithm for Text Classification Based on Class Information
    Li Yong-fei
    [J]. PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION APPLICATIONS (ICCIA 2012), 2012, : 930 - 932
  • [3] Tibetan text Sentiment Classification Based on Rules
    Huang, Tao
    Yan, Xiaodong
    [J]. PROCEEDINGS 2015 18TH INTERNATIONAL CONFERENCE ON NETWORK-BASED INFORMATION SYSTEMS (NBIS 2015), 2015, : 566 - 569
  • [4] A Text Classification Model Based on Training Sample Selection and Feature Weight Adjustement
    Pang, Xuezeng
    Liao, Yixing
    [J]. 2ND IEEE INTERNATIONAL CONFERENCE ON ADVANCED COMPUTER CONTROL (ICACC 2010), VOL. 3, 2010, : 294 - 297
  • [5] A method of the feature selection in hierarchical text classification based on the category discrimination and position information
    Song, Jia
    Zhang, Pengzhou
    Qin, Sijun
    Gong, Junpeng
    [J]. 2015 INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS - COMPUTING TECHNOLOGY, INTELLIGENT TECHNOLOGY, INDUSTRIAL INFORMATION INTEGRATION (ICIICII), 2015, : 132 - +
  • [6] Tibetan text classification based on word vector features
    Ma, Wei
    Yu, Hongzhi
    Ma, Jing
    [J]. BASIC & CLINICAL PHARMACOLOGY & TOXICOLOGY, 2019, 125 : 76 - 77
  • [7] Improved Feature Weight Algorithm and Its Application to Text Classification
    Shang, Songtao
    Shi, Minyong
    Shang, Wenqian
    Hong, Zhiguo
    [J]. MATHEMATICAL PROBLEMS IN ENGINEERING, 2016, 2016
  • [8] Feature Extraction based Text Classification: A review
    Shaker, Saif Safaa
    Alhajim, Dhafer
    Al-Khazaali, Ahmed Ali Talib
    Hussein, Hussein Aqeel
    Athab, Ali F.
    [J]. JOURNAL OF ALGEBRAIC STATISTICS, 2022, 13 (01) : 646 - 653
  • [9] A Text Classification Algorithm based on Feature Weighting
    Yang, Han
    Cui, Honggang
    Tang, Hao
    [J]. GREEN ENERGY AND SUSTAINABLE DEVELOPMENT I, 2017, 1864
  • [10] Tibetan News Text Classification Based on Graph Convolutional Networks
    Xu G.
    Zhang Z.
    Yu S.
    Dong Y.
    Tian Y.
    [J]. Data Analysis and Knowledge Discovery, 2023, 7 (06) : 73 - 85