A Study on Text Classification: Term Weighting Algorithm Analysis

Authors
Tseng, Kuan-Hua [1]
Lin, Chun-Hung Richard [1]
Liu, Jain-Shing [2]
Huang, Chih-Ming Andrew [1]
Wang, Yue-Han [1]
Affiliations
[1] Natl Sun Yat Sen Univ, Dept Comp Sci & Engn, Kaohsiung, Taiwan
[2] Providence Univ, Dept Comp Sci & Informat Engn, Taichung, Taiwan
Source
JOURNAL OF INTERNET TECHNOLOGY | 2021 / Vol. 22 / No. 02
Keywords
Text classification; Term weighting; Supervised term weighting; CATEGORIZATION;
DOI
10.3966/160792642021032202007
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
With advances in digital recording and storage technology and the rapid growth of the World Wide Web, people now write and record in digital text rather than on paper. As more text applications emerge, text classification has been gaining attention. Automatic text classification through machine learning involves five related technologies that are widely discussed in the literature: pre-processing, feature extraction, feature selection, term weighting, and the classification algorithm. In this paper, we explore the impact of term weighting on text classification. Term weighting is an important part of text classification: the calculated weight should directly reflect the importance of a term in the entire text so that machine learning can achieve the best classification result. We applied several common term weighting methods to several pre-defined datasets and conducted experiments. Contrary to the intuitive assumption that a weight's value represents how important a term is, the results show that a term may not actually be as important as its high weight suggests.
Pages: 311 - 325
Page count: 15
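
The abstract does not name the term weighting methods that were compared. As an illustration only, the sketch below assumes TF-IDF, one widely used unsupervised scheme, to show how a weight is computed per term and per document. The function name `tf_idf` and the toy documents are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed scheme (not from the paper): relative term frequency
    multiplied by log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


# Toy corpus, invented for illustration.
docs = [
    "the match ended in a draw".split(),
    "the market ended lower".split(),
    "the team won the match".split(),
]
weights = tf_idf(docs)
```

In this toy corpus, "the" occurs in every document and therefore receives a weight of zero, while a term confined to one document, such as "market", receives a positive weight. Supervised term weighting, listed in the keywords, would additionally use class labels, which this sketch does not.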