Power Law for Text Categorization

Cited by: 0
Authors
Liu, Wuying [1]
Wang, Lin [2]
Yi, Mianzhu [1]
Affiliations
[1] PLA University of Foreign Languages, Luoyang 471003, Henan, People's Republic of China
[2] National University of Defense Technology, Changsha 410073, Hunan, People's Republic of China
Keywords
Text Categorization; Power Law; Online Binary TC; Batch Multi-Category TC; TREC
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text categorization (TC) is a challenging problem, and TC algorithms are used in many applications. This paper addresses the online multi-category TC problem, abstracted from the applications of online binary TC and batch multi-category TC. Most applications are concerned with the space-time performance of TC algorithms. Through an investigation of the token frequency distributions in an email collection and a Chinese web document collection, this paper re-examines the power law and proposes a random sampling ensemble Bayesian (RSEB) TC algorithm. Supported by a token-level memory that stores labeled documents, the RSEB algorithm uses a text retrieval approach to solve text categorization problems. The experimental results show that the RSEB algorithm can achieve state-of-the-art performance with greatly reduced space-time requirements in both the TREC email spam filtering task and the Chinese web document classification task.
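The abstract says the authors re-examine the power law by investigating token frequency distributions, but it does not describe the procedure. As an illustration only, and not the paper's method, the following sketch assumes a Zipf-style rank-frequency model f(r) ≈ C·r^(-α) and estimates α by ordinary least squares on log f versus log r. The function names `token_rank_frequencies` and `fit_power_law` and the regex tokenizer are hypothetical choices made for this example.

```python
import math
import re
from collections import Counter


def token_rank_frequencies(texts):
    """Count tokens over all texts and return their frequencies, highest first.

    The tokenizer (lower-cased runs of word characters) is a simplifying
    assumption; Chinese text would normally need word segmentation first.
    """
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"\w+", text.lower()))
    return sorted(counts.values(), reverse=True)


def fit_power_law(freqs):
    """Fit f(r) ~ C * r**(-alpha) by least squares on log f versus log r.

    `freqs` must be positive and sorted in descending order, so that
    index 0 corresponds to rank 1. Returns (alpha, C).
    """
    if len(freqs) < 2:
        raise ValueError("need at least two ranks to fit a line")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Slope and intercept of the ordinary least-squares line y = a + b*x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # A power law has a negative log-log slope, so alpha = -slope.
    return -slope, math.exp(intercept)


if __name__ == "__main__":
    # Synthetic check: frequencies generated exactly as 1000 / r.
    alpha, c = fit_power_law([1000.0 / r for r in range(1, 51)])
    print(f"alpha={alpha:.3f}, C={c:.1f}")  # alpha=1.000, C=1000.0
```

On real token counts, a straight-line fit in log-log space is only a rough diagnostic: the long tail of rare tokens carries most of the points and can dominate the fit, and maximum-likelihood estimators are generally preferred for rigorous power-law fitting.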
Pages: 131-143
Number of pages: 13
Related papers (50 in total)
  • [21] Research of Text Categorization on WEKA
    Li, Dan
    Liu, Lihua
    Zhang, Zhaoxin
    2013 Third International Conference on Intelligent System Design and Engineering Applications (ISDEA), 2013: 1129-1131
  • [22] Document indexing in text categorization
    Zhang, QR
    Zhang, L
    Dong, SB
    Tan, JH
    Proceedings of 2005 International Conference on Machine Learning and Cybernetics, Vols 1-9, 2005: 3792-3796
  • [23] Keyword extraction for text categorization
    An, JY
    Chen, YPP
    Proceedings of the 2005 International Conference on Active Media Technology (AMT 2005), 2005: 556-561
  • [24] Exploiting hierarchy in text categorization
    Weigend, A.S.
    Wiener, E.D.
    Pedersen, J.O.
    Information Retrieval, 1999, 1 (3): 193-216
  • [25] Text Categorization by Weighted Features
    Fu, Junfeng
    Liang, Liang
    Zheng, Jinkun
    Zhou, Xin
    2018 5th International Conference on Information Science and Control Engineering (ICISCE 2018), 2018: 544-547
  • [26] Distributional Features for Text Categorization
    Xue, Xiao-Bing
    Zhou, Zhi-Hua
    IEEE Transactions on Knowledge and Data Engineering, 2009, 21 (3): 428-442
  • [27] Deep Encrypted Text Categorization
    Vinayakumar, R.
    Soman, K. P.
    Poornachandran, Prabaharan
    2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), 2017: 364-370
  • [28] Text categorization: past and present
    Dhar, Ankita
    Mukherjee, Himadri
    Dash, Niladri Sekhar
    Roy, Kaushik
    Artificial Intelligence Review, 2021, 54: 3007-3054
  • [29] Contextual entropy and text categorization
    Garcia, Moises
    Hidalgo, Hugo
    Chavez, Edgar
    LA-WEB 06: Fourth Latin American Web Congress, Proceedings, 2006: 147 ff.
  • [30] Using SVMs for text categorization
    Dumais, S
    IEEE Intelligent Systems & Their Applications, 1998, 13 (4): 21-23